mlomb / chat-analytics

Generate interactive, beautiful and insightful chat analysis reports
https://chatanalytics.app
GNU Affero General Public License v3.0
708 stars, 51 forks

W #77

Closed HartmutLicht closed 1 year ago

HartmutLicht commented 1 year ago

Filtering with a word list. Sorry, my English is too poor, so I'll try it in German: For my master's thesis I would like to search the content of selected Telegram chats for specific words. I have compiled a relevant word list that I would like to run over the chats in a single search pass. But apparently Chat Analytics cannot do this. Or am I doing something wrong? Second question: By which parameters does Chat Analytics assign the content in the "Sentiment" section to the labels "positive", "negative" and "neutral"? This feature is also very important for my master's thesis, but I have to be able to explain the results. Many thanks for your help, Hartmut

hopperelec commented 1 year ago

I don't know German so I'll respond in English.

No, Chat Analytics does not let you search the content of messages. In fact, once the files have been parsed, the original messages are discarded and only key features of them are stored. The number of occurrences of significant words is kept, and you can search these in the 'Language' tab. If you just want to find messages containing these words, I would use a regex with each word in the word list joined by a pipe symbol (|).
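A minimal sketch of the regex approach, run outside Chat Analytics on the saved messages. The function names, the sample messages and the sample keywords are made up for illustration; it assumes the message texts have already been loaded into a list of strings.

```python
import re
from collections import Counter


def build_wordlist_pattern(words):
    """Join the word list with '|' into one case-insensitive, whole-word regex."""
    # Escape each word so characters like '.' or '+' are matched literally,
    # and put longer entries first so they win over shorter alternatives.
    escaped = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def count_keyword_hits(messages, words):
    """Count how often each keyword occurs across an iterable of message texts."""
    pattern = build_wordlist_pattern(words)
    hits = Counter()
    for text in messages:
        for match in pattern.finditer(text):
            hits[match.group(0).lower()] += 1
    return hits


# Hypothetical sample data; real messages would come from the saved chat export.
sample_messages = [
    "The lockdown news is everywhere today.",
    "Nobody talks about 5G anymore, only the Lockdown.",
    "Just a normal message.",
]
sample_words = ["lockdown", "5G"]
print(count_keyword_hits(sample_messages, sample_words))
```

Grouping the same counts by message date would show when a keyword starts to dominate, but that depends on how the export stores timestamps.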

I'm on mobile right now and, since you want a detailed explanation of how sentiment analysis works in Chat Analytics, I want to do some research first to make sure my response is accurate. I'll do that once I'm next on my computer, if I remember. I gave a brief explanation in response to https://github.com/mlomb/chat-analytics/issues/66

HartmutLicht commented 1 year ago

Thank you for now. My English is a little rusty, but I'll try again in English: For a master's thesis in criminology I want to analyse the content of several Telegram channels run by conspiracy theorists. I want to examine these channels to see, for example, when traffic was highest, but also which topics were discussed. For the latter I want to search the channels for specific keywords to see when a new topic starts to dominate the discussion. I have gathered a list of about 100 keywords, and I want to search for these words in the channels (I have saved them on my computer). On the second topic: Chat Analytics' ability to sort posts into negative and positive would also be interesting for the thesis. For example: on which days did positive or negative posts dominate? But I have to explain the results to my professor, which is why it is important for me to understand how the program judges messages to be positive or negative. I need the program for this investigation because I have to analyse more than 100,000 posts. I can't do it by hand :-) I read your explanation under #66, but I'm not sure it will satisfy my professor.

hopperelec commented 1 year ago

Acknowledgments

Acknowledgments for sentiment data can be found in the README. If you want more detail on how the original sentiment scores were calculated, I'd refer to these sources. https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/README.md?plain=1#L83-L84

Raw text sentiment data (AFINN) is stored here and emoji sentiment data is stored (and pre-parsed) here.

Basic usage example

A basic example showing how simple text sentiment scoring is can be found in the sentiment test: https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/tests/process/nlp/Sentiment.test.ts#L16-L17

Code for calculating sentiment

The sentiment calculator is here: https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/pipeline/process/nlp/Sentiment.ts#L13-L108

An instance of it is created here: https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/pipeline/process/MessageProcessor.ts#L33

Each individual message's sentiment is calculated here: https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/pipeline/process/MessageProcessor.ts#L116-L120

Code for visualizing sentiment

I'm guessing this is beyond the scope you needed, but you can also see how the calculated sentiment is visualized here

How the sentiment calculation works

I'm guessing this is the important part for you. Keep in mind that how the original sentiment scores are calculated is beyond the scope of Chat Analytics (there's a reason mlomb didn't collect this data themself!) and the actual calculator was written by marcellobarile, so your best bet would be to refer back to these sources if you need more detail.

A message starts with a score of 0. This score is ultimately what determines whether a message is 'positive', 'negative' or 'neutral'. We iterate through every token (punctuation, emoji, word) in the message and modify the score accordingly.

If the token is an emoji, we simply add the emoji's sentiment to the score: https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/pipeline/process/nlp/Sentiment.ts#L78-L81

If the token is a word, we first check whether it is a negator. AFINN has a list of negators, words such as 'not'. If a word comes within the 5 words following a negator (and within the same sentence/clause), its effect on the score is reversed. Punctuation is only used to stop the effect of negators from carrying over into the following sentences/clauses: https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/pipeline/process/nlp/Sentiment.ts#L73-L76 https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/pipeline/process/nlp/Sentiment.ts#L84-L98
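To make the steps above concrete, here is a rough Python sketch of the same idea. It is not the code from Sentiment.ts: the tiny lexicons, the function names and the choice to label by the sign of the score are assumptions for illustration, and the real tool uses the AFINN and emoji sentiment data instead.

```python
# Hypothetical miniature lexicons; the real tool uses AFINN word scores and
# an emoji sentiment dataset, neither of which is reproduced here.
WORD_SCORES = {"good": 3, "bad": -3, "love": 3, "hate": -3}
EMOJI_SCORES = {"😀": 2, "😡": -2}
NEGATORS = {"not", "never", "no"}
PUNCTUATION = {".", ",", "!", "?", ";", ":"}
NEGATION_WINDOW = 5  # a negator affects the 5 words that follow it


def score_tokens(tokens):
    """Sum token sentiments, reversing words that closely follow a negator."""
    score = 0
    negated_words_left = 0
    for token in tokens:
        if token in PUNCTUATION:
            # Punctuation ends the clause, so the negation stops here.
            negated_words_left = 0
        elif token in EMOJI_SCORES:
            # Emoji sentiment is simply added to the score.
            score += EMOJI_SCORES[token]
        else:
            word = token.lower()
            if word in NEGATORS:
                negated_words_left = NEGATION_WINDOW
                continue
            value = WORD_SCORES.get(word, 0)
            if negated_words_left > 0:
                value = -value  # reverse the effect of a negated word
                negated_words_left -= 1
            score += value
    return score


def label(score):
    """Assumption: the label follows the sign of the final score."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label(score_tokens(["this", "is", "not", "good"])))
print(label(score_tokens(["not", "bad", ",", "good", "😀"])))
```

In the first example 'good' is reversed by 'not'; in the second the comma ends the negation, so only 'bad' is reversed. Whether emoji count towards the 5-word window is not covered by the explanation above, so this sketch leaves them out of it.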

HartmutLicht commented 1 year ago

Thank you for your first answer. I replied to you on GitHub in my bad English and tried to give a better explanation of what I'm looking for. Perhaps you'll be so kind as to have a look at my post. Best regards

From: hopperelec Sent: Saturday, 15 April 2023 20:39 To: mlomb/chat-analytics Cc: HartmutLicht; Author Subject: Re: [mlomb/chat-analytics] W (Issue #77)


hopperelec commented 1 year ago

I'm confused. You responded to my original response from 5 days ago, but I gave a much more in-depth response yesterday which I think will be more useful to you. I'm guessing you got a GitHub notification and thought it was me repeating my original response via email.

HartmutLicht commented 1 year ago

Good morning, thanks a lot for the explanation. I'm not sure that I understood everything, but your explanation in the last paragraph was very helpful. It should be enough for my professor. Thanks again!

Best regards


From: hopperelec Sent: Wednesday, 19 April 2023 22:17 To: mlomb/chat-analytics Cc: HartmutLicht; Author Subject: Re: [mlomb/chat-analytics] W (Issue #77)
