mlomb / chat-analytics

Generate interactive, beautiful and insightful chat analysis reports
https://chatanalytics.app
GNU Affero General Public License v3.0
711 stars, 51 forks

Identify words which only/mostly appear together and group them together #35

Closed: hopperelec closed this issue 1 year ago

hopperelec commented 1 year ago

For example, I don't know what DefleMask is, but I'd imagine most people who use the words 'Defle' and 'Mask' are using them together as 'Defle Mask'. Instead of being listed as two separate words, they could be joined into one. This would make 'Most used words' and 'Top reacted messages' easier to read, and I think it would have minimal effect on the data size.
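A minimal sketch of the suggestion, assuming messages are tokenized by splitting on whitespace. This is not code from chat-analytics; `findStickyPairs`, `mergePairs`, `minCount` and `minScore` are hypothetical names chosen for illustration. It counts each word and each adjacent word pair, then flags a pair as "sticky" when the rarer of its two words appears inside that pair most of the time:

```typescript
// Hypothetical sketch, not chat-analytics' implementation.
type Pair = { first: string; second: string; together: number; score: number };

// Assumption: a naive whitespace tokenizer, lowercased
function tokenize(message: string): string[] {
  return message.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

function findStickyPairs(messages: string[], minCount = 3, minScore = 0.8): Pair[] {
  const wordCounts = new Map<string, number>();
  const pairCounts = new Map<string, number>();

  for (const message of messages) {
    const words = tokenize(message);
    for (let i = 0; i < words.length; i++) {
      wordCounts.set(words[i], (wordCounts.get(words[i]) ?? 0) + 1);
      if (i + 1 < words.length) {
        // Tokens contain no whitespace, so a single space is a safe separator
        const key = words[i] + " " + words[i + 1];
        pairCounts.set(key, (pairCounts.get(key) ?? 0) + 1);
      }
    }
  }

  const result: Pair[] = [];
  for (const [key, together] of pairCounts) {
    if (together < minCount) continue; // ignore rare pairs
    const [first, second] = key.split(" ");
    // Fraction of the rarer word's occurrences that fall inside this pair
    const score = together / Math.min(wordCounts.get(first)!, wordCounts.get(second)!);
    if (score >= minScore) result.push({ first, second, together, score });
  }
  return result.sort((a, b) => b.together - a.together);
}

// Join sticky pairs into a single token, e.g. ["defle", "mask"] -> ["defle mask"]
function mergePairs(words: string[], pairs: Pair[]): string[] {
  const sticky = new Set(pairs.map((p) => p.first + " " + p.second));
  const out: string[] = [];
  for (let i = 0; i < words.length; i++) {
    if (i + 1 < words.length && sticky.has(words[i] + " " + words[i + 1])) {
      out.push(words[i] + " " + words[i + 1]);
      i++; // skip the second word, it is now part of the merged token
    } else {
      out.push(words[i]);
    }
  }
  return out;
}

const messages = ["I use Defle Mask daily", "defle mask is great", "new defle mask update", "wear a mask"];
const pairs = findStickyPairs(messages);
console.log(pairs); // one pair: defle + mask, together 3, score 1
console.log(mergePairs(tokenize("new defle mask update"), pairs)); // ["new", "defle mask", "update"]
```

Scoring against the rarer word lets "Mask" keep its own count from unrelated uses like "wear a mask", while "Defle", which never appears alone, is always folded into "defle mask". A stricter variant could require both words to mostly co-occur.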

mlomb commented 1 year ago

I think you are talking about n-grams, which are on the TODO list but take a lot of effort to do right. I would like to combine n-gram support with better compression for the words list, but it is not going to happen in the near future.
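For readers unfamiliar with the term, here is a rough sketch of n-gram counting that also stores each distinct word only once, in the spirit of pairing n-grams with a compact words list. It is an assumption-laden illustration, not the design planned for chat-analytics. `NGramIndex`, `add`, `decode` and `maxN` are hypothetical names. Each n-gram is stored as a sequence of indices into a shared `words` array rather than as repeated strings:

```typescript
// Hypothetical sketch: count n-grams of length 1..maxN, keeping each distinct
// word once in a shared list and each n-gram as word indices into that list.
class NGramIndex {
  readonly words: string[] = [];
  private wordIds = new Map<string, number>();
  // Key: comma-joined word ids, e.g. "0,1"; value: occurrence count
  readonly counts = new Map<string, number>();

  // Return the word's index, appending it to the list on first sight
  private idOf(word: string): number {
    let id = this.wordIds.get(word);
    if (id === undefined) {
      id = this.words.length;
      this.words.push(word);
      this.wordIds.set(word, id);
    }
    return id;
  }

  // Slide a window of every size from 1 to maxN over the token ids
  add(tokens: string[], maxN: number): void {
    const ids = tokens.map((t) => this.idOf(t));
    for (let n = 1; n <= maxN; n++) {
      for (let i = 0; i + n <= ids.length; i++) {
        const key = ids.slice(i, i + n).join(",");
        this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
      }
    }
  }

  // Turn a stored key back into readable text
  decode(key: string): string {
    return key.split(",").map((id) => this.words[Number(id)]).join(" ");
  }
}

const index = new NGramIndex();
index.add(["defle", "mask", "is", "great"], 2);
index.add(["new", "defle", "mask"], 2);
console.log(index.words); // ["defle", "mask", "is", "great", "new"]
console.log(index.decode("0,1"), index.counts.get("0,1")); // "defle mask" 2
```

The number of candidate n-grams grows quickly with `maxN` and message length, which hints at why doing this well, and keeping the report size small, takes real effort.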