mrbrianevans / social-media-export-analyser

Analyse GDPR exports of your data from big social media companies
https://social-media-export-analyser-mrybc.ondigitalocean.app/
MIT License

Frequency analysis logic in lib #52

Closed mrbrianevans closed 2 years ago

mrbrianevans commented 2 years ago

Analyse the frequency of tokens in the exported data, to give the user insights into it.

Examples: the frequency of words, user mentions, emojis, base URLs, hashtags, etc.

This can be done using winkjs as a tokenizer and categorizer.

The logic should take as its input a collection of documents (string[]) and return an object of frequency tables, something like this:

interface FrequencyTables {
    [tokenName: string]: { // tokenName can be something like "words", "hashtags" or "emojis"
        [tokenValue: string]: number // where number is the count of occurrences of tokenValue
    }
}
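
As a rough sketch of how documents could be turned into that shape: the version below assumes a crude whitespace tokenizer and a hypothetical `classify` helper as a stand-in for winkjs, so it only knows about words, hashtags and mentions. The names `classify` and `countFrequencies` are illustrative and not part of the lib.

```typescript
interface FrequencyTables {
  [tokenName: string]: { [tokenValue: string]: number }
}

// Hypothetical stand-in for a real categorizer such as winkjs:
// decides which table a token belongs to by looking at its first character.
function classify(token: string): string {
  if (token.startsWith('#')) return 'hashtags'
  if (token.startsWith('@')) return 'mentions'
  return 'words'
}

function countFrequencies(documents: string[]): FrequencyTables {
  // Null-prototype objects, so tokens like "constructor" do not collide
  // with properties inherited from Object.prototype.
  const tables: FrequencyTables = Object.create(null)
  for (const doc of documents) {
    for (const chunk of doc.toLowerCase().split(/\s+/)) {
      // Strip trailing punctuation; skip chunks that end up empty.
      const token = chunk.replace(/[.,!?;:]+$/, '')
      if (token === '') continue
      const name = classify(token)
      const table = tables[name] ?? (tables[name] = Object.create(null))
      table[token] = (table[token] ?? 0) + 1
    }
  }
  return tables
}

const example = countFrequencies(['Hello #world, hello @bob!', 'hello #World'])
console.log(example['words'], example['hashtags'], example['mentions'])
```

Tokens are lower-cased before counting, so "#world" and "#World" land in the same row. Whether that is the right behaviour for every token type is an open design choice.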

The logic could also take as input a limit on how many of the top results to include. Instead of returning the frequency of every single token, it would return only the most used tokens (the top 20, for example), with the cut-off passed as a parameter rather than hard-coded.
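
One possible way to apply such a cut-off, sketched under the assumption that a single frequency table is a plain object of counts: sort the entries by count and keep the first `limit` of them. The names `FrequencyTable` and `topTokens` are hypothetical.

```typescript
type FrequencyTable = { [tokenValue: string]: number }

// Returns a new table holding only the `limit` most frequent tokens.
// Ties keep their original relative order, since Array.prototype.sort is stable.
function topTokens(table: FrequencyTable, limit: number): FrequencyTable {
  const top: FrequencyTable = Object.create(null)
  const ranked = Object.entries(table)
    .sort(([, countA], [, countB]) => countB - countA)
    .slice(0, Math.max(0, limit))
  for (const [value, count] of ranked) {
    top[value] = count
  }
  return top
}

const counts: FrequencyTable = { a: 5, b: 2, c: 9, d: 1 }
console.log(topTokens(counts, 2))
```

One caveat with returning an object: JavaScript lists integer-like keys (such as "2021") before other keys regardless of insertion order, so if rank order matters to the caller, an array of [token, count] pairs may be a safer return type.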