x0rz / tweets_analyzer

Tweets metadata scraper & activity analyzer
GNU General Public License v3.0
2.94k stars 453 forks

top 10 most used words #16

Closed. B-Weyl closed this 6 years ago

B-Weyl commented 7 years ago

Get a user's top 10 most used words, using nltk's stopwords to filter out common words.
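
For illustration, the idea described above could look roughly like the sketch below. It is not the PR's actual code: `top_words` and the small `STOPWORDS` set are hypothetical stand-ins so the example runs without nltk. With nltk installed and its stopwords corpus downloaded, `set(stopwords.words('english'))` could be passed in instead.

```python
from collections import Counter

# Stand-in stopword list (an assumption for this sketch, not nltk's list).
STOPWORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in",
    "is", "it", "for", "on", "i", "you",
}


def top_words(tweets, n=10, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    counts = Counter()
    for text in tweets:
        # Naive whitespace tokenization; punctuation stays attached to words.
        for word in text.lower().split():
            if word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["The privacy of the user", "privacy is security"]
    for word, count in top_words(sample):
        print("- %-12s %d" % (word, count))
```

Splitting on whitespace keeps the sketch short, but it also explains why tokens such as `rt` or `;)` can end up in the ranking unless they are filtered separately.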

x0rz commented 7 years ago

Hi! Thanks for the PR 👍

I've tested your changes and here are a few thoughts:

[+] Top 10 most used words
- rt          286 (3%)
- #privacy     47 (0%)
- it's         38 (0%)
- like         34 (0%)
- ;)           30 (0%)
- don't        27 (0%)
- use          27 (0%)
- security     26 (0%)
- that's       26 (0%)
- using        24 (0%)
./tweets_analyzer.py:201: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if word not in stopwords.words('english'):
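
Under Python 2, this warning appears when a byte string containing non-ASCII bytes is compared with a unicode string, which is likely what happens when raw tweet tokens are tested against nltk's stopword list. One possible way around it, sketched below under that assumption, is to decode every token to text first and to build the stopword set once instead of calling `stopwords.words('english')` for each word. The helper names `to_text` and `filter_stopwords` are hypothetical and not part of tweets_analyzer.

```python
def to_text(token, encoding="utf-8"):
    """Decode byte strings to text so comparisons never mix bytes and text."""
    if isinstance(token, bytes):
        return token.decode(encoding, errors="replace")
    return token


def filter_stopwords(tokens, stopword_list):
    """Return lowercased text tokens that are not in stopword_list."""
    # Build the lookup set once, outside the per-word loop.
    stopword_set = {to_text(w).lower() for w in stopword_list}
    normalized = (to_text(tok).lower() for tok in tokens)
    return [tok for tok in normalized if tok not in stopword_set]
```

A set lookup also avoids scanning the whole stopword list for every token.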

I would suggest:

I won't have much time next week (on vacation). Please feel free to update your PR with additional code 👍

-- x0rz

JusticeRage commented 7 years ago

Based on the results ("use" and "using" both present), a stemming algorithm would also need to be applied for the results to be meaningful.
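
A minimal sketch of that idea follows: counts are merged under a common stem before ranking. `crude_stem` is a deliberately naive suffix stripper written for this example, not a real stemming algorithm, and `merge_by_stem` is likewise a hypothetical helper. With nltk available, `nltk.stem.PorterStemmer().stem` could be passed as the `stem` argument instead.

```python
from collections import Counter


def crude_stem(word):
    """Strip one common English suffix; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        # Keep at least two characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def merge_by_stem(counts, stem=crude_stem):
    """Sum word counts that share the same stem."""
    merged = Counter()
    for word, count in counts.items():
        merged[stem(word)] += count
    return merged


if __name__ == "__main__":
    # Counts taken from the output quoted earlier in the thread.
    counts = {"use": 27, "using": 24, "security": 26}
    print(merge_by_stem(counts).most_common())
```

The merged keys are stems rather than readable words, so a real implementation would probably want to remember one surface form per stem for display.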

x0rz commented 6 years ago

Conflicting PR at this time