Closed by B-Weyl 6 years ago
Hi! Thanks for the PR 👍
I've tested your changes and here are a few thoughts:
[+] Top 10 most used words
- rt 286 (3%)
- #privacy 47 (0%)
- it's 38 (0%)
- like 34 (0%)
- ;) 30 (0%)
- don't 27 (0%)
- use 27 (0%)
- security 26 (0%)
- that's 26 (0%)
- using 24 (0%)
./tweets_analyzer.py:201: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if word not in stopwords.words('english'):
I would suggest:
I won't have much time next week (on vacation). Please feel free to update your PR with additional code 👍
-- x0rz
Based on the results ("use" and "using" both present), a stemming algorithm would also need to be applied for the results to be meaningful.
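For illustration, here is a minimal sketch of how stemming could merge "use" and "using" into one count. It is not code from this PR: `naive_stem` and `count_stems` are hypothetical names, and the suffix stripping is a crude stand-in for a real stemmer. With nltk installed, `nltk.stem.PorterStemmer().stem` could be passed as the `stem` argument instead.

```python
from collections import Counter


def naive_stem(word):
    """Crude stand-in for a real stemmer: strip one common suffix.

    Hypothetical helper for illustration only; it will mangle many words.
    """
    for suffix in ("ing", "ed", "es", "s", "e"):
        # Keep at least two characters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[:-len(suffix)]
    return word


def count_stems(words, stem=naive_stem):
    """Count words after lowercasing and reducing each one to its stem."""
    counts = Counter()
    for word in words:
        counts[stem(word.lower())] += 1
    return counts


counts = count_stems(["use", "using", "Used", "security"])
print(counts.most_common(10))
```

Here "use", "using" and "Used" all land in the same bucket. The displayed key is the stem rather than a real word, so a real implementation would probably want to map each stem back to its most frequent surface form before printing.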
Conflicting PR at this time
Get a user's top 10 most used words, using nltk's stopwords to filter out common words
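A rough sketch of the idea, assuming tweets are available as plain strings. This is not the PR's code: `top_words` and `FALLBACK_STOPWORDS` are hypothetical names, and the tiny built-in set only stands in for nltk's `stopwords.words('english')`, which needs the nltk `stopwords` corpus to be downloaded. Building the stopword set once, instead of calling `stopwords.words('english')` for every word, also keeps each membership check cheap.

```python
from collections import Counter

# Tiny illustrative stopword set (an assumption for this sketch).
# In practice this would be replaced by nltk's stopwords.words('english').
FALLBACK_STOPWORDS = frozenset(
    {"the", "a", "and", "to", "of", "is", "in", "it"}
)


def top_words(texts, stopwords=FALLBACK_STOPWORDS, n=10):
    """Return the n most common non-stopword tokens across all texts."""
    # Normalise the stopwords once so each lookup is a set membership test.
    stop = {w.lower() for w in stopwords}
    counts = Counter()
    for text in texts:
        # Naive whitespace tokenisation; hashtags and smileys stay intact.
        for word in text.lower().split():
            if word not in stop:
                counts[word] += 1
    return counts.most_common(n)


for word, count in top_words(["RT the #privacy of it", "#privacy is security"]):
    print("- %s %d" % (word, count))
```

With nltk available, the call would look like `top_words(tweets, stopwords=stopwords.words('english'))` after `from nltk.corpus import stopwords`.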