Open alexkahn opened 8 years ago
Also, you may get some inspiration from this: https://github.com/alexkahn/twitter-term-count
Stop word functionality will be added at a later date. I think filtering out non-English words is currently outside the scope of the project.
Thanks!
Just to clarify, stop words are defined for the English language: https://en.wikipedia.org/wiki/Stop_words
Language filtering is something you would request in the API call.
I'll definitely incorporate stop words when I have time. I also have to use a regex to sanitize my inputs: drop all leading non-alphanumeric characters (except @ and #) and all trailing non-alphanumeric characters.
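For what it's worth, here is a minimal sketch of that sanitizing rule in Python. The name `sanitize` is made up for illustration and is not from the project. It treats anything outside Unicode letters and digits as "non-alphanumeric" (including underscores), which is an assumption on my part so that non-English words survive.

```python
import re

# Leading run of non-alphanumeric characters, stopping at the first @ or #.
# [\W_] matches anything that is not a Unicode letter or digit.
_LEADING = re.compile(r"^(?:(?![@#])[\W_])+")
# Trailing run of non-alphanumeric characters (@ and # are dropped here too).
_TRAILING = re.compile(r"[\W_]+$")

def sanitize(token):
    """Hypothetical helper: strip punctuation around a single token."""
    token = _LEADING.sub("", token)
    return _TRAILING.sub("", token)

if __name__ == "__main__":
    for raw in ['"hello!!"', "(@user),", "#python.", "don't"]:
        print(repr(raw), "->", repr(sanitize(raw)))
```

Punctuation inside a token (like the apostrophe in "don't") is left alone, since the rule only mentions leading and trailing characters.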
I don't want to limit the appeal of the app by failing to return a cloud if you're not tweeting in English. The functionality of the program should be very similar regardless of language. If I end up doing some kind of natural language analysis of the tweets, I would drop in a check for English.
Stop words are incorporated, but there will be trial and error of adding and deleting individual words.
Still not sold on language filtering. I think the site works fine without it.
The language filtering was meant as a sort of 'When you use stop words for Language A, make sure you're only processing Language A.' When it comes to data processing, it's garbage in, garbage out. If you have a dataset of stop words in various languages, that would make a great addition to the site, especially if you do any language detection or rely on the language information in the tweet data structure.
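A rough sketch of that idea, assuming each tweet is a dict with `text` and `lang` keys (the v1.1 tweet objects carry a `lang` field, but check what your client actually returns). The tiny stop word sets and the name `filter_tokens` are placeholders for illustration, not real data or project code.

```python
# Placeholder stop word lists keyed by language code; a real list would be
# much longer and would come from a proper dataset.
STOP_WORDS = {
    "en": {"the", "a", "and", "is", "of"},
    "es": {"el", "la", "y", "de", "es"},
}

def filter_tokens(tweet):
    """Drop stop words using only the list that matches the tweet's language."""
    # Unknown or missing language: apply no stop words rather than the wrong ones.
    stop = STOP_WORDS.get(tweet.get("lang"), set())
    return [w for w in tweet["text"].lower().split() if w not in stop]

if __name__ == "__main__":
    print(filter_tokens({"lang": "en", "text": "The cat is fast"}))
    print(filter_tokens({"lang": "es", "text": "el gato es de la casa"}))
```

Tweets in a language without a list still produce tokens, so the cloud would not come back empty for non-English users.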
Makes sense! I want to add some other functionality like seeing which tweets are associated with which words first (AND TESTS so I can win the competition). I agree this is a good feature to add.
I would investigate whether you can filter a user's feed down to English at the API level.
Next, I would filter out what are called 'stop words.'
From there, you can create your term-count dictionary.
There are a few more things in there but I don't want to spoil the fun!
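The steps above can be sketched roughly like this. It is only an illustration under a few assumptions: tweets are dicts with `text` and `lang` keys, the language check is done client-side as a stand-in for whatever the API offers, and `STOP_WORDS` and `term_counts` are hypothetical names with a deliberately tiny word list.

```python
from collections import Counter

# Tiny illustrative English stop word list; not a complete one.
STOP_WORDS = {"the", "a", "an", "and", "is", "of", "to", "in"}

def term_counts(tweets, lang="en"):
    """Build a term-count dictionary from tweets in the given language."""
    counts = Counter()
    for tweet in tweets:
        # Step 1: keep only tweets tagged with the requested language.
        if tweet.get("lang") != lang:
            continue
        # Step 2: drop stop words. Step 3: count what is left.
        for word in tweet["text"].lower().split():
            if word not in STOP_WORDS:
                counts[word] += 1
    return dict(counts)

if __name__ == "__main__":
    sample = [
        {"lang": "en", "text": "the cloud is a word cloud"},
        {"lang": "es", "text": "la nube"},
    ]
    print(term_counts(sample))
```

The resulting dictionary maps each remaining word to its frequency, which is what a word cloud needs for sizing.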