jgrams / twitter_tool

Counts your most used words, and allows you to view categories of words.
http://secure-shore-16199.herokuapp.com/

Getting word counts #1

Open alexkahn opened 8 years ago

alexkahn commented 8 years ago

I would investigate whether you can filter a user's feed by language (English, in this case) at the API level.

Next, I would filter the words for what's called 'Stop words.'

From there, you can create your term-count dictionary.

There are a few more things in there but I don't want to spoil the fun!
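As a rough illustration of the stop-word and term-count steps suggested above, here is a minimal Python sketch. The `STOP_WORDS` set and the `term_counts` helper are hypothetical names made up for this example, and the stop-word list is far shorter than a real one would be; this is not code from the project.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "i"}

def term_counts(tweets, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Lowercase so "The" and "the" are treated as the same word.
        for word in text.lower().split():
            if word not in stop_words:
                counts[word] += 1
    return counts

sample = ["I love the sea", "The sea is calm", "love to sail"]
counts = term_counts(sample)
print(counts.most_common(2))
```

Splitting on whitespace is the simplest possible tokenizer; punctuation attached to words is left in place here.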

alexkahn commented 8 years ago

Also, you may get some inspiration from this: https://github.com/alexkahn/twitter-term-count

jgrams commented 8 years ago

Stop word functionality will be added at a later date. I think filtering out non-English words is currently outside the scope of the project.

Thanks!

alexkahn commented 8 years ago

Just to clarify, stop words are defined for the English language: https://en.wikipedia.org/wiki/Stop_words

Language filtering is something you would request in the API call.

jgrams commented 8 years ago

I'll definitely incorporate stop words when I have time. I also have to use a regex to sanitize my inputs: drop all leading non-alphanumeric characters (except @ and #) and all trailing non-alphanumeric characters.
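One way the sanitizing regex could look, sketched in Python. The `sanitize` function and the two pattern names are hypothetical, and the sketch assumes `\w` (letters, digits, and underscore, in any script) is an acceptable stand-in for "alphanumeric", which keeps underscores and non-English letters intact.

```python
import re

# Leading run of characters that are not word characters, "@", or "#".
LEADING = re.compile(r"^[^\w@#]+")
# Trailing run of characters that are not word characters.
TRAILING = re.compile(r"[^\w]+$")

def sanitize(token):
    """Strip leading junk (keeping @ and #) and trailing junk from one token."""
    return TRAILING.sub("", LEADING.sub("", token))

print(sanitize('"hello,"'))
print(sanitize("(@user!)"))
print(sanitize("#tag..."))
```

A token made up entirely of punctuation comes back as an empty string, so the caller would need to skip empty results before counting.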

I don't want to limit the appeal of the app by failing to return a cloud if you're not tweeting in English. The functionality of the program should be very similar regardless of language. If I end up doing some kind of natural language analysis of the tweets, I would drop in a check for English.

jgrams commented 8 years ago

Stop words are incorporated, but there will be trial and error of adding and deleting individual words.

Still not sold on language filtering. I think the site works fine without it.

alexkahn commented 8 years ago

The language filtering was meant as a sort of 'When you use stopwords for Language A, make sure you're only processing Language A.' When it comes to data processing: garbage in, garbage out. If you have a dataset of stopwords in various languages, that would make a great addition to the site, especially if you do any language detection or rely on the contents of the tweet data structure.
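A minimal Python sketch of the "stopwords for Language A only on Language A" idea. It assumes each tweet is a dict with a `text` field and a `lang` field holding a language code such as `"en"`; the `STOP_WORDS_BY_LANG` table and `filter_tweet` helper are hypothetical, and the word lists are deliberately tiny.

```python
# Illustrative per-language stop-word lists (assumptions; real lists are much longer).
STOP_WORDS_BY_LANG = {
    "en": {"the", "and", "is"},
    "es": {"el", "la", "y", "es"},
}

def filter_tweet(tweet, stop_words_by_lang=STOP_WORDS_BY_LANG):
    """Return the tweet's words minus the stop words for the tweet's own language.

    Tweets in a language with no stop-word list are returned unfiltered,
    so they still contribute words instead of being dropped.
    """
    stop_words = stop_words_by_lang.get(tweet.get("lang"), set())
    return [w for w in tweet["text"].lower().split() if w not in stop_words]

print(filter_tweet({"text": "The sea is calm", "lang": "en"}))
print(filter_tweet({"text": "El mar es azul", "lang": "es"}))
```

Falling back to an empty stop-word set means a tweet in an unsupported language still produces a cloud, just without stop-word removal.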

jgrams commented 8 years ago

Makes sense! I want to add some other functionality like seeing which tweets are associated with which words first (AND TESTS so I can win the competition). I agree this is a good feature to add.