TODO (General List)

thomjur / PyCollocation

Python module to do simple collocation analysis of a corpus.

GNU General Public License v3.0


Open thomjur opened 2 years ago

thomjur commented 2 years ago

Maybe it would be good to collect several smaller TODOs here:

[ ] NLTK's word_tokenize also returns punctuation as tokens, and I am not sure we want that. For the moment, I have added a simple list comprehension that keeps only tokens matching \w+, but we might need to think of a better solution here (or, if we stick with this, we can also use NLTK's RegexpTokenizer).
[ ] Add functions to work directly with Twitter data from JSONL files (low priority)
[ ] Implement stop words #17
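
To make the tokenizer point concrete, here is a minimal sketch of the two options using only the standard library. The function names `filter_word_tokens` and `regex_tokenize` are hypothetical and not part of PyCollocation, and the token list is a hand-written stand-in for typical `word_tokenize` output. If I remember the NLTK API correctly, `RegexpTokenizer(r"\w+").tokenize(text)` should behave like the second function.

```python
import re

# Matches runs of word characters (letters, digits, underscore).
WORD_RE = re.compile(r"\w+")


def filter_word_tokens(tokens):
    """Option 1: post-filter an existing token list.

    Keeps only tokens that consist entirely of word characters,
    so punctuation tokens and clitics such as "'s" are dropped.
    """
    return [tok for tok in tokens if WORD_RE.fullmatch(tok)]


def regex_tokenize(text):
    """Option 2: tokenize the raw text directly with \\w+."""
    return WORD_RE.findall(text)


# Hand-written stand-in for tokenizer output (an assumption, not real NLTK output).
tokens = ["Hello", ",", "world", "!", "It", "'s", "2024", "."]

filtered = filter_word_tokens(tokens)
direct = regex_tokenize("Hello, world! It's 2024.")

print(filtered)
print(direct)
```

Note that the two options are not equivalent: post-filtering drops a clitic token like "'s" entirely, while tokenizing with \w+ directly keeps the bare "s" as its own token. That may matter for collocation counts.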

trutzig89182 commented 2 years ago

I have started to add some rough priority labels. If you don’t agree with them, feel free to change them at any time.