juliasilge / tidytext

Text mining using tidy tools :sparkles::page_facing_up::sparkles:
https://juliasilge.github.io/tidytext/

Some stop_words are not neutral #205

Closed by aliaamiri 2 years ago

aliaamiri commented 2 years ago

Some of the words in stop_words do not seem to belong in the list. For example, some of them also appear in sentiment lexicons.

There are other examples too. Is there a source for checking the onix lexicon, similar to this one for SMART?

packageVersion("tidytext")
[1] ‘0.3.2’

juliasilge commented 2 years ago

To start off with, it is not unexpected that there is some overlap between a stopword list and a sentiment lexicon; these kinds of word lists are built in different ways and for different purposes. So yes, what you are seeing is correct.
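For anyone who wants to see the overlap described above, here is a minimal sketch. It assumes the Bing lexicon as the sentiment lexicon, only because it ships with tidytext; the thread does not say which lexicons were compared, and the name `overlap` is just illustrative.

```r
library(dplyr)
library(tidytext)

# Assumption: compare against the Bing lexicon, which is bundled with tidytext.
# stop_words has columns `word` and `lexicon` (SMART, snowball, onix);
# get_sentiments("bing") has columns `word` and `sentiment`.
overlap <- stop_words %>%
  inner_join(get_sentiments("bing"), by = "word")

# How many sentiment-bearing words each stop word source contains
overlap %>%
  count(lexicon, sentiment)
```

The same join should work with other lexicons returned by `get_sentiments()`, although some of them need a separate download first.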

Specifically for the Onix Text Retrieval Toolkit stopword list, I am sad to see that these folks (who were Lextek) seem to not be around anymore and they have taken their documentation with them. 😕 I'll see what I can find with more research.

aliaamiri commented 2 years ago

Thank you so much for the clarification 🙌.

I found Onix in this comprehensive list of stopword lexicons. As you mentioned previously, these lists are very different in nature. Maybe I should stick with a custom-made stopword list.
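One possible way to build such a custom list, sketched under assumptions: start from the built-in stop_words and drop every word that also carries a Bing sentiment label. The names `custom_stop_words`, `reviews`, and `tidy_reviews` are hypothetical, and the example text is made up for illustration.

```r
library(dplyr)
library(tidytext)

# Assumption: a stop word counts as "not neutral" if it appears in the Bing lexicon.
# Remove those words, then collapse duplicates across the three sources.
custom_stop_words <- stop_words %>%
  anti_join(get_sentiments("bing"), by = "word") %>%
  distinct(word)

# Toy data, invented for this sketch
reviews <- tibble(
  id = 1:2,
  text = c("The plot was good and well paced", "Not like the book at all")
)

# Tokenize to one word per row, then drop only the custom stop words
tidy_reviews <- reviews %>%
  unnest_tokens(word, text) %>%
  anti_join(custom_stop_words, by = "word")
```

Whether this is the right rule depends on the analysis. Filtering on a single source first, for example `filter(lexicon == "snowball")`, is another option.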

juliasilge commented 2 years ago

Let me know if you have further questions!

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.