quanteda / quanteda.textstats

Textual statistics for quanteda
GNU General Public License v3.0

Vocabulary diversity threshold problem #53

Closed yaoysyao closed 2 years ago

yaoysyao commented 2 years ago

Hello. My project is written in Python, so I call R from Python to calculate vocabulary diversity. I have two questions. First, once I have a diversity score for a text, how do I judge whether its vocabulary is diverse, that is, how should I determine the threshold? Second, does quanteda.textstats have a function that directly decides whether a text is complex and returns TRUE or FALSE? Thanks.

kbenoit commented 2 years ago

That sounds like a StackOverflow question and beyond the scope of the Issues we normally raise here. But it would be easy to use an ifelse(...) call to return a logical value indicating whether a text is complex, based on what you set as your threshold. (The threshold will be subjective and something that only you can set.)
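
Since the asker's project is in Python, the thresholding step suggested above could be sketched there as well. This is only an illustration under stated assumptions: `type_token_ratio` and `is_diverse` are hypothetical helper names, the type-token ratio uses naive whitespace tokenization and will not match quanteda's tokenizer, and the 0.7 cutoff is an arbitrary placeholder, not a recommended value. In practice the scores would come from the R call.

```python
from typing import Dict


def type_token_ratio(text: str) -> float:
    """Naive type-token ratio: unique tokens / total tokens.

    Assumption: lowercase + whitespace split stands in for a real tokenizer.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def is_diverse(scores: Dict[str, float], threshold: float) -> Dict[str, bool]:
    """Element-wise comparison, the Python analogue of ifelse(score > t, TRUE, FALSE)."""
    return {doc: score > threshold for doc, score in scores.items()}


# Toy documents, invented purely for illustration.
docs = {
    "doc1": "the cat sat on the mat the cat sat",
    "doc2": "a quick brown fox jumps over lazy dogs",
}

# Stand-in for the diversity scores that would be returned from R.
scores = {name: type_token_ratio(text) for name, text in docs.items()}

# Subjective cutoff: an assumed value that the user has to choose.
THRESHOLD = 0.7

flags = is_diverse(scores, THRESHOLD)
print(flags)
```

The comparison itself is a one-liner; the difficult part is choosing the threshold, which depends on the measure used, the text length, and the corpus.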