quanteda / quanteda.textstats

Textual statistics for quanteda
GNU General Public License v3.0

Add min_freq and freq_type to textstat_collocations() #68

Open koheiw opened 9 months ago

koheiw commented 9 months ago

It is sometimes difficult to set min_count because we don't know how many collocations there are in the corpus. If the value is too low, we have to wait a long time for the computation to finish.

How about adding min_freq and freq_type = c("count", "prop", "rank", "quantile"), in a similar way to dfm_trim()? These would only be used to set min_count based on the distribution of counts in counts_seq.

https://github.com/quanteda/quanteda.textstats/blob/68a848903b08837b969c4928d427a4ae86bdde6d/src/collocations.cpp#L287-L290
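
A rough C++ sketch of the conversion, for illustration only. `resolve_min_count` is a hypothetical helper, not existing quanteda.textstats code. It takes a plain vector of counts instead of the actual counts_seq structure, and the meaning of each freq_type is an assumption loosely modeled on dfm_trim() (for example, "quantile" here uses a simple nearest-rank rule, which is not what R's quantile() does by default).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: translate (min_freq, freq_type) into an absolute
// count threshold that could then be used as min_count.
// `counts` stands in for the per-sequence counts; it is taken by value
// because the "rank" and "quantile" branches sort it.
unsigned int resolve_min_count(std::vector<unsigned int> counts,
                               double min_freq,
                               const std::string &freq_type) {
    if (freq_type == "count") {
        // Assumed: min_freq is already an absolute count.
        return static_cast<unsigned int>(std::ceil(std::max(0.0, min_freq)));
    }
    if (counts.empty()) {
        return 0;  // nothing to filter
    }
    const std::size_t n = counts.size();

    if (freq_type == "prop") {
        // Assumed: min_freq is a proportion of the total count.
        const unsigned long long total =
            std::accumulate(counts.begin(), counts.end(), 0ULL);
        const double threshold =
            std::max(0.0, min_freq) * static_cast<double>(total);
        return static_cast<unsigned int>(std::ceil(threshold));
    }
    if (freq_type == "rank") {
        // Assumed: keep sequences at least as frequent as the k-th most
        // frequent one (rank 1 = highest count); ties are kept.
        std::sort(counts.begin(), counts.end(), std::greater<unsigned int>());
        const std::size_t k =
            static_cast<std::size_t>(std::max(1.0, std::floor(min_freq)));
        return counts[std::min(k, n) - 1];
    }
    if (freq_type == "quantile") {
        // Assumed: nearest-rank quantile over the ascending counts.
        std::sort(counts.begin(), counts.end());
        const double q = std::min(1.0, std::max(0.0, min_freq));
        const std::size_t pos =
            static_cast<std::size_t>(std::ceil(q * static_cast<double>(n)));
        return counts[pos == 0 ? 0 : pos - 1];
    }
    throw std::invalid_argument("unknown freq_type: " + freq_type);
}
```

With counts {1, 1, 2, 3, 5, 8, 10}, this sketch gives a threshold of 5 for min_freq = 3 with "rank", and 3 for min_freq = 0.5 with "quantile". The sort is only paid once, before the more expensive scoring step, which is the point of the proposal.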