It is sometimes difficult to set `min_count` because we don't know in advance how many collocations there are in the corpus. If the number is too low, we have to wait a long time for the computation to finish.
How about adding `min_freq` and `freq_type = c("count", "prop", "rank", "quantile")`, in a similar way to `dfm_trim()`? It would only need to set `min_count` based on the distribution of counts in `counts_seq`.
https://github.com/quanteda/quanteda.textstats/blob/68a848903b08837b969c4928d427a4ae86bdde6d/src/collocations.cpp#L287-L290
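
A minimal sketch of how the proposed `min_freq` / `freq_type` pair could be translated into an absolute `min_count`. This is not the code in `collocations.cpp`: `resolve_min_count` is a hypothetical helper, it assumes the counts held in `counts_seq` have been copied into a plain vector, and the meaning given to each `freq_type` (in particular "rank" as "keep the top N" and "quantile" as a lower nearest-rank quantile) is an assumption that may not match `dfm_trim()` exactly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: turn (min_freq, freq_type) into an absolute min_count,
// given the observed count of every candidate sequence.
// `counts` is taken by value because two branches sort it.
unsigned int resolve_min_count(std::vector<unsigned int> counts,
                               double min_freq,
                               const std::string &freq_type) {
    assert(min_freq >= 0.0);

    if (freq_type == "count") {
        // Already an absolute count: use it directly.
        return static_cast<unsigned int>(std::ceil(min_freq));
    }
    if (counts.empty()) {
        return 0;  // nothing to derive a threshold from
    }
    if (freq_type == "prop") {
        // Assumption: min_freq is a proportion of the total of all counts.
        assert(min_freq <= 1.0);
        const double total =
            std::accumulate(counts.begin(), counts.end(), 0.0);
        return static_cast<unsigned int>(std::ceil(min_freq * total));
    }
    if (freq_type == "rank") {
        // Assumption: keep roughly the min_freq most frequent sequences,
        // so the threshold is the count found at that rank (1-based).
        std::size_t rank = static_cast<std::size_t>(min_freq);
        rank = std::max<std::size_t>(1, std::min(rank, counts.size()));
        std::sort(counts.begin(), counts.end(),
                  std::greater<unsigned int>());
        return counts[rank - 1];
    }
    if (freq_type == "quantile") {
        // Assumption: lower nearest-rank quantile of the sorted counts.
        assert(min_freq <= 1.0);
        std::sort(counts.begin(), counts.end());
        const std::size_t idx = static_cast<std::size_t>(
            std::floor(min_freq * static_cast<double>(counts.size() - 1)));
        return counts[idx];
    }
    throw std::invalid_argument("unknown freq_type: " + freq_type);
}
```

The returned value would then be used wherever `min_count` is applied today, so the filtering logic itself would not need to change. With ties, "rank" can keep more than N sequences, because every sequence sharing the threshold count passes.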