humlab / penelope

Pipeline for generating data used in text analytics notebooks. Used by Welfare State Analytics, INIDUN and several other research projects.

count_threshold not used in _vectorize_sparv_csv_corpus. #18

Closed roger-mahler closed 3 years ago

roger-mahler commented 3 years ago

https://github.com/humlab/penelope/blob/6d2996b69e197d2f3897649a6c2da84deab040cc/penelope/workflows/_vectorize_sparv_csv_corpus.py#L55-L55

Suggested solution:

v_corpus = v_corpus.slice_by_n_count(count_threshold)
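For readers unfamiliar with this kind of filter, here is a minimal, library-free sketch of what slicing a vectorized corpus by a count threshold typically means: terms whose total count across all documents falls below the threshold are dropped, and the vocabulary is re-indexed. This is not penelope's implementation. The function name `slice_by_min_count`, the list-of-lists matrix, and the "at least `count_threshold`" comparison are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def slice_by_min_count(
    doc_term_counts: List[List[int]],
    vocabulary: Dict[str, int],
    count_threshold: int,
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Hypothetical helper: keep terms whose corpus-wide count >= count_threshold.

    doc_term_counts is a dense document-term matrix (one row per document),
    vocabulary maps each term to its column index.
    """
    n_terms = len(vocabulary)

    # Total count of each term over all documents (column sums).
    totals = [sum(row[j] for row in doc_term_counts) for j in range(n_terms)]

    # Columns that survive the threshold (assumed inclusive), in original order.
    kept = [j for j in range(n_terms) if totals[j] >= count_threshold]

    # Map old column indices to new, compacted indices.
    remap = {old: new for new, old in enumerate(kept)}

    sliced = [[row[j] for j in kept] for row in doc_term_counts]
    new_vocabulary = {
        term: remap[j] for term, j in vocabulary.items() if j in remap
    }
    return sliced, new_vocabulary


# Made-up example data: two documents, three terms.
counts = [[3, 0, 1], [2, 1, 1]]
vocab = {"welfare": 0, "rare": 1, "state": 2}

sliced_counts, sliced_vocab = slice_by_min_count(counts, vocab, count_threshold=2)
print(sliced_counts)  # "rare" occurs once in total, so its column is removed
print(sliced_vocab)
```

If the threshold is accepted as a parameter but the slicing step is never called, the corpus keeps every term, which matches the symptom described in the issue title.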

roger-mahler commented 3 years ago

Resolved by https://github.com/humlab/penelope/commit/7a92de713e5bd1864a1a114edf7150f3abce598e#diff-a048b96db5e8b81bfdbeba311124fa15df812098f8927ab154bb232412bc4d81