[ ] Create and share wordlists; clarify whether the file `10_clusters.json` should be used as 10 wordlists.
[ ] Allocate one feature per wordlist file and, for each feature, count how many words in the input text of a dataset item match an entry in the respective wordlist. Optionally apply `--clip_counts` (maybe we should have separate options `--clip_ngram_counts`, `--clip_wordlist_counts`, etc.).
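
A minimal sketch of the wordlist feature described in the second item, to make the counting concrete. Everything here is an assumption for illustration: the function name `wordlist_features`, passing wordlists as a dict of lowercase word sets, lowercasing tokens before matching, and interpreting clipping as capping each count at a maximum value. The actual format of `10_clusters.json` and the actual semantics of `--clip_counts` are not specified here.

```python
from typing import Dict, List, Optional, Set


def wordlist_features(
    tokens: List[str],
    wordlists: Dict[str, Set[str]],
    clip: Optional[int] = None,
) -> List[int]:
    """Return one count per wordlist, ordered by wordlist name.

    Hypothetical helper: `clip` stands in for a possible
    --clip_wordlist_counts option and caps each count at that value.
    """
    features = []
    for name in sorted(wordlists):
        words = wordlists[name]
        # Count every token occurrence that appears in this wordlist
        # (assumes the wordlist entries are already lowercase).
        count = sum(1 for token in tokens if token.lower() in words)
        if clip is not None:
            count = min(count, clip)
        features.append(count)
    return features


# Toy data, not taken from any real wordlist file.
tokens = "The cat sat on the mat with another cat".split()
wordlists = {
    "animals": {"cat", "dog"},
    "furniture": {"mat", "chair"},
}
print(wordlist_features(tokens, wordlists))          # [2, 1]
print(wordlist_features(tokens, wordlists, clip=1))  # [1, 1]
```

Sorting by wordlist name keeps the feature order stable across runs, which matters if the features are written to a fixed-width vector.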