support wordlist features

kanishk-adapt / semeval-task10

Repo for SemEval Task #10 EDOS 2023. Created and maintained for DCU - ADAPT submissions.

Closed: jowagner closed this issue 1 year ago

jowagner commented 1 year ago

[ ] Create and share wordlists; clarify whether the file 10_clusters.json should be used as 10 wordlists.
[ ] Allocate one feature per wordlist file and, for each feature, count how many words in the input text of a dataset item match an entry in the respective wordlist. Optionally apply --clip_counts (maybe we should have separate options --clip_ngram_counts, --clip_wordlist_counts, etc.).
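
A minimal sketch of the second item, not the code from the repository: one count feature per wordlist, with optional clipping. The names `tokenize`, `wordlist_features` and the `clip` parameter are hypothetical, the regex tokeniser is a stand-in for whatever tokenisation the project uses, and it assumes that clipping means capping each count at a maximum value.

```python
import re
from typing import List, Optional, Set

# Hypothetical tokeniser: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def wordlist_features(
    text: str,
    wordlists: List[Set[str]],
    clip: Optional[int] = None,
) -> List[int]:
    """Return one feature per wordlist: the number of tokens in `text`
    that occur in that wordlist, capped at `clip` if it is given."""
    tokens = tokenize(text)
    features = []
    for wordlist in wordlists:
        # Every matching token counts, including repeated words.
        count = sum(1 for token in tokens if token in wordlist)
        if clip is not None:
            # Assumed clipping semantics: cap the count at `clip`.
            count = min(count, clip)
        features.append(count)
    return features


# Toy wordlists for illustration only.
example_wordlists = [{"apple", "pear"}, {"green"}]
example_text = "Red apple, green apple, red pear"

print(wordlist_features(example_text, example_wordlists))          # [3, 1]
print(wordlist_features(example_text, example_wordlists, clip=2))  # [2, 1]
```

If 10_clusters.json is treated as 10 wordlists, each cluster would simply become one entry in `wordlists`, giving 10 features per item. A separate clipping option for wordlists would map to passing a different `clip` value here than for n-gram counts.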

jowagner commented 1 year ago

commit 4119340de71d7afe2cb9f975c82f446d749c04bb