janetzki / GUIDE

Create semantic domain dictionaries for low-resource languages
MIT License

Remove stop words and sentence tokens from GT SD data #7

Closed: janetzki closed this issue 1 year ago

janetzki commented 1 year ago

Goal

As a developer, I want to remove stop words and sentence tokens from the ground-truth (GT) semantic domain (SD) data so that matches between words and semantic domains become more meaningful. (I.e., we “sacrifice the minority for the majority”: a few legitimate matches are lost, but most remaining matches improve.) Motivation: improve alignment -> reduce false positives -> increase dictionary creation precision.
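The issue does not describe an implementation, so the following is only a minimal sketch of the intended filtering step, under several assumptions. It assumes the GT SD data can be represented as a mapping from a semantic domain label to a list of words. It also assumes "sentence tokens" means sentence-boundary punctuation and markers such as `.`, `?`, `<s>`. The function name `filter_domain_words` and both word lists are hypothetical, not part of GUIDE.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stop-word list; a real one would be language-specific.
STOP_WORDS: Set[str] = {"a", "an", "the", "of", "to", "and", "in", "is"}

# Assumed meaning of "sentence tokens": sentence-boundary punctuation and markers.
SENTENCE_TOKENS: Set[str] = {".", "!", "?", ";", ":", "<s>", "</s>"}


def filter_domain_words(
    sd_words: Dict[str, List[str]],
    stop_words: Iterable[str] = STOP_WORDS,
    sentence_tokens: Iterable[str] = SENTENCE_TOKENS,
) -> Dict[str, List[str]]:
    """Drop stop words and sentence tokens from each semantic domain's word list.

    Matching is case-insensitive and ignores surrounding whitespace; kept
    words retain their original form. Domains left with no words are dropped.
    """
    removable = {w.lower() for w in stop_words} | {t.lower() for t in sentence_tokens}
    cleaned: Dict[str, List[str]] = {}
    for domain, words in sd_words.items():
        kept = [w for w in words if w.strip().lower() not in removable]
        if kept:  # skip domains that contained only removable tokens
            cleaned[domain] = kept
    return cleaned


if __name__ == "__main__":
    # Hypothetical GT SD data for illustration only.
    gt = {
        "1.1.1 Sun": ["sun", "the", "sunshine", "."],
        "hypothetical-function-words": ["of", "The"],
    }
    print(filter_domain_words(gt))  # {'1.1.1 Sun': ['sun', 'sunshine']}
```

Dropping domains that become empty is itself a design choice in the sketch: it illustrates the "sacrifice the minority" trade-off, since any domain whose GT words are all function words can no longer be matched at all.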

Tasks

janetzki commented 1 year ago

Closed because completing this on time is not realistic.