As a developer, I want to remove stop words and sentence tokens from the ground-truth semantic domain data to make the matches of words with semantic domains more meaningful.
(I.e., we “sacrifice the minority for the majority.”)
Motivation: improve alignment -> reduce false positives -> increase dictionary creation precision
Tasks
[ ] Find an example (very simple!)
“terre” is missing from the semantic domain “Planet” because it is spelled “la Terre” in the ground-truth data. Removing the stop word “la” (and normalizing the capitalization) might recover this missing link.
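The “la Terre” example could be sketched roughly as follows. This is only a minimal illustration, not the project's actual pipeline: the stop-word list, the `normalize_entry` helper, and the `ground_truth` layout (domain name mapped to a list of entries) are all assumptions made for this sketch. It splits on whitespace only, so French elisions such as “l'eau” would need extra handling.

```python
# Hypothetical stop-word list for illustration; a real one would come
# from a curated resource for the language in question.
FRENCH_STOP_WORDS = {"la", "le", "les", "un", "une", "des", "de", "du"}


def normalize_entry(entry, stop_words=FRENCH_STOP_WORDS):
    """Lowercase an entry and drop stop words.

    If every token is a stop word, the lowercased entry is returned
    unchanged so that no entry collapses to an empty string.
    """
    tokens = entry.lower().split()
    kept = [token for token in tokens if token not in stop_words]
    return " ".join(kept) if kept else entry.lower()


# Assumed shape of the ground-truth data: domain name -> list of entries.
ground_truth = {"Planet": ["la Terre", "le soleil"]}

# Cleaned lookup: domain name -> set of normalized entries.
cleaned = {
    domain: {normalize_entry(entry) for entry in entries}
    for domain, entries in ground_truth.items()
}

print("terre" in cleaned["Planet"])
```

With this normalization, a lookup for “terre” would hit the “Planet” domain. The trade-off named in the goal still applies: entries whose meaning depends on a stop word lose that distinction.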