Open juh2 opened 9 years ago
The "non-words" raised by @juh2 should have been resolved in #49:
>>> from nltk.corpus import stopwords
>>> deu_stops = stopwords.words('german')
>>> 'unse' in deu_stops
False
>>> 'unsem' in deu_stops
False
>>> 'unsen' in deu_stops
False
>>> 'unses' in deu_stops
False
>>> 'unsere' in deu_stops  # a valid stopword
True
But more German stopwords are still missing from the list, to name a few:
>>> 'unserige' in deu_stops
False
>>> 'unserins' in deu_stops
False
>>> 'unseriner' in deu_stops
False
"unserins" and "unseriner" are not German words. Do you mean "unsereins" and "unsereiner"?
Please propose a definitive list of German stopwords and I will update our list.
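To make a proposal easy to review, it may help to post it as a diff against the current list. The following is only a rough sketch: `diff_stopwords` is a hypothetical helper, not part of NLTK, and the two word lists are toy data taken from the examples in this thread.

```python
def diff_stopwords(current, proposed):
    """Compare two stopword lists (hypothetical helper, not an NLTK API).

    Returns (to_add, to_remove): words only in `proposed`, and words only
    in `current`, each as a sorted list of lowercased, stripped strings.
    """
    current_set = {w.strip().lower() for w in current}
    proposed_set = {w.strip().lower() for w in proposed}
    to_add = sorted(proposed_set - current_set)
    to_remove = sorted(current_set - proposed_set)
    return to_add, to_remove


# Toy data for illustration only, not the real corpus contents.
current = ["unsere", "unse", "unsem"]
proposed = ["unsere", "unsereins", "unsereiner"]

to_add, to_remove = diff_stopwords(current, proposed)
print("add:", to_add)
print("remove:", to_remove)
```

With NLTK installed and the stopwords corpus downloaded, `current` could instead be `stopwords.words('german')`, as in the session above.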
nltk_data/packages/corpora/stopwords.zip contains four wrong German stopwords: