hltcoe / patapsco

Cross-language information retrieval pipeline

Review language-specific resources for tokenization and stemming #15

Open cash opened 3 years ago

cash commented 3 years ago

jieba and pymorphy should probably be their own classes rather than depending on Stanza and spaCy.
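
A rough sketch of what standalone classes could look like, in case it helps the discussion. Everything here is an assumption for illustration: the `Tokenizer` interface and the class names `JiebaTokenizer` and `PymorphyLemmatizer` are hypothetical and not Patapsco's actual classes, and the sketch assumes the `pymorphy2` package is what is meant by "pymorphy". The backends are imported lazily, and a callable can be passed in instead, so neither Stanza nor spaCy is needed.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional


class Tokenizer(ABC):
    """Hypothetical minimal interface (not Patapsco's real base class)."""

    @abstractmethod
    def tokenize(self, text: str) -> List[str]:
        ...


class JiebaTokenizer(Tokenizer):
    """Chinese word segmentation that calls jieba directly."""

    def __init__(self, cut: Optional[Callable[[str], List[str]]] = None):
        if cut is None:
            # Imported lazily so other languages do not require jieba.
            import jieba
            cut = jieba.lcut  # returns a list of segments
        self._cut = cut

    def tokenize(self, text: str) -> List[str]:
        # Drop whitespace-only segments.
        return [tok for tok in self._cut(text) if tok.strip()]


class PymorphyLemmatizer:
    """Russian lemmatization that calls pymorphy2 directly (assumed package)."""

    def __init__(self, normalize: Optional[Callable[[str], str]] = None):
        if normalize is None:
            # Imported lazily so other languages do not require pymorphy2.
            import pymorphy2
            analyzer = pymorphy2.MorphAnalyzer()

            def normalize(word: str) -> str:
                # Take the first (highest-ranked) parse of the word.
                return analyzer.parse(word)[0].normal_form

        self._normalize = normalize

    def stem(self, tokens: List[str]) -> List[str]:
        return [self._normalize(tok) for tok in tokens]
```

Accepting an optional callable is only there so the classes can be exercised without the libraries installed; by default each class uses its own backend.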

dlawrie commented 2 years ago

Krovetz stemmer on PyPI: https://pypi.org/project/KrovetzStemmer/