Closed by seb-29 8 months ago
This is a spaCy feature, not a spacyr issue. Judging from the sources you linked above, though, even the smallest model reaches a lemmatization accuracy of 0.97. See https://spacy.io/models/de#de_core_news_sm-accuracy.
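If you want to check how far the lemmas can be trusted on your own texts, one option is to hand-check a small sample and compute the share of matching lemmas. The following is only a rough sketch in base R: the `parsed` data frame is a made-up stand-in for the output of `spacy_parse(..., lemma = TRUE)`, the `gold` vector holds invented reference lemmas, and `lemma_accuracy()` is a hypothetical helper, not part of spacyr.

```r
# Made-up stand-in for spacy_parse() output; in practice, use the data frame
# returned by spacy_parse(myquantedacorpus, lemma = TRUE).
parsed <- data.frame(
  doc_id = "text1",
  token  = c("Die", "Kinder", "spielten", "im", "Garten"),
  lemma  = c("der", "Kind", "spielen", "im", "Garten"),
  stringsAsFactors = FALSE
)

# Hand-annotated reference lemmas for the same tokens (invented for illustration).
gold <- c("der", "Kind", "spielen", "in", "Garten")

# Hypothetical helper: share of tokens whose predicted lemma equals the reference.
lemma_accuracy <- function(predicted, reference) {
  stopifnot(length(predicted) == length(reference))
  mean(predicted == reference)
}

acc <- lemma_accuracy(parsed$lemma, gold)

# Collect the tokens whose lemma disagrees with the reference for manual review.
differs <- parsed$lemma != gold
mismatches <- parsed[differs, c("token", "lemma")]
mismatches$gold <- gold[differs]
```

Looking through `mismatches` on a few hundred tokens from your own corpus should tell you more about whether the lemmas are good enough for your purpose than the warning does.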
I parsed a corpus in German using the model de_core_news_sm as follows:
spacy_parse(myquantedacorpus, pos=TRUE, tag=TRUE, lemma=TRUE)
spacyr (spaCy Version: 3.7.2) issued the following warning message:
Warning message:
In spacy_parse.character(myquantedacorpus, pos=TRUE, tag=TRUE, lemma=TRUE) : lemmatization may not work properly in model 'de_core_news_sm'
I got the same warning when using the supposedly better model de_core_news_lg. Hence my questions: How should this warning be interpreted? Can I trust the results? Which lemmatizer is good (in R)?
To my knowledge, the best-performing model should be de_dep_news_trf. However, I have had some GPU-related trouble with it in the past (see issue 215).