The TreeTagger misses a lot of lemmas and has the habit of then putting <unknown> in the lemma field. Replace those values with values from a WordNet or UMLS lookup (the latter for the medical genre). This is most naturally done just after text = self.tag_text(tokens) in PreProcessorWrapper.process() in components.preprocessor.wrapper.
It may be useful to have a look at the UMLS Lexical Variant Grammar.
The TreeTagger misses a lot of lemmas and has the habit of then putting <unknown> in the lemma field. Replace those values with values from a WordNet or UMLS lookup (the latter for the medical genre). This is most naturally done just after
text = self.tag_text(tokens)
in PreProcessorWrapper.process() in components.preprocessor.wrapper.It may be useful to have a look at the UMLS Lexical Variant Grammar.