ncbi-nlp / NegBio

High-performance tool for negation and uncertainty detection in radiology reports

Run Stanford CoreNLP lemmatizer for jpype backend #13

Closed: kaushikacharya closed this issue 5 years ago

kaushikacharya commented 5 years ago

JPypeBackend.py in PyStanfordDependencies provides an option to use the Stanford CoreNLP lemmatizer through the input parameter add_lemmas: https://github.com/dmcc/PyStanfordDependencies/blob/master/StanfordDependencies/JPypeBackend.py#L86

However, NegBio does not use this option when backend=jpype.

I compared NLTK WordNet and CoreNLP lemmatization for speed on a few sentences. CoreNLP was much faster (almost 10 times). To test this, I made the following changes:

  1. passed add_lemmas=True
  2. populated ann.infons['lemma'] from dependency graph (https://github.com/ncbi-nlp/NegBio/blob/master/negbio/pipeline/ptb2ud.py#L107)
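
The second change could look roughly like the sketch below. It is only an illustration, not the actual patch: Token and Annotation are stand-in classes, populate_lemmas is a hypothetical helper name, and the fallback to the surface form when no lemma is present (None or the CoNLL placeholder "_") is an assumption. In NegBio the tokens would come from the dependency graph produced with add_lemmas=True.

```python
from collections import namedtuple

# Stand-in for a dependency-graph token (assumption: the real token
# exposes at least .form and .lemma when add_lemmas=True is passed).
Token = namedtuple("Token", ["index", "form", "lemma", "pos"])


class Annotation:
    """Stand-in for a token annotation that carries an infons dict."""

    def __init__(self, text):
        self.text = text
        self.infons = {}


def populate_lemmas(tokens, annotations):
    """Copy each token's lemma into the matching annotation's infons.

    Hypothetical helper: tokens and annotations are assumed to be
    aligned one-to-one and in the same order.
    """
    for token, ann in zip(tokens, annotations):
        lemma = token.lemma
        # Assumed fallback: use the surface form if no lemma was produced.
        if lemma is None or lemma == "_":
            lemma = token.form
        ann.infons["lemma"] = lemma
    return annotations


tokens = [
    Token(1, "opacities", "opacity", "NNS"),
    Token(2, "were", None, "VBD"),
]
annotations = [Annotation(t.form) for t in tokens]
populate_lemmas(tokens, annotations)
```

With this approach the lemma is read from the parser output, so no separate NLTK WordNet lookup is needed per token.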

How about making this change to utilize CoreNLP lemmatization?

yfpeng commented 5 years ago

I will check if the CoreNLP lemmatizer is consistent with other parts of NegBio, especially the patterns.

yfpeng commented 5 years ago

I will add it in the next version.