jenojp / negspacy

spaCy pipeline object for negating concepts in text
MIT License

Adding custom patterns in negspacy #19

Closed madhurkgp closed 4 years ago

madhurkgp commented 4 years ago

My code currently looks like this:

import en_core_sci_lg
from negspacy.negation import Negex
nlp = en_core_sci_lg.load()

negex = Negex(nlp, language = "en_clinical_sensitive")
nlp.add_pipe(negex, last=True)

doc = nlp(""" patient has no signs of shortness of breath. """)

for word in doc.ents:
    print(word, word._.negex)

The output is:

patient False
shortness True

I want the output to be:

patient False
shortness of breath True

How can I treat phrases like "shortness of breath", "sore throat", and "respiratory distress" as single entities?

I was thinking of adding these custom phrases in negation.py at line 81. How can I do that? Is there any other approach that would resolve this issue?

madhurkgp commented 4 years ago

It would be beneficial if we could add custom patterns to the PhraseMatcher.

jenojp commented 4 years ago

Well, the issue isn't with the negation part of your pipeline here. In your example, "no" is the negation phrase, so adding more negation phrases wouldn't solve your issue.

The issue is that the scispacy language model's named entity recognition is not "chunking" that phrase together as a single entity. You could manually add an "entity ruler" pipeline component, which lets you use rules/dictionaries to add additional named entities. See the docs here: https://spacy.io/usage/rule-based-matching#entityruler
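A minimal sketch of the entity ruler idea, under some assumptions: it uses the spaCy v3 API (components added by name), whereas the code in the question uses the older v2 style; it uses a blank English pipeline in place of en_core_sci_lg so it runs without downloading a model; and the "SYMPTOM" label is a hypothetical name chosen for illustration.

```python
import spacy

# Blank English pipeline (tokenizer only); stands in for en_core_sci_lg here.
nlp = spacy.blank("en")

# Add spaCy's built-in rule-based entity component (v3-style API).
ruler = nlp.add_pipe("entity_ruler")

# Phrase patterns: each string is matched as an exact token sequence.
# "SYMPTOM" is a hypothetical label, not one defined by scispacy or negspacy.
ruler.add_patterns([
    {"label": "SYMPTOM", "pattern": "shortness of breath"},
    {"label": "SYMPTOM", "pattern": "sore throat"},
    {"label": "SYMPTOM", "pattern": "respiratory distress"},
])

doc = nlp("patient has no signs of shortness of breath.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

With a real scispacy model you would probably add the ruler ahead of the statistical NER, e.g. nlp.add_pipe("entity_ruler", before="ner"), so that the multi-word span is set before the model assigns its own entities, and then add negex after both. I haven't verified that ordering against en_core_sci_lg, so treat it as a starting point.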

If you did want to add custom negation patterns for another use case, see https://github.com/jenojp/negspacy#use-own-patterns-or-view-patterns-in-use

jenojp commented 4 years ago

Closing issue