ncbi-nlp / NegBio

:newspaper: High-performance tool for negation and uncertainty detection in radiology reports

Detecting negation for one CUI but failing to detect negation for other CUIs #25

Open kaushikacharya opened 5 years ago

kaushikacharya commented 5 years ago

Environment: Using MetaMap 2016v2

Sentence:

There is no spinal canal hematoma.

Among other CUIs, these are the ones I am focusing on:

<annotation id="2">
  <infon key="term">Hematoma</infon>
  <infon key="semtype">patf</infon>
  <infon key="CUI">C0018944</infon>
  <infon key="annotator">MetaMap</infon>
  <location length="8" offset="25"/>
  <text>hematoma</text>
</annotation>
<annotation id="3">
  <infon key="term">spinal hematoma</infon>
  <infon key="semtype">inpo</infon>
  <infon key="CUI">C0856150</infon>
  <infon key="annotator">MetaMap</infon>
  <location length="6" offset="12"/>
  <text>spinal</text>
</annotation>

NegBio negates the term "hematoma" but fails to negate "spinal hematoma".

Here's the parse tree: <infon key="parse tree">(S1 (S (S (NP (EX There)) (VP (VBZ is) (NP (DT no) (JJ spinal) (JJ canal) (NN hematoma)))) (. .)))</infon>

There is an amod dependency edge between "spinal" and "hematoma":

<relation id="R2">
  <infon key="dependency">amod</infon>
  <node refid="T3" role="dependant"/>
  <node refid="T5" role="governor"/>
</relation>

where T3 represents the word "spinal" and T5 represents the word "hematoma".

How should we handle this issue? "no spinal canal hematoma" is identified as a noun phrase that begins with "no". Shouldn't both "hematoma" and "spinal hematoma" be marked as negated?

An XML dump of the collection just before executing negdetect.detect(document, neg_detector), i.e. after the parse tree and dependency tree have been built, is shared here: http://collabedit.com/b2e33

yfpeng commented 5 years ago

NegBio cannot handle this case right now because the mention should be "spinal canal hematoma", not just "spinal", to be recognized as C0856150. This is an error produced by MetaMap. An alternative is to create a dictionary that contains "spinal canal hematoma" and then use the CheXpert labeler to recognize it.

Please see https://negbio.readthedocs.io/en/latest/user_guide.html#named-entity-recognition

kaushikacharya commented 5 years ago

Hi @yfpeng I checked the output of MetaMap and found that the issue is in NegBio. The MetaMap documentation describes four different forms of positional information.

https://github.com/ncbi-nlp/NegBio/blob/master/negbio/pipeline/dner_mm.py#L58

m = re.match(r'(\d+)/(\d+)', concept.pos_info)

Here we only handle the first form, i.e. the simplest one, where the concept's text is a contiguous block of characters.
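To make the limitation concrete, here is a minimal sketch of how that pattern behaves on a comma-separated value. The string '7/4,13/8' is only an illustrative value made up for this example, and the variable names are not taken from the NegBio code.

```python
import re

# Illustrative pos_info with two start/length pairs (made-up value).
pos_info = '7/4,13/8'

# Same pattern as in dner_mm.py: re.match anchors at the start of the string.
m = re.match(r'(\d+)/(\d+)', pos_info)

# Only the first pair is captured.
first_pair = (int(m.group(1)), int(m.group(2)))

# Everything after the first pair is left unparsed.
unparsed = pos_info[m.end():]

print(first_pair, repr(unparsed))
```

For a single contiguous pair such as '7/4' nothing is left over, which is why the simplest form works.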

Here's the output of pyMetaMap for the example case in this issue:

ConceptMMI(index='1', mm='MMI', score='16.15', preferred_name='Spinal Canal', cui='C0037922', semtypes='[bsoj]', trigger='["Spinal Canal"-tx-1-"spinal canal"-noun-0]', location='TX', pos_info='13/12', tree_codes='A02.835.232.834.803')

ConceptMMI(index='1', mm='MMI', score='16.09', preferred_name='Pulp Canals', cui='C0086881', semtypes='[bsoj]', trigger='["Canal"-tx-1-"canal"-noun-0]', location='TX', pos_info='20/5', tree_codes='A14.549.167.900.265')

ConceptMMI(index='1', mm='MMI', score='13.09', preferred_name='Hematoma', cui='C0018944', semtypes='[patf]', trigger='["HEMATOMA"-tx-1-"hematoma"-noun-1]', location='TX', pos_info='26/8', tree_codes='C23.550.414.838')

ConceptMMI(index='1', mm='MMI', score='3.78', preferred_name='spinal hematoma', cui='C0856150', semtypes='[inpo]', trigger='["spinal hematoma"-tx-1-"spinal hematoma"-noun-1]', location='TX', pos_info='13/6,26/8', tree_codes='')

ConceptMMI(index='1', mm='MMI', score='3.63', preferred_name='Hematoma Adverse Event', cui='C1962958', semtypes='[fndg]', trigger='["Hematoma"-tx-1-"hematoma"-noun-1]', location='TX', pos_info='26/8', tree_codes='')

ConceptMMI(index='1', mm='MMI', score='3.48', preferred_name='Body Parts - Canal', cui='C1550227', semtypes='[bpoc]', trigger='["Canal"-tx-1-"canal"-noun-0]', location='TX', pos_info='20/5', tree_codes='')

ConceptMMI(index='1', mm='MMI', score='3.48', preferred_name='Geographic canal', cui='C0442636', semtypes='[geoa]', trigger='["Canal"-tx-1-"canal"-noun-0]', location='TX', pos_info='20/5', tree_codes='')

The spinal hematoma concept (positional information: 13/6,26/8) uses form (b) of positional information, i.e. disjoint text strings. In the current NegBio code, re.match() matches only at the start of the string, so it captures just the first pair (13/6, "spinal") and ignores the second (26/8, "hematoma").
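One possible direction, sketched below, is to collect every start/length pair with re.findall() instead of stopping at the first. This is not the NegBio implementation: parse_pos_info is a hypothetical helper, it flattens all pairs without preserving any grouping that the other positional-information forms may carry, and the 1-based start is an assumption inferred from this thread (13/6 corresponds to offset="12" in the annotation above).

```python
import re

def parse_pos_info(pos_info):
    """Hypothetical helper: return every (start, length) pair in pos_info."""
    return [(int(start), int(length))
            for start, length in re.findall(r'(\d+)/(\d+)', pos_info)]

sentence = "There is no spinal canal hematoma."

# pos_info reported for C0856150 (spinal hematoma) in this issue.
spans = parse_pos_info('13/6,26/8')

# Assumption: MetaMap starts are 1-based, so subtract 1 for Python slicing.
texts = [sentence[start - 1:start - 1 + length] for start, length in spans]

print(spans)
print(texts)
```

With both spans available, the annotation for C0856150 could cover "spinal" and "hematoma" rather than "spinal" alone. How a multi-span annotation should then be represented for negation detection is a separate design question.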