Open kaushikacharya opened 5 years ago
NegBio cannot handle this case right now: the mention would have to be "spinal canal hematoma", not just "spinal", to be recognized as C0856150. This is an error produced by MetaMap. An alternative is to create a dictionary that contains "spinal canal hematoma" and then use the CheXpert labeler to recognize it.
Please see https://negbio.readthedocs.io/en/latest/user_guide.html#named-entity-recognition
Hi @yfpeng, I checked the MetaMap output and found that the issue is in NegBio. The MetaMap documentation describes four forms of positional information.
https://github.com/ncbi-nlp/NegBio/blob/master/negbio/pipeline/dner_mm.py#L58
m = re.match(r'(\d+)/(\d+)', concept.pos_info)
Here we handle only the first form, i.e. the simplest one, where the concept's text is a contiguous block of characters.
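To make the limitation concrete, here is a small check of what that pattern captures for two of the pos_info values in the output below. It is only a sketch: `first_span` is a hypothetical name used for illustration, not a function in NegBio.

```python
import re

def first_span(pos_info):
    # Same pattern as the line quoted from dner_mm.py. re.match anchors at the
    # start of the string and stops after the first start/length pair.
    m = re.match(r'(\d+)/(\d+)', pos_info)
    return (int(m.group(1)), int(m.group(2))) if m else None

print(first_span('13/12'))      # (13, 12)
print(first_span('13/6,26/8'))  # (13, 6): the second pair, 26/8, is never read
```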
Here's the output of pyMetaMap for the example case in this issue:
ConceptMMI(index='1', mm='MMI', score='16.15', preferred_name='Spinal Canal', cui='C0037922', semtypes='[bsoj]', trigger='["Spinal Canal"-tx-1-"spinal canal"-noun-0]', location='TX', pos_info='13/12', tree_codes='A02.835.232.834.803')
ConceptMMI(index='1', mm='MMI', score='16.09', preferred_name='Pulp Canals', cui='C0086881', semtypes='[bsoj]', trigger='["Canal"-tx-1-"canal"-noun-0]', location='TX', pos_info='20/5', tree_codes='A14.549.167.900.265')
ConceptMMI(index='1', mm='MMI', score='13.09', preferred_name='Hematoma', cui='C0018944', semtypes='[patf]', trigger='["HEMATOMA"-tx-1-"hematoma"-noun-1]', location='TX', pos_info='26/8', tree_codes='C23.550.414.838')
ConceptMMI(index='1', mm='MMI', score='3.78', preferred_name='spinal hematoma', cui='C0856150', semtypes='[inpo]', trigger='["spinal hematoma"-tx-1-"spinal hematoma"-noun-1]', location='TX', pos_info='13/6,26/8', tree_codes='')
ConceptMMI(index='1', mm='MMI', score='3.63', preferred_name='Hematoma Adverse Event', cui='C1962958', semtypes='[fndg]', trigger='["Hematoma"-tx-1-"hematoma"-noun-1]', location='TX', pos_info='26/8', tree_codes='')
ConceptMMI(index='1', mm='MMI', score='3.48', preferred_name='Body Parts - Canal', cui='C1550227', semtypes='[bpoc]', trigger='["Canal"-tx-1-"canal"-noun-0]', location='TX', pos_info='20/5', tree_codes='')
ConceptMMI(index='1', mm='MMI', score='3.48', preferred_name='Geographic canal', cui='C0442636', semtypes='[geoa]', trigger='["Canal"-tx-1-"canal"-noun-0]', location='TX', pos_info='20/5', tree_codes='')
The spinal hematoma concept (positional information: 13/6,26/8) uses form (b) of positional information, i.e. disjoint text strings. In the current NegBio code, re.match() matches only the first start/length pair (13/6, "spinal"), so the second pair (26/8, "hematoma") is ignored.
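One possible way to read all the pairs is sketched below. This is an assumption about how a fix could look, not the NegBio implementation: `parse_pos_info` and `covering_span` are hypothetical helpers, and the sketch simply collects every start/length pair without trying to preserve any grouping that the other documented forms may carry. Whether the annotation should use the individual pieces or the single covering span is a design choice left open here.

```python
import re

# Hypothetical helpers for illustration; not part of NegBio or pymetamap.
PAIR = re.compile(r'(\d+)/(\d+)')

def parse_pos_info(pos_info):
    """Return every (start, length) pair in a pos_info string, in order."""
    return [(int(start), int(length)) for start, length in PAIR.findall(pos_info)]

def covering_span(pos_info):
    """Return (start, length) of the smallest span covering all pairs, or None."""
    pairs = parse_pos_info(pos_info)
    if not pairs:
        return None
    start = min(s for s, _ in pairs)
    end = max(s + n for s, n in pairs)
    return start, end - start

print(parse_pos_info('13/6,26/8'))  # [(13, 6), (26, 8)]
print(covering_span('13/6,26/8'))   # (13, 21)
```

For the example above, the covering span has length 21, which matches the length of "spinal canal hematoma".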
Environment: Using MetaMap 2016v2
Sentence:
Among other CUIs, these are the ones I am focusing on:
NegBio negates the term "hematoma" but fails to negate "spinal hematoma".
Here's the parse tree:
<infon key="parse tree">(S1 (S (S (NP (EX There)) (VP (VBZ is) (NP (DT no) (JJ spinal) (JJ canal) (NN hematoma)))) (. .)))</infon>
There is an amod dependency edge between "spinal" and "hematoma",
where T3 represents the word "spinal" and T5 represents the word "hematoma".
How should we handle this issue? "no spinal canal hematoma" is identified as a noun phrase that begins with "no". Shouldn't both "hematoma" and "spinal hematoma" be marked as negated?
An XML dump of the collection, taken just before executing negdetect.detect(document, neg_detector) (i.e. after the parse tree and dependency tree have been formed), is shared here: http://collabedit.com/b2e33