stanfordmlgroup / chexpert-labeler

CheXpert NLP tool to extract observations from radiology reports.
MIT License

negative construction in clinical report #15

Closed farrell236 closed 4 years ago

farrell236 commented 4 years ago

Thanks for open sourcing the CXR labelling tool! I tried the labelling tool on other CXR clinical reports and got a very bizarre result:

From here: https://openi.nlm.nih.gov/detailedresult?img=CXR2016_IM-0665-1001

The text to be parsed is: "The lungs are clear without evidence of focal airspace disease. There is no evidence of pneumothorax or large pleural effusion. The cardiac and mediastinal contours are within normal limits."

The output of the NLP labeler is: The lungs are clear without evidence of focal airspace disease. There is no evidence of pneumothorax or large pleural effusion. The cardiac and mediastinal contours are within normal limits.,,1.0,,,1.0,,,,,1.0,1.0,,,

which marks the following as positive: Enlarged Cardiomediastinum, Lung Opacity, Pneumothorax, Pleural Effusion.

I'm not from an NLP background, so I'm not sure what is causing these classes to be positively flagged when they should be negative.

Thanks!

jirvin16 commented 4 years ago

I tried this example and got the following output:

1.0,0.0,,,0.0,,,,,0.0,0.0,,,,

My guess is that the negation rules are not matching, so there may be an issue with the versions of your packages. I'd make sure the versions match those specified in https://github.com/stanfordmlgroup/chexpert-labeler/blob/master/environment.yml .

If you need to dive deeper, the negation rules are matched here:

https://github.com/stanfordmlgroup/chexpert-labeler/blob/master/stages/classify.py#L53
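As a quick way to rule out version drift before digging into the rules, you can compare installed package versions against the pins in environment.yml. This is an illustrative sketch, not part of the repo; the package names listed are assumptions based on the labeler's typical dependencies:

```python
# Illustrative sketch: report installed versions of packages the labeler
# depends on, for manual comparison against environment.yml.
import importlib.metadata as md

def installed_version(pkg):
    """Return the installed version string, or None if the package is absent."""
    try:
        return md.version(pkg)
    except md.PackageNotFoundError:
        return None

# Package names below are assumed, not taken from environment.yml itself.
for pkg in ("negbio", "bllipparser", "jpype1"):
    print(pkg, installed_version(pkg) or "NOT INSTALLED")
```

A missing or mismatched package here (e.g. an uninstalled jpype) would silently break the negation stage rather than raise an obvious error.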

farrell236 commented 4 years ago

Many thanks for the fast response @jirvin16! That indeed fixed the issue.

farrell236 commented 4 years ago

Hi Jeremy, I have encountered a few more oddities. The following reports produce no annotations at all, not even "No Finding":

$ cat sample.csv 
"Both lungs remain clear and expanded. Heart and pulmonary XXXX are normal. No change in the large hiatus hernia."
"Hyperlucent hyperinflated lungs with flattened diaphragms. Granulomas. Small sized heart. Minimal apical capping slightly greater at the left. XXXX unremarkable."
"Normal heart. Clear lungs. Trachea midline. Scoliosis of lower thoracic spine. Degenerative changes of thoracic spine."

$ python label.py --reports_path sample.csv --output_path sample_labelled.csv

$ cat sample_labelled.csv 
Reports,No Finding,Enlarged Cardiomediastinum,Cardiomegaly,Lung Lesion,Lung Opacity,Edema,Consolidation,Pneumonia,Atelectasis,Pneumothorax,Pleural Effusion,Pleural Other,Fracture,Support Devices
Both lungs remain clear and expanded. Heart and pulmonary XXXX are normal. No change in the large hiatus hernia.,,,,,,,,,,,,,,
Hyperlucent hyperinflated lungs with flattened diaphragms. Granulomas. Small sized heart. Minimal apical capping slightly greater at the left. XXXX unremarkable.,,,,,,,,,,,,,,
Normal heart. Clear lungs. Trachea midline. Scoliosis of lower thoracic spine. Degenerative changes of thoracic spine.,,,,,,,,,,,,,,

Are these corner cases?
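A quick way to confirm that these rows really carry no labels at all is to check for blank cells, which the labeler uses for "no mention" as distinct from 0.0 (negative) and 1.0 (positive). A minimal stdlib-only sketch, using a two-column stand-in for the real fourteen-column output:

```python
import csv
import io

# Blank cells mean "no mention"; 0.0 means negated; 1.0 means affirmed.
# The columns here are a stand-in for the labeler's full header row.
csv_text = (
    "Reports,No Finding,Cardiomegaly\n"
    '"Normal heart. Clear lungs.",,\n'
)
rows = list(csv.DictReader(io.StringIO(csv_text)))
row = rows[0]

# Every label column is an empty string, i.e. the row is fully unlabeled.
unlabeled = all(row[col] == "" for col in ("No Finding", "Cardiomegaly"))
```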

TheRedMoon commented 4 years ago

@farrell236 Which package did you have to change for it to work? I've got exactly the same problem with it only giving 1's and no uncertainties or negatives. Is it a negbio thing?

jirvin16 commented 4 years ago

^Have you been able to resolve this @farrell236? If not, I can take a look.

TheRedMoon commented 4 years ago

It was jpype that wasn't installed. All good now, thanks for the help!

farrell236 commented 4 years ago

@jirvin16, regarding the issue in the reopened comment, I wasn't able to solve it. As it was only 3 samples out of ~1K, I decided to omit them and assume they were corner cases. The initial issue was resolved by fixing package versions from the environment.yml.

jirvin16 commented 4 years ago

Sorry, I actually think the output of the labeler is expected in those cases. No Finding is intended to capture the absence of all findings (except support devices), not just the ones in the 12 categories. See https://github.com/stanfordmlgroup/chexpert-labeler/blob/master/phrases/mention/no_finding.txt for a list of the findings it looks for.

farrell236 commented 4 years ago

Thanks for clarifying this @jirvin16, I guess this issue can be closed.

luantunez commented 3 years ago

Hello! I could not quite understand this document: https://github.com/stanfordmlgroup/chexpert-labeler/blob/master/phrases/mention/no_finding.txt What information does it contain? Are there more than 12 categories? Thank you in advance!

jirvin16 commented 3 years ago

> Hello! I could not quite understand this document: https://github.com/stanfordmlgroup/chexpert-labeler/blob/master/phrases/mention/no_finding.txt What information does it contain? Are there more than 12 categories? Thank you in advance!

These are the phrases that the labeler searches for when determining "No Finding." If any of the main 12 observations or any of the phrases in the no_finding.txt list are found (without being negated), "No Finding" is 0. Otherwise, "No Finding" is 1. So this category was intended to capture the absence of any finding, rather than just the absence of the 12 observations.
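The rule described above can be sketched as follows. This is a toy illustration, not the labeler's actual code; the observation and phrase sets are small stand-ins for the 12 categories and the full no_finding.txt list:

```python
# Stand-ins for the 12 observation categories and the no_finding.txt phrases.
OBSERVATIONS = {"cardiomegaly", "pneumothorax", "pleural effusion"}
NO_FINDING_PHRASES = {"hernia", "scoliosis", "granuloma"}

def no_finding_label(unnegated_mentions):
    """Return 1 only if no observation or listed phrase appears un-negated.

    `unnegated_mentions` is the set of finding terms matched in the report
    after negated mentions have been filtered out.
    """
    if unnegated_mentions & (OBSERVATIONS | NO_FINDING_PHRASES):
        return 0
    return 1
```

Under this rule, "Scoliosis of lower thoracic spine" yields an un-negated mention of "scoliosis", so "No Finding" is 0 even though none of the 12 observation columns are labeled, which matches the blank rows seen earlier in the thread.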