stanfordmlgroup / chexpert-labeler

CheXpert NLP tool to extract observations from radiology reports.
MIT License

Why is "lesion" not included as a keyword/synonym to lung_lesion finding? #26

Closed tinahuang222 closed 3 years ago

jirvin16 commented 3 years ago

We decided that "lesion" was too imprecise for this category. But it's worth testing this out on your own dataset of reports!

tinahuang222 commented 3 years ago

Do you mind posting the annotation guidelines for gold-label human annotators? That will help a lot in clarifying the assumptions made by annotators.

Thanks in advance.

tinahuang222 commented 3 years ago

> We decided that "lesion" was too imprecise for this category. But it's worth testing this out on your own dataset of reports!

When you say imprecise, does that mean there's imprecise mapping to the visual characteristics, or do you mean the definition of lesion/lung lesion is imprecise?

jirvin16 commented 3 years ago

> > We decided that "lesion" was too imprecise for this category. But it's worth testing this out on your own dataset of reports!
>
> When you say imprecise, does that mean there's imprecise mapping to the visual characteristics, or do you mean the definition of lesion/lung lesion is imprecise?

That the word "lesion" would match findings beyond mass and nodule, which is what the lung lesion category was intended to represent. You could probably design unmention phrases to handle this though, so worth exploring!
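
As a rough illustration of the unmention idea, here is a minimal Python sketch. It is not the labeler's actual implementation: the phrase lists, the helper `_spans`, and the function `find_lung_lesion_mentions` are all hypothetical names chosen for this example. It keeps a mention match only when no unmention phrase covers the same text.

```python
import re

# Hypothetical phrase lists for illustration only; they are not the
# phrases shipped with the labeler.
MENTION_PHRASES = ["mass", "nodule", "lesion"]
UNMENTION_PHRASES = ["skin lesion", "bone lesion", "lytic lesion"]


def _spans(phrases, text):
    """Return (start, end) character spans of whole-word phrase matches."""
    spans = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        spans.extend(m.span() for m in re.finditer(pattern, text, flags=re.IGNORECASE))
    return spans


def find_lung_lesion_mentions(text):
    """Return mention matches that are not covered by an unmention match."""
    blocked = _spans(UNMENTION_PHRASES, text)
    kept = []
    for start, end in sorted(_spans(MENTION_PHRASES, text)):
        covered = any(b_start <= start and end <= b_end for b_start, b_end in blocked)
        if not covered:
            kept.append(text[start:end])
    return kept
```

With these lists, "Lytic lesion in the left rib." yields no mention, while "2 cm lesion in the right upper lobe." still yields "lesion". The sketch only finds mentions; it does not handle negation or uncertainty.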

jirvin16 commented 3 years ago

> Do you mind posting the annotation guidelines for gold-label human annotators? That will help a lot in clarifying the assumptions made by annotators.
>
> Thanks in advance.

This was the main set of directions used:

Categories

There are 14 total categories to label:

  1. No Finding
  2. Enlarged Cardiomediastinum
  3. Cardiomegaly
  4. Lung Lesion (Mass/Nodule)
  5. Airspace Opacity
  6. Edema
  7. Consolidation
  8. Pneumonia
  9. Atelectasis
  10. Pneumothorax
  11. Pleural Effusion
  12. Pleural Other (Pleural Thickening, Fibrosis)
  13. Fracture
  14. Support Devices (Tubes/Lines/Hardware/Pacer/Defibrillator)

Please ignore granulomas and calcified nodules.

These are the sub/supercategories: Airspace Opacity, Enlarged Cardiomediastinum, Atelectasis, Consolidation, Edema, Cardiomegaly, Pneumonia

(formatting here was dropped but see the hierarchy in the paper)

Labels

For the “No Finding” category, label 1 only if the impression does not mention any abnormality whatsoever. This covers any abnormality at all, not only the 13 abnormalities in the list above (excluding support devices). However, if it is unclear whether the report is normal (for example, it states “no change from previous”), label u. Otherwise, please leave blank.

For categories 2-14, choose one of the 3 options (ONLY IF the abnormality is explicitly mentioned in the report):

  0: Confidently Absent. Examples: "No pneumothorax", "Without evidence of focal consolidation"
  1: Confidently Present. Examples: "likely representing", "suggestive of", "there is"
  u: Uncertain. Examples: "may represent", "could represent", "possibly", "cannot exclude"

If the abnormality is not explicitly mentioned, please leave blank. By explicitly mentioned, we mean phrases that are synonymous with the abnormality.
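
To make the three options concrete, here is a minimal Python sketch that assigns 0, 1, or u to a sentence already known to mention the abnormality. It uses only the cue phrases quoted in the examples above. The function name `label_mention`, the check order (uncertain, then absent, then present), and the default of 1 when no cue is found are assumptions made for this illustration. They are not part of the directions or of the labeler.

```python
import re

# Cue phrases copied from the examples in the directions above.
# Real reports need a far larger vocabulary and proper scope handling.
UNCERTAIN = re.compile(r"\b(may represent|could represent|possibly|cannot exclude)\b", re.I)
ABSENT = re.compile(r"\b(no|without evidence of)\b", re.I)


def label_mention(sentence):
    """Label a sentence that explicitly mentions an abnormality.

    Assumption: uncertainty cues are checked first, then absence cues,
    and a mention with neither kind of cue is treated as present ("1").
    """
    if UNCERTAIN.search(sentence):
        return "u"
    if ABSENT.search(sentence):
        return "0"
    return "1"
```

For example, `label_mention("No pneumothorax")` gives "0" and `label_mention("Opacity may represent pneumonia")` gives "u". The sketch works on whole sentences, so it cannot tell which finding a cue applies to when a sentence mentions several.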

If an abnormality is mentioned, it MUST have a label. All blank entries will be converted to 0 when computing metrics, but we need to differentiate between blank and 0 to measure how well our labeler can extract mentions from the reports. If you are unsure about the label of a report, please highlight it and discuss with the other radiologist.
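
The blank-versus-0 distinction can be sketched in Python as follows. This is one possible reading, not the evaluation code used for the paper: a blank is represented here as an empty string, and the function names are hypothetical. Blanks collapse to 0 for label metrics, while mention extraction is scored on whether an entry is blank at all.

```python
def to_metric_label(label):
    """Map an annotation to the value used for label metrics: blank becomes "0"."""
    return "0" if label == "" else label


def is_mention(label):
    """An entry counts as a mention whenever it is not blank (0, 1, or u)."""
    return label != ""


def mention_precision_recall(gold, predicted):
    """Score mention extraction, ignoring which non-blank label was chosen."""
    true_pos = sum(is_mention(g) and is_mention(p) for g, p in zip(gold, predicted))
    pred_count = sum(is_mention(p) for p in predicted)
    gold_count = sum(is_mention(g) for g in gold)
    precision = true_pos / pred_count if pred_count else 0.0
    recall = true_pos / gold_count if gold_count else 0.0
    return precision, recall
```

If blank and 0 were merged at annotation time, `is_mention` could not separate "explicitly absent" from "never mentioned", which is why the directions ask annotators to keep them apart.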

tinahuang222 commented 3 years ago

Thank you so much, I really appreciate your help!