stanfordmlgroup / chexpert-labeler

CheXpert NLP tool to extract observations from radiology reports.
MIT License

CheXpert-v1.0-small/train.csv missing values #9

Closed KeremTurgutlu closed 5 years ago

KeremTurgutlu commented 5 years ago

I have recently been exploring the data shared at https://stanfordmlgroup.github.io/competitions/chexpert/. The article says each pathology label is positive, negative, or uncertain, but the shared CSV files also contain NaNs (missing values). How should we interpret these missing values, and how were they handled in the original baseline model? Thanks.

jirvin16 commented 5 years ago

The missing values mean that no mention of the observation was extracted by the labeler in the report. We treat those cases as negative in the model presented in the paper.
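Here is a minimal sketch of that convention in Python, using only the standard library. It assumes the CSV encodes positive as `1.0`, negative as `0.0`, and uncertain as `-1.0`, and leaves a cell blank when the observation is not mentioned. The `OBSERVATIONS` subset, the helper names `parse_label`/`load_labels`, and the sample row are hypothetical. Uncertain labels are passed through unchanged, because the reply above only addresses missing values.

```python
import csv
import io

# Assumed encoding in train.csv: 1.0 = positive, 0.0 = negative,
# -1.0 = uncertain, empty cell = observation not mentioned in the report.
# Hypothetical subset of observation columns, for illustration only.
OBSERVATIONS = ["Cardiomegaly", "Edema", "Pleural Effusion"]


def parse_label(cell: str) -> float:
    """Map one label cell to a float, treating 'no mention' (blank) as negative."""
    cell = cell.strip()
    if cell == "":
        return 0.0  # missing value -> negative
    return float(cell)  # 1.0, 0.0 and -1.0 (uncertain) are kept as-is


def load_labels(csv_file, observations=OBSERVATIONS):
    """Yield (path, {observation: label}) pairs with blank cells filled as 0.0."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        labels = {obs: parse_label(row[obs]) for obs in observations}
        yield row["Path"], labels


# Hypothetical sample row: Cardiomegaly not mentioned, Edema positive,
# Pleural Effusion uncertain.
sample = io.StringIO(
    "Path,Cardiomegaly,Edema,Pleural Effusion\n"
    "example/view1_frontal.jpg,,1.0,-1.0\n"
)
for path, labels in load_labels(sample):
    print(path, labels)
```

If you load the file with pandas instead, blank cells are read as NaN, and `df[OBSERVATIONS].fillna(0)` applies the same "missing means negative" rule.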

KeremTurgutlu commented 5 years ago

Thanks a lot for the clarification :)

jirvin16 commented 5 years ago

Glad to help! :)