suinleelab / cxr_covid

Code for paper "AI for radiographic COVID-19 detection selects shortcuts over signal"

Update githubcovid.py #2

Closed by Nlin2 3 years ago

Nlin2 commented 3 years ago

Error Replication

Running python train_models.py --dataset 1 gives the following error:

Traceback (most recent call last):
  File "train_covid.py", line 202, in <module>
    main()
  File "train_covid.py", line 193, in main
    train_githubcxr14(args.seed, 
  File "train_covid.py", line 52, in train_githubcxr14
    classifier.train(trainds,
  File "/uss/xrai/nick_folder/cxr_covid/models/cxrclassifier.py", line 178, in train
    valloss, valauroc = self._val_epoch(val_dataloader)
  File "/uss/xrai/nick_folder/cxr_covid/models/cxrclassifier.py", line 262, in _val_epoch
    auroc = sklearn.metrics.roc_auc_score(true[:,-1], probs[:,-1])
  File "/datasets/home/00/300/nil021/.conda/envs/cxr_covid/lib/python3.8/site-packages/sklearn/metrics/_ranking.py", line 387, in roc_auc_score
    return _average_binary_score(partial(_binary_roc_auc_score,
  File "/datasets/home/00/300/nil021/.conda/envs/cxr_covid/lib/python3.8/site-packages/sklearn/metrics/_base.py", line 77, in _average_binary_score
    return binary_metric(y_true, y_score, sample_weight=sample_weight)
  File "/datasets/home/00/300/nil021/.conda/envs/cxr_covid/lib/python3.8/site-packages/sklearn/metrics/_ranking.py", line 221, in _binary_roc_auc_score
    raise ValueError("Only one class present in y_true. ROC AUC score "
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
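The traceback shows the failure surfacing only inside the validation epoch, well after the labels were built. As a minimal sketch, a pre-check like the one below could make the cause visible earlier. The helper name check_both_classes is hypothetical and not part of the cxr_covid repository; it only assumes that the validation labels are binary (0/1).

```python
from typing import Iterable


def check_both_classes(y_true: Iterable[int]) -> None:
    """Raise a descriptive error if binary labels contain fewer than two classes.

    Hypothetical helper for illustration; not part of cxr_covid.
    """
    classes = {int(v) for v in y_true}
    if len(classes) < 2:
        raise ValueError(
            f"Validation labels contain only class(es) {sorted(classes)}; "
            "AUROC is undefined. Check how the COVID label column is derived."
        )


# Mixed labels pass silently.
check_both_classes([0, 1, 0, 1])

# All-negative labels, as in the report above, are rejected early.
try:
    check_both_classes([0, 0, 0, 0])
except ValueError as err:
    message = str(err)
    print(message)
```

Calling such a check on the label column right after the dataset is constructed would point at the labeling step rather than at the metric.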

Problem Identification

Looking at the metadata, the values in the finding column appear to have been updated to new values, so the Github-COVID feature engineering in datasets/githubcovid.py needs to be updated. The current code labels every datapoint as False for COVID, which leaves only one class in y_true and triggers the ValueError above. The cause is line 71:

covid_set = ['COVID-19','COVID-19, ARDS']
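The mismatch can be illustrated in isolation. This sketch assumes the label is derived by exact-match membership in covid_set, which is consistent with the report but is not confirmed by the source. The variable names other than covid_set are made up for the example.

```python
# Label set quoted from line 71 of datasets/githubcovid.py in the report.
covid_set = ['COVID-19', 'COVID-19, ARDS']

# Example values of the metadata "finding" column: the new-style string
# quoted in this report, and the older form for comparison.
new_style_finding = 'Pneumonia/Viral/COVID-19'
old_style_finding = 'COVID-19'

# Exact-match membership (an assumption about how the labels are derived).
new_label = new_style_finding in covid_set
old_label = old_style_finding in covid_set

print(new_label)  # False: the new string is not in the old set
print(old_label)  # True: the old string still matches
```

If every COVID patient in the current metadata carries the new-style string, every label comes out False, which matches the single-class error in the traceback.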

Solution

Patients with COVID now have the string 'Pneumonia/Viral/COVID-19' instead of 'COVID-19' or 'COVID-19, ARDS'. The pneumonia-patient and healthy sets must also be updated to correspond with the new values.
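One possible shape for the updated COVID check is sketched below. It accepts both the older strings and the new one so that either version of the metadata works. The function is_covid and the constant names are hypothetical, and 'No Finding' is used only as a placeholder non-COVID value. The new strings for the pneumonia and healthy sets are not given in this thread, so they are left out here.

```python
# Strings used by the older metadata (from line 71 of datasets/githubcovid.py).
LEGACY_COVID_FINDINGS = {'COVID-19', 'COVID-19, ARDS'}

# String used by the newer metadata, as reported in this thread.
NEW_COVID_FINDING = 'Pneumonia/Viral/COVID-19'


def is_covid(finding: str) -> bool:
    """Return True if a 'finding' value denotes COVID in either metadata version.

    Hypothetical helper for illustration; not the repository's actual code.
    """
    finding = finding.strip()
    return finding in LEGACY_COVID_FINDINGS or finding == NEW_COVID_FINDING


# Example usage with one new-style, one legacy, and one placeholder value.
findings = ['Pneumonia/Viral/COVID-19', 'COVID-19, ARDS', 'No Finding']
labels = [is_covid(f) for f in findings]
print(labels)  # [True, True, False]
```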

jjanizek commented 3 years ago

Thanks for catching this issue!

In order to replicate the experiments in our paper, I think the easiest solution is to use the exact version of the Github-COVID dataset that was available when we were running experiments. I've updated the README to reflect that after cloning the Github-COVID repo, you can use the command git checkout 9b9c2d5 to get the version of this dataset where patients with COVID are still labeled as either 'COVID-19' or 'COVID-19, ARDS'.