khaledsaab / spatial_specificity

ISIC Dataset Results Reproducibility #1

Open rahulv54 opened 1 year ago

rahulv54 commented 1 year ago

Hi, I trained the ISIC model a couple of times, but my validation AUROC is only around 2.8. Am I doing something wrong? Training itself mostly seems fine.

khaledsaab commented 1 year ago

Hi Rahul, happy to help debug the issue. A couple questions:

rahulv54 commented 1 year ago

Khaled, thanks for the prompt response.

It works perfectly well on the validation dataset (because the correlations are unchanged?).

The numbers that I mentioned are from the test dataset, which corresponds to robust AUROC in this case.

Here is the config I passed as an argument to train.py:

model:
  model_name: resnet
  arch: resnet50
  dropout: 0
  pretrained: True
  resume_ckpt: False

train:
  seed: 1  # seeds for isic need to be in [1,5]
  model_type: "resunet"  # ["resnet50, resunet"]
  method: "erm"  # ["erm", "seg"]
  binary_weight: 0
  epochs: 100
  batch_size: 16
  lr: 5e-4
  wd: 0
  valid_split: val
  model_id: null

dataset:
  source: "isic"  # options: {"cxr_p", "isic"}
  sample_ratio: 1
  num_workers: 4
  id_column: "id"
  input_column: "input"
  augmentation: True

wandb:
  project: domino
  group: ''
  log_model: False

I also tried the experiment with resnet50, with similar results.

rahulv54 commented 1 year ago

As an aside, thanks for sharing the code. I learnt so much about various new libraries too!

khaledsaab commented 1 year ago

Hey Rahul, apologies for the delayed response. Glad you enjoyed the code!

I found that the following can make a good difference:

For example, maybe try this:

python -m train train.method=erm train.model_type=resnet50 dataset.source=isic train.epochs=5 train.wd=0.01

That should get you into the 30s on robust AUROC for the test set.
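For anyone following along, the dotted arguments in that command change only a few values relative to the config posted earlier in the thread (model type, epochs, and weight decay). Below is a minimal sketch of that mapping in plain Python, assuming each argument is a simple section.key=value pair. The helper apply_overrides is hypothetical and is not a function from this repository; the repo's actual config loader may parse values differently.

```python
import ast
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `section.key=value` overrides applied.

    Hypothetical helper for illustration only; not part of the repository.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        path, _, raw = item.partition("=")
        keys = path.split(".")
        node = result
        # Walk down to the parent of the final key, creating sections if needed.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        try:
            value = ast.literal_eval(raw)  # numbers, True/False, None
        except (ValueError, SyntaxError):
            value = raw                    # anything else stays a string
        node[keys[-1]] = value
    return result


# Subset of the config posted earlier in the thread.
base = {
    "train": {"method": "erm", "model_type": "resunet", "epochs": 100, "wd": 0},
    "dataset": {"source": "isic"},
}

# The overrides from the suggested command.
overrides = [
    "train.method=erm",
    "train.model_type=resnet50",
    "dataset.source=isic",
    "train.epochs=5",
    "train.wd=0.01",
]

merged = apply_overrides(base, overrides)
print(merged["train"])
```

Compared with the posted config, the run is much shorter (5 epochs instead of 100) and adds a small weight decay (0.01 instead of 0).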

Also, I am assuming you are using the notebook isic_evaluation.ipynb to do the final evaluation? Just want to make sure we're comparing similar code.

I think the main message here is that using only the image-level binary labels performs significantly worse than supervising with pixel-level labels (i.e., segmentation). Let me know if you are able to reproduce this main result, where doing segmentation improves robust AUROC to the high 70s.