piotr-teterwak / erm_plusplus

MIT License

DiWA implementation #2

Open alexrame opened 1 year ago

alexrame commented 1 year ago

Hello,

I recently had the opportunity to read your ERM++ paper, and I'd like to congratulate you on your excellent work. As the author of the DiWA paper, I noticed a few factors that could explain why, in your implementation, "DIWA is unable to outperform ERM++".

It would be helpful to include a simple baseline: DiWA with 5 runs on your train/val splits, using your AugMix initialization, hyperparameters sampled from a mild distribution, dropout enabled, and unfrozen batch normalization (BN), averaging only the final weights of each run and optionally applying greedy selection.
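
For concreteness, the averaging step of such a baseline could look roughly like the sketch below. It is only an illustration, not the DiWA or ERM++ code: the `Weights` type is a hypothetical stand-in for a model state dict, `val_score` is an assumed user-supplied validation metric, and the acceptance rule (keep a run if the averaged weights score at least as well on validation) is an assumption about how greedy selection might be implemented.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_weights(runs: Sequence[Weights]) -> Weights:
    """Uniformly average the final weights of several runs, parameter by parameter."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    return {
        name: [sum(run[name][i] for run in runs) / n for i in range(len(values))]
        for name, values in runs[0].items()
    }


def greedy_average(runs: Sequence[Weights],
                   val_score: Callable[[Weights], float]) -> Weights:
    """Greedy selection (assumed rule): rank runs by validation score, then keep a
    run only if adding it does not lower the score of the averaged weights."""
    ranked = sorted(runs, key=val_score, reverse=True)
    selected = [ranked[0]]
    best = val_score(ranked[0])
    for candidate in ranked[1:]:
        score = val_score(average_weights(selected + [candidate]))
        if score >= best:
            selected.append(candidate)
            best = score
    return average_weights(selected)


# Toy usage: the "validation score" is the negative squared distance to a target.
target = [0.5, 0.5]

def toy_score(w: Weights) -> float:
    return -sum((a - b) ** 2 for a, b in zip(w["w"], target))

toy_runs = [{"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}, {"w": [5.0, 5.0]}]
uniform = average_weights(toy_runs)           # averages all three runs
greedy = greedy_average(toy_runs, toy_score)  # drops the outlier run
```

In this toy case the uniform average is pulled toward the outlier run, while greedy selection keeps only the two runs whose average improves the validation score. With real models, the same logic would operate on the final checkpoints of the 5 runs.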

Best regards, Alexandre

piotr-teterwak commented 1 year ago

Hi Alexandre,

Thank you for the kind note and the suggestion! I will run the suggested setting and report back.

Piotr

piotr-teterwak commented 1 year ago

Here are the updated results.

[Image: updated results]

I'll update the arXiv preprint and then close the issue.

-Piotr