Closed roymiles closed 6 months ago
Thanks @roymiles for the updates!
Did you use 3 GPUs for distributed training? If not, I will remove that section from the README later, when I make minor changes. https://github.com/yoshitomo-matsubara/torchdistill/pull/446/files#diff-1ecd33e0a6aeb10ddebfcdc6ed245a3e8ea60e38a09ed8974047a3101ec638aeR41-R53
Ah oops, I must have overlooked that. Yeah, I only used 1 GPU.
No problem, I will merge this PR and make some modifications. The next version of torchdistill will be released in a few days, and I will upload the checkpoint and log as part of the release note for backup.
Great job! Thanks for your contribution!
I have reproduced the results from the original paper: it reports an accuracy of 71.63%, while this config achieves 71.65%.
The log and checkpoint for this run can be found here: https://drive.google.com/drive/folders/18xl0CDZ6CioP4Sbjdpj1Pndp4biSLpnV?usp=sharing
Trained on a single GPU.