sunnytqin / no-distillation


ConvNet(F) is different in MTT #1


Kciiiman commented 1 week ago

Hello sir. Why is the result for training on the whole dataset 84.8 in MTT, but 75 in your paper? Could you share the rough configuration you used for training on the whole dataset? That would be really great.

sunnytqin commented 1 week ago

Hello! Thank you for pointing out the discrepancy. You are correct: I did not train the expert to full convergence in the initial version of the paper. After training the expert for more epochs, I was able to achieve an expert test accuracy of around 82%. I will update this in the paper accordingly.

For training the expert, I used a learning rate of 1e-3, which differs from MTT's setting (lr=1e-2).* Additionally, there were some minor bugs related to importing libraries in train_expert, which I have just fixed.

Along with the bug fix, I’ve also added sample commands in the sample_scripts folder to train the CIFAR-10 expert, as well as a command for the soft label baseline.

You can find my trained experts in this Google Drive: link

Note*: The reason for using the smaller learning rate is to get more granular checkpoints early in training; see Appendix A4.
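To illustrate why a smaller learning rate yields more granular early checkpoints, here is a minimal NumPy sketch (not the repo's actual train_expert code; the toy logistic-regression "expert" and all names are hypothetical). Each epoch's parameters are saved as a checkpoint, as in MTT-style expert trajectories, and the spacing between consecutive checkpoints shrinks with the learning rate:

```python
import numpy as np

def train_expert(lr, epochs=20, seed=0):
    """Toy stand-in for expert training: logistic regression via full-batch GD.

    Returns the list of per-epoch parameter checkpoints, so we can
    inspect how far apart consecutive checkpoints are.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 10))
    w_true = rng.normal(size=10)
    y = (X @ w_true > 0).astype(float)

    w = np.zeros(10)
    checkpoints = [w.copy()]
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        grad = X.T @ (p - y) / len(y)        # logistic-loss gradient
        w -= lr * grad
        checkpoints.append(w.copy())         # one checkpoint per epoch
    return checkpoints

def spacing(ckpts):
    """Mean parameter-space distance between consecutive checkpoints."""
    return np.mean([np.linalg.norm(b - a) for a, b in zip(ckpts, ckpts[1:])])

# Smaller lr -> finer-grained trajectory early in training
print(spacing(train_expert(lr=1e-3)) < spacing(train_expert(lr=1e-2)))  # True
```

With lr=1e-3 each update moves the parameters about a tenth as far, so the saved trajectory samples the early training dynamics much more densely, which is what trajectory matching benefits from.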