[Open] Kciiiman opened this issue 1 week ago
Hello! Thank you for pointing out the discrepancy. You are correct—I did not train the expert until full convergence in the initial version of the paper. After training the expert for more epochs, I was able to achieve an expert test accuracy of around 82%. I will update this in the paper accordingly.
For training the expert, I used a learning rate of 1e-3, which differs from MTT’s setting (lr=1e-2). Additionally, there were some minor bugs related to importing libraries in train_expert, which I have just fixed.*
Along with the bug fix, I’ve also added sample commands in the sample_scripts folder to train the CIFAR-10 expert, as well as a command for the soft label baseline.
You can find my trained experts in this Google Drive: link
Note*: The reason for using a smaller learning rate is to get more granular checkpoints early in training; see Appendix A4.
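Roughly, the expert training loop looks like the sketch below. This is not the actual train_expert script; the network, batch size, and epoch count are placeholder assumptions. The point it illustrates is that with lr=1e-3 (vs. MTT's 1e-2), the per-epoch parameter snapshots early in training are spaced more finely along the trajectory.

```python
# Minimal, hypothetical sketch of expert training on CIFAR-10 with per-epoch
# checkpointing. Model choice (resnet18), batch size, and epoch count are
# illustrative only and do not reflect the repo's actual configuration.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                         transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)

model = torchvision.models.resnet18(num_classes=10).to(device)  # stand-in for the expert net
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # smaller lr than MTT's 1e-2
criterion = nn.CrossEntropyLoss()

trajectory = []  # one parameter snapshot per epoch
for epoch in range(50):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        criterion(model(x), y).backward()
        opt.step()
    # Save the expert's parameters after every epoch, so early-training
    # checkpoints are available at a fine granularity.
    trajectory.append([p.detach().cpu().clone() for p in model.parameters()])

torch.save(trajectory, "expert_trajectory.pt")
```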
Hello. Why is the whole-dataset training result 84.8 in MTT but 75 in your paper? Could you share a rough configuration for training on the whole dataset? That would be really great.