Closed — jacobswan1 closed this issue 4 years ago
@jacobswan1 no problem! I am actually back to working on attention networks, so I won't be expanding this repository for a while. But I will refer you to Janne Spijkervet's excellent repository, where she has code set up for self-supervised training on CIFAR-10: https://github.com/Spijkervet/BYOL
Thanks so much for your quick response!
Actually, I've spent a long time trying to replicate the downstream results on CIFAR and the other classification tasks, and I'm stuck at this step. I just want to double-check with you: will using LBFGS yield a large performance difference compared with Adam or SGD for training the logistic regression? I ask because Spijkervet's implementation uses Adam as the optimizer. If there are any other details I should pay attention to during training, please let me know.
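For reference, the linear-evaluation probe discussed here can be sketched as an L2-regularized logistic regression fit with L-BFGS on frozen encoder features. This is only a minimal illustration: the random features, the regularization strength `lam`, and all variable names are stand-ins, not the paper's or the repository's actual code.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical frozen features: in practice these would come from the
# pretrained BYOL encoder; random stand-ins here just to show the setup.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(float)  # toy, linearly separable binary labels

lam = 1e-2  # L2 regularization strength (illustrative value)

def nll(w):
    """L2-regularized negative log-likelihood of a logistic regressor."""
    z = 2.0 * y - 1.0  # labels mapped to {-1, +1}
    # log(1 + exp(-z * logits)) computed stably via logaddexp
    return np.logaddexp(0.0, -z * (X @ w)).sum() + 0.5 * lam * w @ w

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    return X.T @ (p - y) + lam * w

# Fit the probe with L-BFGS instead of a first-order optimizer like Adam.
res = minimize(nll, np.zeros(X.shape[1]), jac=grad, method="L-BFGS-B")
preds = (X @ res.x > 0).astype(float)
print("train accuracy:", (preds == y).mean())
```

Since L-BFGS solves this small convex problem essentially to the optimum, the main practical differences from Adam/SGD tend to be convergence speed and the need to tune the regularization rather than a learning-rate schedule.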
Thanks!
@jacobswan1 I haven't had the spare computing power to reproduce any of the results yet. If you do figure it out, please consider making a pull request with a training script! My impression from reading the paper is that the optimizer choice isn't as crucial as the augmentations and the hyperparameters around the exponential moving average of the target encoder, but I could be wrong.
Appreciate it!
Thanks for open-sourcing this!
I notice that BYOL has a large gap over SimCLR on downstream transfer datasets: e.g., SimCLR reaches 71.6% on CIFAR-100, while BYOL reaches 78.4%.
I understand that this might depend on the downstream training protocol. Could you provide us with sample code for it, especially for the LBFGS-optimized logistic regressor?