Closed by samuelyu2002, 2 years ago
Thanks a lot for the question. In the paper we write 10^{-5}, which is 1e-5, not 10e-5; hopefully that's the issue! We use the default hyperparameters mentioned (batch size 512) and this augmentation for training.
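For anyone replicating this, the weight-space ensembling step in WiSE-FT is a per-parameter interpolation between the zero-shot and fine-tuned checkpoints. A minimal sketch (the function name and dict-of-floats representation are illustrative, not the repo's actual API; the hyperparameters echo the reply above and should be verified against the paper):

```python
def wise_ft(theta_zeroshot, theta_finetuned, alpha=0.5):
    """Interpolate per parameter: theta = (1 - alpha) * theta_zs + alpha * theta_ft.

    Shown here on dicts of floats for clarity; with PyTorch state dicts the
    same arithmetic applies elementwise to each tensor.
    """
    assert theta_zeroshot.keys() == theta_finetuned.keys()
    return {k: (1 - alpha) * theta_zeroshot[k] + alpha * theta_finetuned[k]
            for k in theta_zeroshot}

# Fine-tuning settings discussed in this thread (assumptions to confirm):
HPARAMS = {"lr": 1e-5, "batch_size": 512, "epochs": 10}
```

With alpha=0 you recover the zero-shot model and with alpha=1 the fully fine-tuned one; intermediate values trade off in-distribution accuracy against robustness.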
Thanks!
@mitchellnw Thanks for your reply! I have an additional question: is the learning rate of WiSE-FT (linear classifier) the same as that of WiSE-FT (end-to-end) in Table 7? And the same question for Figure 16 vs. Figure 17.
In Table 7 of the paper, there are results showing WiSE-FT with a linear classifier and the ViT-B/16 backbone reaching 73% accuracy on 16-shot ImageNet. It was mentioned that the learning rate was 10e-5 and the model was trained for 10 epochs, but even with this information I still cannot replicate the result in the paper. Could I be provided with an exact command, or with the remaining hyperparameters (e.g. batch size, number of warmup steps, etc.), so that this result can be replicated?