mlfoundations / wise-ft

Robust fine-tuning of zero-shot models
https://arxiv.org/abs/2109.01903

Replicating few-shot results #5

Closed samuelyu2002 closed 2 years ago

samuelyu2002 commented 2 years ago

In Table 7 of the paper, there are results showing WiSE-FT with a linear classifier and the ViT-B/16 backbone reaching 73% accuracy on 16-shot ImageNet. It was mentioned that the learning rate was 10e-5 and that training ran for 10 epochs, but even with this information I cannot replicate the result shown in the paper. Could I be provided with an exact command, or with the remaining hyperparameters (e.g. batch size, number of warmup steps, etc.), so that this result can be replicated?

mitchellnw commented 2 years ago

Thanks a lot for the question. In the paper we write 10^{-5}, which is 1e-5, not 10e-5 — hopefully that is the issue! We use the default hyperparameters mentioned (batch size 512) and this aug for training.
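For anyone else replicating: once the model is fine-tuned with those hyperparameters, the WiSE-FT step itself is just a per-key linear interpolation between the zero-shot and fine-tuned weights. A minimal sketch (function name `wise_ft_interpolate` is illustrative, not the repo's API; it works on plain numbers or on PyTorch tensors alike):

```python
def wise_ft_interpolate(theta_zero_shot, theta_fine_tuned, alpha):
    """Weight-space ensemble of two state dicts:
    (1 - alpha) * zero-shot + alpha * fine-tuned, applied key by key.
    alpha = 0 recovers the zero-shot model, alpha = 1 the fine-tuned one.
    Values may be floats or torch.Tensor parameters (same keys, same shapes).
    """
    assert theta_zero_shot.keys() == theta_fine_tuned.keys()
    return {
        k: (1 - alpha) * theta_zero_shot[k] + alpha * theta_fine_tuned[k]
        for k in theta_zero_shot
    }

# Toy demonstration with scalar "weights" (not real CLIP parameters).
zero_shot = {"w": 0.0, "b": 1.0}
fine_tuned = {"w": 1.0, "b": 3.0}
mixed = wise_ft_interpolate(zero_shot, fine_tuned, alpha=0.5)
# mixed == {"w": 0.5, "b": 2.0}
```

In practice you would load both checkpoints' `state_dict`s, interpolate, and `load_state_dict` the result before evaluating; the paper sweeps alpha to trace the robustness/accuracy curve.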

samuelyu2002 commented 2 years ago

Thanks!

guozix commented 1 year ago

@mitchellnw Thanks for your reply! I have an additional question: is the learning rate of WiSE-FT (linear classifier) the same as that of WiSE-FT (end-to-end) in Table 7? And the same question for Figure 16 vs. Figure 17.