Open tudorcebere opened 1 year ago
Hi, we used full-batch logistic regression with L-BFGS. For ImageNet, with 1M+ images in the training split, it was quite slow and required a huge amount of memory, especially considering the hyperparameter sweep over the L2 regularization term (C). Some discussion can be found in:
Hello! Thank you for this excellent model & paper!
I am interested in reproducing the ImageNet linear probing results from the paper, using SGD. Could the authors share some insight into how they achieved the results in Table 10 of the paper? My attempts with ViT-32 give significantly worse test performance; the probe seems to barely learn at all. I have followed the example
Thank you!
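For anyone trying to reproduce this, here is a minimal sketch of the full-batch L-BFGS probe with a sweep over C, as described in the reply. It is not the authors' code. It assumes scikit-learn's `LogisticRegression`, and it uses synthetic features from `make_classification` as a stand-in for frozen image features. The helper `sweep_c`, the grid of C values, and the dataset sizes are all hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for frozen image features (assumption: real features
# would be precomputed once with the image encoder and cached).
X, y = make_classification(
    n_samples=2000, n_features=64, n_informative=32,
    n_classes=5, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def sweep_c(c_values):
    """Fit one full-batch L-BFGS probe per C and return validation accuracy."""
    results = {}
    for c in c_values:
        # In scikit-learn, C is the inverse of the L2 regularization strength.
        clf = LogisticRegression(C=c, solver="lbfgs", max_iter=1000)
        clf.fit(X_train, y_train)
        results[float(c)] = clf.score(X_val, y_val)
    return results


# Hypothetical log-spaced grid; the actual sweep range is not given in the thread.
c_grid = np.logspace(-3, 3, 7)
val_acc = sweep_c(c_grid)
best_c = max(val_acc, key=val_acc.get)
print(f"best C = {best_c:g}, validation accuracy = {val_acc[best_c]:.3f}")
```

Each value of C triggers a full refit over the whole training set, which is why the sweep becomes expensive in time and memory at ImageNet scale.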