Closed forever10086 closed 2 years ago
Hi, @forever10086. Please refer to Fig C.7.c. The figure shows that the magnitude of the Hessian eigenvalues of ViT-S is even smaller than that of ResNet-50. I do not have results for ViT-B, but I expect the magnitude of its Hessian eigenvalues to be smaller still, because ViT-B has more heads and a higher embedding dimension. Please also refer to Fig C.5 and Fig C.6.
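For anyone who wants to check eigenvalue magnitudes on their own model, here is a minimal sketch of estimating the largest-magnitude Hessian eigenvalue with power iteration on Hessian-vector products. It is not the code behind Fig C.7. The names `hvp` and `top_hessian_eigenvalue`, the finite-difference approximation, and the toy quadratic loss are all assumptions made for illustration. For a real network, `grad_fn` would be replaced by the gradient of the training loss with respect to the flattened weights.

```python
import numpy as np

def hvp(grad_fn, w, v, eps=1e-4):
    """Hessian-vector product via central finite differences of the gradient:
    H v ~= (g(w + eps*v) - g(w - eps*v)) / (2*eps)."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)

def top_hessian_eigenvalue(grad_fn, w, iters=200, seed=0):
    """Power iteration: returns an estimate of the Hessian eigenvalue
    with the largest magnitude, computed as a Rayleigh quotient."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        eig = float(v @ hv)          # Rayleigh quotient v^T H v with ||v|| = 1
        norm = np.linalg.norm(hv)
        if norm == 0.0:              # v lies in the null space of H
            break
        v = hv / norm
    return eig

# Toy example (hypothetical): L(w) = 0.5 * w^T A w, so the Hessian is exactly A.
A = np.diag([3.0, 1.0, 0.5])
grad_fn = lambda w: A @ w
lam = top_hessian_eigenvalue(grad_fn, np.ones(3))
print(lam)
```

On the toy loss the estimate should approach the largest diagonal entry of `A`. The full spectra discussed in this thread need more than the top eigenvalue, so treat this only as a quick sanity check of magnitude.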
OK, I got it. But are the number of heads and the per-head dimension in Fig C.5 and Fig C.6 varied within the same model, e.g. ViT-S? I'm not sure that the effect of more heads and a larger dimension can offset the effect of a bigger model like ViT-Base. Besides, I know from the spectrum that ViT's eigenvalues are more concentrated than ResNet's, so I guess the bigger model would follow the same principle.
You're right. As you might expect, Fig C.5 and C.6 report results for models with various head numbers/embed dims. All other hyperparameters are the same. I would like to leave a detailed investigation of large models for future work.
OK. It is just a small thing. I think this paper is very good; I like it.
Thank you for the kind words :)
Hello, I have a question about why you use ViT-S and ViT-Tiny while the counterpart is ResNet-50; these models are not the same size. I know you have explained this on OpenReview. I want to know whether ViT-Base's eigenvalue spectrum looks like ViT-Tiny's in your paper, just stretched to the right.