I applied data augmentation (translation, rotation, and scaling) to the test data samples, hoping to take advantage of the CNN's inductive bias, but R50+ViT did not perform as expected. Under what circumstances does R50+ViT outperform a plain ViT?