Open szlbiubiubiu opened 3 years ago
Hi, "patch size 16" for ViT, right?
I think I have tried ViT (patch size 16, dim 512, heads 8, depth 16) before. I made one modification to the output: I changed it from x[0] (the class token) to mean(x) (mean pooling over all tokens), after which the accuracy on CFP-FP was about 94%. Note that I also compared ViT (patch size 8, dim 512, heads 8, depth 16) with the two output types, x[0] and mean(x), and there the performance was similar.
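For concreteness, a minimal sketch of the two output types being compared, x[0] (take the class token) versus mean(x) (average over all tokens). This is illustrative numpy, not code from the repo; the token layout (class token at index 0) and the function name are assumptions:

```python
import numpy as np

def pool_tokens(x, mode="cls"):
    """Reduce a ViT token sequence to a single embedding.

    x: array of shape (num_tokens, dim), the transformer output,
       with the class token assumed to sit at index 0.
    mode: "cls" returns x[0]; "mean" averages over all tokens.
    """
    if mode == "cls":
        return x[0]
    if mode == "mean":
        return x.mean(axis=0)
    raise ValueError(f"unknown mode: {mode}")

# toy example: 5 tokens, dim 4
x = np.arange(20, dtype=float).reshape(5, 4)
cls_emb = pool_tokens(x, "cls")    # first row: [0, 1, 2, 3]
mean_emb = pool_tokens(x, "mean")  # per-column means: [8, 9, 10, 11]
```

In a real PyTorch ViT the same change usually amounts to replacing `x[:, 0]` with `x.mean(dim=1)` just before the final head.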
Thank you~
Your comment really helps a lot:)
Hi,
Have you ever tried training the model with patch size 16? If so, could you please share its performance on the CFP-FP dataset?
I trained one myself (not from your repo), but the best result is only about 92%+ on CFP-FP, so I want to check whether the problem is with my implementation.
Thanks~