
Face Transformer for Recognition
MIT License

Have you tried patch size 16? #7

Open szlbiubiubiu opened 3 years ago

szlbiubiubiu commented 3 years ago

Hi,

Have you ever tried to train the model with patch size 16? And can you please share the performance on CFP-FP dataset?

I tried to train it, but my best result is about 92%+ on CFP-FP (not from your repo). So I want to check whether it is a problem with my implementation.

Thanks~

zhongyy commented 3 years ago

Hi, "patch size 16" for ViT, right?

I think I have tried ViT (patch size 16, dim 512, heads 8, depth 16) before. I made one modification to the output: it is changed from x[0] to mean(x), and then the accuracy on CFP-FP is about 94%. Note that I have compared ViT (patch size 8, dim 512, heads 8, depth 16) with the two output types, x[0] and mean(x), and the performance is similar.
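For reference, the output change described above (taking the class token x[0] versus mean-pooling all tokens with mean(x)) can be sketched in PyTorch as below. The tensor shapes are illustrative assumptions (batch 2, 197 tokens for a 224x224 image at patch size 16, dim 512), not taken from the repo:

```python
import torch

# Assumed transformer output: (batch, num_tokens, dim);
# token 0 is the class (CLS) token, the rest are patch tokens.
x = torch.randn(2, 197, 512)

# Output style 1: use the CLS token embedding, i.e. x[0] per sample.
cls_embedding = x[:, 0]          # shape (2, 512)

# Output style 2: mean-pool over all tokens, i.e. mean(x).
mean_embedding = x.mean(dim=1)   # shape (2, 512)

# Both produce one face embedding per image with identical shape,
# so the head can be swapped without changing downstream layers.
print(cls_embedding.shape, mean_embedding.shape)
```

Either embedding is then fed to the recognition head; the thread above reports the two choices perform similarly at patch size 8, while mean(x) helped at patch size 16.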

szlbiubiubiu commented 3 years ago

Thank you~

Your comment really helps a lot:)