microsoft / CvT

This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.
MIT License

Regarding the FLOPs #1

Closed: zhuchen03 closed this issue 3 years ago

zhuchen03 commented 3 years ago

Hi,

How did you compute the FLOPs? I used FlopCountAnalysis from fvcore.nn, and the count I get is larger than the one you reported.

leoxiaobin commented 3 years ago

We use ptflops to compute the FLOPs; you can get the count by setting MODEL_SUMMARY to True. Could you provide more details on the FLOPs computed by fvcore.nn?

zhuchen03 commented 3 years ago

Mystery solved. I was testing the non-official code from https://github.com/rishikksh20/convolution-vision-transformers, which does not downsample the keys and values. That is why the FLOPs from that implementation are higher than yours. Your numbers should be correct. I just wish you had released your code earlier.
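
For anyone who hits the same discrepancy, here is a rough sketch of why skipping the key/value downsampling inflates the count. It only counts the multiply-accumulates of the two attention matmuls (Q @ K^T and attention @ V) and ignores projections, softmax, and the MLP. The 56x56 feature map, the embedding dimension of 64, and the key/value stride of 2 are illustrative assumptions, not numbers taken from either repository, and the helper names are hypothetical.

```python
def tokens(height, width, stride=1):
    """Number of tokens left after subsampling a feature map by `stride`."""
    out_h = (height + stride - 1) // stride  # ceiling division
    out_w = (width + stride - 1) // stride
    return out_h * out_w


def attention_matmul_macs(num_queries, num_keys, dim):
    """MACs of Q @ K^T plus attention @ V, summed over all heads.

    `dim` is the total embedding dimension; splitting it across heads
    does not change the total.
    """
    qk = num_queries * num_keys * dim  # similarity scores
    av = num_queries * num_keys * dim  # weighted sum of values
    return qk + av


# Assumed, illustrative stage configuration.
H, W, DIM, KV_STRIDE = 56, 56, 64, 2

n_q = tokens(H, W)                    # queries keep full resolution
n_kv_full = tokens(H, W)              # keys/values not downsampled
n_kv_down = tokens(H, W, KV_STRIDE)   # keys/values downsampled

full = attention_matmul_macs(n_q, n_kv_full, DIM)
down = attention_matmul_macs(n_q, n_kv_down, DIM)

print(f"no K/V downsampling:  {full:,} MACs")
print(f"K/V stride {KV_STRIDE}:         {down:,} MACs")
print(f"ratio: {full / down:.1f}x")
```

Under these assumptions the attention matmuls cost 4x more without downsampling, since a stride of 2 in each spatial dimension leaves a quarter of the key/value tokens. The effect on the whole-model total is smaller, because the other layers are unaffected.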