I apologize for the misunderstanding. In the code, N refers to C/2 from the paper: it is the number of channels fed into the Transformer branch and the CNN branch, respectively. Therefore, for the small model, N should be set to 64.
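To make the split concrete, here is a minimal sketch (all names and shapes here are illustrative, not taken from our repository): a tensor with C channels is split into two halves of N = C/2 channels, one per branch.

```python
import torch

# Illustrative only: split C channels into two halves of N = C // 2
# channels each, one half for the Transformer branch and one for the
# CNN branch. Names and shapes are placeholders.
C = 128                                  # small model in the paper
x = torch.randn(1, C, 16, 16)            # (batch, channels, H, W)
x_transformer, x_cnn = x.chunk(2, dim=1)
assert x_transformer.shape[1] == x_cnn.shape[1] == C // 2  # N = 64
```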
Regarding SwinT-ChARM: since its open-source code was not available when we finished our work, we reproduced the method from the paper. The difference between our implementation and the open-source code is most likely the slice transform. Since the paper does not specify the output channels of the intermediate convolutional layers, we reused the same convolutional layers as in our own method.
Sorry for any confusion caused. I'll update the README to clarify both points.
Thanks a lot for your help. Now I get the following values with the DeepSpeed profiler (`get_model_profile()`):
| N   | FLOPs    | MACs     | Params  |
|-----|----------|----------|---------|
| 64  | 441.38 G | 215.32 G | 45.18 M |
| 96  | 865.73 G | 425.09 G | 59.13 M |
| 128 | 1454.5 G | 717.08 G | 76.57 M |
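For reference, here is roughly how I obtained these numbers (a minimal sketch; `build_model` stands in for the repository's model constructor, and the 256x256 input resolution is an assumption on my side):

```python
import torch
from deepspeed.profiling.flops_profiler import get_model_profile

# Sketch of the profiling call; `build_model` is a placeholder for
# the repository's model constructor, and the input resolution is
# assumed here.
model = build_model(N=64).eval()
with torch.no_grad():
    flops, macs, params = get_model_profile(
        model=model,
        input_shape=(1, 3, 256, 256),  # (batch, channels, H, W)
        print_profile=False,
        as_string=True,
    )
print(flops, macs, params)  # e.g. "441.38 G", "215.32 GMACs", "45.18 M"
```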
The number of parameters now closely matches the reported numbers. The FLOPs, however, are approximately twice as high as the reported numbers. Do you have any idea why? How are you profiling your model?
I also used an RTX 3090 GPU.
Thanks again!
For all methods in Table 1, we use flops-counter.pytorch to calculate complexity. Specifically, the FLOPs we report are actually MACs. In most CV tasks, papers identify FLOPs with MACs, and some versions of profiling packages also mix up FLOPs and MACs. Since one MAC corresponds to roughly two FLOPs (one multiply plus one add), this explains the factor-of-two difference you observed.
I think the following references might be helpful:
- https://github.com/open-mmlab/mmcv/issues/785#issuecomment-766840386
- https://github.com/sovrasov/flops-counter.pytorch/issues/16#issuecomment-802631732
- https://github.com/sovrasov/flops-counter.pytorch/blob/1ad0ed1999620c0170e5854dde39805d30d9b6aa/sample.py#L36
- https://github.com/Lyken17/pytorch-OpCounter/tree/160004dd1535323d71763c93482d2a8f5f260301
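For reference, a minimal sketch of how complexity can be counted with flops-counter.pytorch (ptflops); `build_model` and the input resolution are placeholders, not taken from our repository:

```python
from ptflops import get_model_complexity_info

# Sketch using flops-counter.pytorch (ptflops). Note that ptflops
# counts MACs even though the output is commonly labeled "FLOPs";
# multiply by ~2 to compare against true FLOP counts.
# `build_model` and the input resolution are placeholders.
model = build_model(N=64)
macs, params = get_model_complexity_info(
    model,
    (3, 256, 256),               # input shape without the batch dim
    as_strings=True,
    print_per_layer_stat=False,
)
print(f"Complexity: {macs}, Params: {params}")
```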
Thank you very much :)
Hello @jmliu206,
thank you very much for providing your interesting work.
Could you please explain in more detail how exactly you calculated the parameter counts given in Table 1? According to your paper, your small model should have 44.96M parameters, while I get about 76M when testing your code. I have created a Colab notebook to reproduce this result:
https://colab.research.google.com/drive/1KdwoC1i-TYMtc3akyuX83exipynKEE4v?usp=sharing
I used the default setting with C=128; probably I am just missing some detail here...
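For completeness, this is essentially how I count parameters in the Colab (a sketch; `build_model` is a placeholder for the constructor used in the repository):

```python
# Essentially how the Colab counts parameters (`build_model` is a
# placeholder for the repository's model constructor).
model = build_model(C=128)
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.2f} M parameters")  # ~76 M in my run
```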
I was also a bit surprised by the reported number of model parameters for SwinT-ChARM. According to Zhu et al., the model has a total of 32.6M parameters (their Table 3), whereas you report 60.55M.
It would be great if you could provide further insights here.
Thanks in advance, Nikolai