hustvl / Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Apache License 2.0

Memory/speed improvements over DeiT for larger Vim #17

Open karolpustelnik opened 9 months ago

karolpustelnik commented 9 months ago

I find your paper on Vision Mamba very interesting. However, when using your code, I ran into a problem (which may well be normal behavior). When measuring GPU memory consumption and FPS for Vim versions other than Tiny, I could not reproduce speed and memory improvements similar to those I saw for Tiny. I compared against DeiT, and the improvements were only visible with Vim-Ti. Am I doing something wrong, or do the improvements only apply to the Tiny version?

blameitonme1 commented 1 month ago

Hello, have you found out the reason? I tested it too and found that the GPU memory usage is much higher than that of ViT, which is strange considering the authors' claim that Vision Mamba saves a lot of GPU memory. Also, the speed is not faster than ViT.
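
For anyone trying to reproduce this comparison, here is a minimal sketch of a measurement harness for throughput and peak GPU memory. It is not the authors' benchmarking script. The helper names `benchmark` and `benchmark_torch_model` are hypothetical, and the PyTorch part assumes a CUDA device and a model that accepts a `(batch, 3, H, W)` tensor. Results can depend a lot on input resolution and batch size, so it may be worth sweeping both for each model rather than comparing at a single setting.

```python
import time


def benchmark(step, batch_size, warmup=3, iters=10, sync=None):
    """Return images/second for `step`, a zero-argument callable that
    runs one forward pass on a batch of `batch_size` images.

    `sync` is an optional callable that blocks until queued work is done
    (e.g. torch.cuda.synchronize); without it, GPU timings are unreliable
    because kernel launches are asynchronous.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):          # warm-up runs are not timed
        step()
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    sync()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


def benchmark_torch_model(model, resolution, batch_size=8):
    """Hypothetical helper: returns (images/second, peak allocated MiB).

    Assumes PyTorch with CUDA. Some models fix their input size at
    construction time and may need to be rebuilt for each resolution.
    """
    import torch  # imported here so `benchmark` works without PyTorch

    device = torch.device("cuda")
    model = model.to(device).eval()
    x = torch.randn(batch_size, 3, resolution, resolution, device=device)
    torch.cuda.reset_peak_memory_stats(device)

    def step():
        with torch.inference_mode():
            model(x)

    fps = benchmark(step, batch_size, sync=torch.cuda.synchronize)
    peak_mib = torch.cuda.max_memory_allocated(device) / 2**20
    return fps, peak_mib
```

Calling `benchmark_torch_model` on both models with the same `resolution` and `batch_size`, and then repeating at several resolutions, should show whether the gap changes with sequence length. Note that `max_memory_allocated` reports memory allocated by PyTorch tensors, which is usually lower than the figure shown by `nvidia-smi`.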