Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-resolution images.
Hello, I am truly amazed at the work you have done, and I am trying to build my ideas on top of your code.
While reading through it, a minor question came up about vision_mamba/model.py, line 96.
From Algorithm 1, line 3 on page 4 of the original paper, I can see that x and z are processed by two different linear layers, whereas your code appears to forward both vectors through the same layer.
It may be a trivial point, but could you explain that choice? Thanks!
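For reference, here is a minimal sketch of the two formulations I mean. The module and parameter names (`dim`, `d_inner`, `linear_x`, `in_proj`, etc.) are my own for illustration and are not taken from vision_mamba/model.py. A single projection with doubled output width that is then split (as, if I recall correctly, the reference Mamba implementation does in its `in_proj`) is mathematically equivalent to two separate layers, whereas reusing one layer's weights for both streams is not.

```python
import torch
import torch.nn as nn


class TwoProjections(nn.Module):
    """Paper's Algorithm 1, line 3: separate Linear^x and Linear^z."""
    def __init__(self, dim: int, d_inner: int):
        super().__init__()
        self.linear_x = nn.Linear(dim, d_inner)
        self.linear_z = nn.Linear(dim, d_inner)

    def forward(self, t):  # t: (B, L, dim)
        return self.linear_x(t), self.linear_z(t)


class FusedProjection(nn.Module):
    """Equivalent fused form: one linear layer with doubled output width,
    split into the x and z streams. The stacked weight acts as [W_x; W_z],
    so the result matches two independent layers exactly."""
    def __init__(self, dim: int, d_inner: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * d_inner)

    def forward(self, t):  # t: (B, L, dim)
        x, z = self.in_proj(t).chunk(2, dim=-1)
        return x, z


# By contrast, calling the *same* layer twice (x = proj(t); z = proj(t))
# ties the two weight matrices together, which is not equivalent to the
# paper's formulation.
if __name__ == "__main__":
    t = torch.randn(2, 196, 192)          # (batch, tokens, dim) — example sizes
    x, z = FusedProjection(192, 384)(t)
    print(x.shape, z.shape)               # torch.Size([2, 196, 384]) each
```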
Upvote & Fund
We're using Polar.sh so you can upvote and help fund this issue.
We receive the funding once the issue is completed & confirmed by you.
Thank you in advance for helping prioritize & fund our backlog.