Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-resolution images.
Hello, I am truly amazed at the work you have done, and I am trying to build my ideas on top of your code.
While reading through it, a minor question came up about vision_mamba/model.py, line 96.
From Algorithm 1, line 3 on page 4 of the original paper, I can see that x and z are processed by two different linear layers, whereas your code appears to forward both vectors through the same layer.
It may be a trivial point, but could you explain that choice? Thanks!
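For reference, here is a minimal sketch of the two formulations I mean. The module and parameter names (`dim`, `d_inner`, `linear_x`, `in_proj`, etc.) are my own for illustration and are not taken from vision_mamba/model.py. A single projection with doubled output width that is then split (as, if I recall correctly, the reference Mamba implementation does in its `in_proj`) is mathematically equivalent to two separate layers, whereas reusing one layer's weights for both streams is not.

```python
import torch
import torch.nn as nn


class TwoProjections(nn.Module):
    """Paper's Algorithm 1, line 3: separate Linear^x and Linear^z."""
    def __init__(self, dim: int, d_inner: int):
        super().__init__()
        self.linear_x = nn.Linear(dim, d_inner)
        self.linear_z = nn.Linear(dim, d_inner)

    def forward(self, t):  # t: (B, L, dim)
        return self.linear_x(t), self.linear_z(t)


class FusedProjection(nn.Module):
    """Equivalent fused form: one linear layer with doubled output width,
    split into the x and z streams. The stacked weight acts as [W_x; W_z],
    so the result matches two independent layers exactly."""
    def __init__(self, dim: int, d_inner: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * d_inner)

    def forward(self, t):  # t: (B, L, dim)
        x, z = self.in_proj(t).chunk(2, dim=-1)
        return x, z


# By contrast, calling the *same* layer twice (x = proj(t); z = proj(t))
# ties the two weight matrices together, which is not equivalent to the
# paper's formulation.
if __name__ == "__main__":
    t = torch.randn(2, 196, 192)          # (batch, tokens, dim) — example sizes
    x, z = FusedProjection(192, 384)(t)
    print(x.shape, z.shape)               # torch.Size([2, 196, 384]) each
```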
Upvote & Fund
We're using Polar.sh so you can upvote and help fund this issue.
We receive the funding once the issue is completed & confirmed by you.
Thank you in advance for helping prioritize & fund our backlog.