kyegomez / VisionMamba

Implementation of Vision Mamba from the paper: "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It's 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-res images.
https://discord.gg/GYbXvDGevY
MIT License
363 stars 19 forks

Issues regarding the initial linear layer for x and z #25

Open infected4098 opened 3 months ago

infected4098 commented 3 months ago

Hello, I am truly amazed at the work you have done, and I am trying to build my own ideas on top of your code.

While following your code, I came up with a minor question about vision_mamba/model.py, line 96.

Algorithm 1, line 3 on page 4 of the original paper shows two different linear layers, one for x and one for z. Your code, however, appears to pass both vectors through the same layer.

It may be a trivial point, but could you explain the reasoning behind this? Thanks.
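
To make the difference concrete, here is a minimal PyTorch sketch of the two variants. It is not the repository's code: the sizes and the names `shared`, `proj_x`, `proj_z`, and `fused` are hypothetical, chosen only for illustration. It shows that reusing one layer for both branches makes x and z identical, while two separately initialized layers (or one double-width layer split in half) keep them distinct.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dim, expanded = 8, 16              # hypothetical sizes, for illustration only
tokens = torch.randn(2, 4, dim)    # (batch, sequence length, dim)

# Variant A: a single layer reused for both branches,
# which is how this issue reads the current code.
shared = nn.Linear(dim, expanded)
x_shared = shared(tokens)
z_shared = shared(tokens)

# Variant B: two independent layers, as this issue reads Algorithm 1.
proj_x = nn.Linear(dim, expanded)
proj_z = nn.Linear(dim, expanded)
x_sep = proj_x(tokens)
z_sep = proj_z(tokens)

# Variant C: one double-width layer whose output is split in half.
# This has the same number of parameters as Variant B.
fused = nn.Linear(dim, 2 * expanded)
x_fused, z_fused = fused(tokens).chunk(2, dim=-1)

identical_shared = torch.equal(x_shared, z_shared)
identical_separate = torch.equal(x_sep, z_sep)
identical_fused = torch.equal(x_fused, z_fused)

print("shared layer:   x == z ->", identical_shared)
print("separate layers: x == z ->", identical_separate)
print("fused layer:    x == z ->", identical_fused)
```

With a shared layer, the gating branch z carries no information beyond x at the point of projection, so the two branches only diverge through whatever is applied to them afterwards.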

github-actions[bot] commented 3 months ago

Hello there, thank you for opening an issue! 🙏🏻 The team was notified and will get back to you ASAP.

github-actions[bot] commented 4 weeks ago

Stale issue message