huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add [VMamba] model #28606

Open dmus opened 8 months ago

dmus commented 8 months ago

Model description

VMamba is a visual foundation model proposed in https://arxiv.org/pdf/2401.10166.pdf.

It is inspired by recent advances in state space models (SSMs), in particular Mamba. The proposed architecture is computationally more efficient than vision transformer architectures because its cost scales linearly with the number of image patches (and hence with growing resolution), whereas self-attention scales quadratically. It introduces a Cross-Scan Module (CSM) that gathers context from all directions (four directions, starting in each corner and traversing horizontally or vertically). Evaluations on visual perception tasks (image classification, object detection, and semantic segmentation) show promising results.
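The cross-scan idea can be sketched in a few lines of plain Python. This is a toy illustration, not VMamba's implementation. The function names are hypothetical, and the four routes are assumed to be row-major, column-major, and the reverse of each, so in this sketch the routes start only from the top-left and bottom-right corners. `toy_linear_scan` is a fixed scalar linear recurrence standing in for Mamba's input-dependent selective scan. Each route is processed in a single pass over its L tokens, which is where the linear scaling comes from.

```python
from typing import Callable, List

Grid = List[List[float]]
Seq = List[float]


def cross_scan(grid: Grid) -> List[Seq]:
    """Flatten an H x W grid of patch features into four 1-D scan routes.

    Assumption (hypothetical): routes are row-major, column-major,
    and the reverse of each.
    """
    h, w = len(grid), len(grid[0])
    row_major = [grid[i][j] for i in range(h) for j in range(w)]
    col_major = [grid[i][j] for j in range(w) for i in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def cross_merge(seqs: List[Seq], h: int, w: int) -> Grid:
    """Undo each route and sum the four results back onto the H x W grid."""
    row_major, col_major, row_rev, col_rev = seqs
    row_back = row_rev[::-1]  # reversed row route, restored to row-major order
    col_back = col_rev[::-1]  # reversed column route, restored to column-major order
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            r, c = i * w + j, j * h + i  # index of (i, j) in row-/column-major order
            out[i][j] = row_major[r] + row_back[r] + col_major[c] + col_back[c]
    return out


def toy_linear_scan(seq: Seq, a: float = 0.5, b: float = 1.0, c: float = 1.0) -> Seq:
    """Toy recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One pass over the sequence, so the cost is O(L) in the number of tokens.
    Unlike Mamba's selective scan, a, b and c are fixed, not input-dependent.
    """
    state, ys = 0.0, []
    for x in seq:
        state = a * state + b * x
        ys.append(c * state)
    return ys


def ss2d_sketch(grid: Grid, scan: Callable[[Seq], Seq] = toy_linear_scan) -> Grid:
    """Cross-scan the grid, run the 1-D scan on every route, then cross-merge."""
    h, w = len(grid), len(grid[0])
    return cross_merge([scan(s) for s in cross_scan(grid)], h, w)


# A single non-zero patch in the bottom-right corner still reaches the
# top-left output position, via the two reversed routes.
print(ss2d_sketch([[0.0, 0.0], [0.0, 1.0]])[0][0])  # → 0.25
```

In this sketch, the forward routes carry information from earlier positions and the reversed routes from later ones, so after merging every output position has received context from every other position, even though each individual scan is causal in 1-D.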

According to the authors' repository, model weights will become available in a few days.

  1. [x] (Optional) Understood theoretical aspects

  2. [x] Prepared transformers dev environment

  3. [x] Set up debugging environment of the original repository

  4. [x] Created script that successfully runs forward pass using original repository and checkpoint

  5. [x] Successfully opened a PR and added the model skeleton to Transformers

  6. [x] Successfully converted original checkpoint to Transformers checkpoint

  7. [x] Successfully ran forward pass in Transformers that gives identical output to original checkpoint

  8. [x] Finished model tests in Transformers

  9. [ ] Successfully added Tokenizer in Transformers

  10. [x] Ran end-to-end integration tests

  11. [x] Finished docs

  12. [ ] Uploaded model weights to the hub

  13. [x] Submitted the pull request for review

  14. [ ] (Optional) Added a demo notebook
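Steps 6 and 7 typically involve renaming the original checkpoint's parameter keys to the Transformers naming scheme and then checking that both models produce matching outputs within a small tolerance. The following is a minimal pure-Python sketch under stated assumptions: the rename rules, key names, and the `1e-4` tolerance are hypothetical, and this is not the conversion script from the fork. In a real conversion the values would be PyTorch tensors and the comparison would use something like `torch.allclose`.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical (pattern, replacement) rules; the real VMamba and
# Transformers parameter names will differ.
RENAME_RULES: List[Tuple[str, str]] = [
    (r"^patch_embed\.", "embeddings.patch_embeddings."),
    (r"^layers\.(\d+)\.blocks\.(\d+)\.", r"encoder.stages.\1.layers.\2."),
    (r"^head\.", "classifier."),
]


def rename_key(key: str) -> str:
    """Apply every matching rule, in order, to one original parameter name."""
    for pattern, replacement in RENAME_RULES:
        key = re.sub(pattern, replacement, key)
    return key


def convert_state_dict(original: Dict[str, object]) -> Dict[str, object]:
    """Return a new dict with renamed keys; the weights themselves are untouched."""
    converted: Dict[str, object] = {}
    for key, value in original.items():
        new_key = rename_key(key)
        if new_key in converted:
            raise KeyError(f"two original keys map to {new_key!r}")
        converted[new_key] = value
    return converted


def outputs_match(expected: Sequence[float], actual: Sequence[float], atol: float = 1e-4) -> bool:
    """True if both outputs have the same length and every element is within atol."""
    if len(expected) != len(actual):
        return False
    return all(math.isclose(e, a, rel_tol=0.0, abs_tol=atol) for e, a in zip(expected, actual))


print(rename_key("layers.0.blocks.1.norm.weight"))  # → encoder.stages.0.layers.1.norm.weight
```

Raising on duplicate target names catches rename rules that accidentally collapse two parameters onto one key, which would otherwise silently drop weights before the output comparison is even run.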

I am opening the issue to avoid duplicate work. My main motivation for porting this model is to learn a bit more about it (and about the internals of 🤗 Transformers). Some of you probably know this library much better than I do, so feel free to write your own implementation if you can do it better or quicker. Otherwise, don’t hesitate to build on top of my fork.

Open source status

Provide useful links for the implementation

MzeroMiko commented 8 months ago

Thank you for your interest. I am one of the authors of VMamba. We have just updated the repo with code that is easier to port. I hope this helps you in your splendid work!