huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add MobileViT v2 #22570

Open SunHaozhe opened 1 year ago

SunHaozhe commented 1 year ago

Model description

MobileViT is a computer vision model that combines CNNs with transformers. It has already been added to Transformers.

MobileViT v2 is the second version of the model. It is built by replacing the multi-headed self-attention in MobileViT v1 with the separable self-attention proposed for v2.
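For readers unfamiliar with the change, here is a rough NumPy sketch of what separable self-attention computes, based on my reading of the MobileViT v2 paper. It is not the ml-cvnets or timm code: the function name, the weight shapes, the single-head layout, and the omission of biases are all assumptions made for illustration.

```python
import numpy as np


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """Illustrative separable self-attention over k tokens of width d.

    Hypothetical helper, not an API from ml-cvnets, timm, or Transformers.
    x:   (k, d) input tokens
    w_i: (d, 1) projection to one context score per token
    w_k: (d, d) key projection
    w_v: (d, d) value projection
    w_o: (d, d) output projection
    """
    # One scalar score per token, normalized across tokens with a softmax.
    scores = x @ w_i                                        # (k, 1)
    scores = scores - scores.max(axis=0, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # (k, 1)

    # Context vector: score-weighted sum of the projected keys.
    # This is linear in k; no (k, k) attention matrix is ever formed.
    context = (weights * (x @ w_k)).sum(axis=0, keepdims=True)  # (1, d)

    # Broadcast the context onto ReLU-activated values, then project out.
    values = np.maximum(x @ w_v, 0.0)                       # (k, d)
    return (values * context) @ w_o                         # (k, d)


rng = np.random.default_rng(0)
k, d = 6, 4  # toy sizes chosen for the example
x = rng.standard_normal((k, d))
w_i = rng.standard_normal((d, 1))
w_k, w_v, w_o = (rng.standard_normal((d, d)) for _ in range(3))
out = separable_self_attention(x, w_i, w_k, w_v, w_o)
print(out.shape)  # (6, 4)
```

The point of the sketch is the cost: multi-headed self-attention compares every token with every other token, while this variant only compares each token with a single learned direction and then shares one context vector across all tokens.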

Does Hugging Face plan to add MobileViT v2 to Transformers?

Open source status

Provide useful links for the implementation

The official implementation is from Apple at this link: https://github.com/apple/ml-cvnets

The timm library also implements it and provides pre-trained weights at this link: https://github.com/huggingface/pytorch-image-models/blob/82cb47bcf360e1974c00c35c2aa9e242e6b5b565/timm/models/mobilevit.py

shehanmunasinghe commented 1 year ago

Hi @SunHaozhe, I would like to work on implementing this model.