Add MobileViT v2

Model description

MobileViT is a computer vision model that combines CNNs with transformers, and it has already been added to Transformers.

MobileViT v2 is the second version. It replaces the multi-headed self-attention in MobileViT v1 with the proposed separable self-attention, whose cost grows linearly rather than quadratically with the number of tokens.
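To make the difference from multi-headed self-attention concrete, here is a minimal pure-Python sketch of separable self-attention, based on our reading of the MobileViT v2 design. It assumes a single k x d token matrix and omits batching, dropout, and the patch unfold/fold of the MobileViT block. The names `linear`, `softmax`, `separable_self_attention`, and `random_matrix` are hypothetical illustrations, not APIs from ml-cvnets, timm, or Transformers.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # k tokens x d channels, one row per token


def linear(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """Apply y = x @ w + b to every token; w has shape (d_in, d_out)."""
    return [
        [sum(row[i] * w[i][j] for i in range(len(row))) + b[j] for j in range(len(b))]
        for row in x
    ]


def softmax(v: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def separable_self_attention(
    x: Matrix,
    w_i: Matrix, b_i: List[float],  # input branch: d -> 1
    w_k: Matrix, b_k: List[float],  # key branch: d -> d
    w_v: Matrix, b_v: List[float],  # value branch: d -> d
    w_o: Matrix, b_o: List[float],  # output projection: d -> d
) -> Matrix:
    # 1. One scalar score per token, softmax over the k tokens (no k x k matrix).
    context_scores = softmax([r[0] for r in linear(x, w_i, b_i)])
    # 2. Context vector: score-weighted sum of the key projections (length d).
    keys = linear(x, w_k, b_k)
    d = len(keys[0])
    context_vector = [
        sum(c * key[j] for c, key in zip(context_scores, keys)) for j in range(d)
    ]
    # 3. Value projection followed by ReLU.
    values = [[max(0.0, t) for t in r] for r in linear(x, w_v, b_v)]
    # 4. Broadcast the context vector over all tokens (element-wise product).
    mixed = [[t * c for t, c in zip(r, context_vector)] for r in values]
    # 5. Final linear projection.
    return linear(mixed, w_o, b_o)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical helper: uniform random weights for the demo below."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    k, d = 6, 4  # 6 tokens with 4 channels each
    x = random_matrix(k, d, rng)
    params = (
        random_matrix(d, 1, rng), [0.0],
        random_matrix(d, d, rng), [0.0] * d,
        random_matrix(d, d, rng), [0.0] * d,
        random_matrix(d, d, rng), [0.0] * d,
    )
    y = separable_self_attention(x, *params)
    print(len(y), "tokens x", len(y[0]), "channels")
```

The softmax runs over k scalar scores instead of a k x k score matrix, and every other step is a per-token projection or a single weighted sum, so the cost grows linearly with the number of tokens k.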

Does Hugging Face have plans to add MobileViT v2 to Transformers?

Open source status

[X] The model implementation is available
[X] The model weights are available

Provide useful links for the implementation

The official implementation is from Apple and is available at: https://github.com/apple/ml-cvnets

The timm library also implements it and provides pretrained weights: https://github.com/huggingface/pytorch-image-models/blob/82cb47bcf360e1974c00c35c2aa9e242e6b5b565/timm/models/mobilevit.py

huggingface/transformers issue #22570: Add MobileViT v2
