Model description
MobileViT is a computer vision model that combines CNNs with transformers, and it has already been added to Transformers.
MobileViT v2 is the second version; it replaces the multi-headed self-attention in MobileViT v1 with the separable self-attention that its paper proposes.
Does Hugging Face plan to add MobileViT v2 to Transformers?
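
For context, here is a minimal NumPy sketch of separable self-attention as I understand it from the MobileViT v2 paper. It is not taken from the official code: the function name and the weight names (`w_i`, `w_k`, `w_v`, `w_o`) are hypothetical, and biases, batching, and dropout are left out. The point it illustrates is that each token is scored against a single learned vector instead of against every other token, so the cost grows linearly with the number of tokens.

```python
import numpy as np


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """Single-head separable self-attention over a (tokens, dim) input.

    All argument names are assumptions made for this sketch only.
    """
    # Project every token to a scalar, then softmax over the token axis.
    logits = x @ w_i                          # shape: (tokens,)
    scores = np.exp(logits - logits.max())    # subtract max for stability
    scores = scores / scores.sum()            # context scores, sum to 1

    # A weighted sum of the key projections gives one global context vector.
    context = scores @ (x @ w_k)              # shape: (dim,)

    # Broadcast the context vector onto the ReLU-activated value projections.
    values = np.maximum(x @ w_v, 0.0)         # shape: (tokens, dim)
    return (values * context) @ w_o           # shape: (tokens, dim)


rng = np.random.default_rng(0)
tokens, dim = 16, 8
x = rng.standard_normal((tokens, dim))
w_i = rng.standard_normal(dim)
w_k, w_v, w_o = (rng.standard_normal((dim, dim)) for _ in range(3))

out = separable_self_attention(x, w_i, w_k, w_v, w_o)
print(out.shape)  # (16, 8)
```

No `tokens x tokens` attention matrix is ever formed here, which is the main difference from the multi-headed self-attention used in MobileViT v1.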
Open source status
Provide useful links for the implementation
The official implementation is from Apple: https://github.com/apple/ml-cvnets
The timm library also implements it and provides pre-trained weights: https://github.com/huggingface/pytorch-image-models/blob/82cb47bcf360e1974c00c35c2aa9e242e6b5b565/timm/models/mobilevit.py