Model description

MaxViT: Multi-Axis Vision Transformer is a paper from Google AI, published at ECCV 2022. It introduces a new attention module, "multi-axis attention," which combines blocked local attention and sparse global attention for efficient and scalable spatial interactions at arbitrary input resolutions. The model demonstrates strong performance on various vision tasks, including image classification and object detection.

I think it would be a great addition to Hugging Face, and I would be happy to contribute it.
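To illustrate the idea, here is a minimal NumPy sketch (not the official implementation; function names and the toy sizes are my own) of the two partitioning schemes behind multi-axis attention: block partitioning groups tokens into non-overlapping local windows, while grid partitioning groups strided tokens that span the whole image, giving sparse global interactions.

```python
import numpy as np

def block_partition(x, p):
    """Split an (H, W, C) feature map into local p x p windows."""
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c)
    # -> (num_windows, p*p, C); attention runs within each window (local)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, c)

def grid_partition(x, g):
    """Split an (H, W, C) feature map into a g x g grid of strided tokens."""
    h, w, c = x.shape
    x = x.reshape(g, h // g, g, w // g, c)
    # Group tokens by their position *within* each cell, so every group
    # contains g*g tokens strided across the full image (sparse global)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, c)

x = np.arange(8 * 8).reshape(8, 8, 1)
blocks = block_partition(x, 4)   # 4 windows of 16 contiguous tokens
grid = grid_partition(x, 4)      # 4 groups of 16 stride-2 tokens
print(blocks.shape, grid.shape)  # (4, 16, 1) (4, 16, 1)
```

Running attention first over the block axis and then over the grid axis gives both local and global mixing in a single block, at linear rather than quadratic cost in the number of pixels.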
cc: @alara @NielsRogge
Open source status
Provide useful links for the implementation
Code and Weights: