Open fcakyon opened 1 year ago
Hi @fcakyon, MeMViT definitely seems interesting and we would be happy to see it added to transformers!
If you haven't done so, you can start by taking a look at our existing video classification models to see if there are any re-usable components you can copy paste and use for MeMViT (preprocessing, model modules, etc.).
The best way to add a new model is to start with the transformers-cli add-new-model or transformers-cli add-new-model-like command, which initializes all the model files and ensures the new model can be properly imported. You can learn more about it over here.
Feel free to ping me or @NielsRogge if you get stuck or have questions :)
Thank you for the response @alaradirik. I just wrapped up the TimeSformer PR: https://github.com/huggingface/transformers/pull/18908
I will be starting the MeMViT implementation late this week 👍
I am sorry, but I won't be able to work on this PR in the near future because I don't have the time. I have a lot of work to do for my Ph.D. If anyone else is willing to work on it, they are free to do so 👍
Hello, I would like to work on adding this model.
@fcakyon no problem at all :)
@Sandstorm831 sure, please feel free to start working on it, you can ping me or @NielsRogge if you run into issues or have questions about the library in general.
Hi @alaradirik I would like to contribute to this model.
Hi @alaradirik, @Sandstorm831 and I are working together on contributing this model.
hello, any status update on this? thanks! @alaradirik
Sorry for the delayed response. Because we have not been able to make sustained progress, @Sandstorm831 and I are not working on it as of now. @shivanimall, you may start working on this issue. Thank you.
Model description
MeMViT (CVPR 2022), released by Meta AI, is an efficient transformer-based video understanding model. Its online memory-caching attention mechanism gives it a temporal support 30 times longer than existing models with only 4.5% more compute.
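For anyone picking this up, here is a rough sketch of the online attention idea mentioned above: the current clip attends to keys and values cached from earlier clips as well as its own. This is only an illustration in plain Python, not the facebookresearch/MeMViT code. The names `attend` and `MemoryAttention` and the `max_clips` parameter are made up for this example, and the memory compression and gradient detachment used in the actual model are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return out


class MemoryAttention:
    """Hypothetical helper: attention that also looks at cached past clips."""

    def __init__(self, max_clips=2):
        self.max_clips = max_clips  # assumed cap on how many past clips are kept
        self.memory = []  # one (keys, values) pair per cached clip

    def __call__(self, queries, keys, values):
        # Prepend cached keys/values so the current clip can attend to the past.
        all_keys = [k for mem_k, _ in self.memory for k in mem_k] + keys
        all_values = [v for _, mem_v in self.memory for v in mem_v] + values
        out = attend(queries, all_keys, all_values)
        # Cache the current clip and drop the oldest one beyond the cap.
        self.memory.append((keys, values))
        self.memory = self.memory[-self.max_clips:]
        return out


# Process clips one after another, as an online model would.
layer = MemoryAttention(max_clips=2)
first = layer([[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 1.0]])
second = layer([[1.0, 0.0]], [[1.0, 0.0]], [[3.0, 3.0]])
```

Because each clip only adds its own keys and values to a bounded cache, the per-clip cost stays roughly constant while the effective temporal context grows with the number of cached clips.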
It would be an excellent addition to the transformers library, considering it is the current SOTA on the AVA, EPIC-Kitchens-100 action classification, and action anticipation datasets.
Your contribution
I want to work on adding this architecture to Hugging Face.
Open source status
Provide useful links for the implementation
Source code: https://github.com/facebookresearch/MeMViT
Weight files: https://github.com/facebookresearch/MeMViT#model-checkpoints
cc: @NielsRogge @alaradirik