keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

Add `MoViNet` model #2304

Open innat opened 10 months ago

innat commented 10 months ago

Short Description

MoViNets: Mobile Video Networks for Efficient Video Recognition

Mobile Video Networks (MoViNets) are efficient video classification models runnable on mobile devices. MoViNets demonstrate state-of-the-art accuracy and efficiency on several large-scale video action recognition datasets.

On Kinetics 600, MoViNet-A6 achieves 84.8% top-1 accuracy, outperforming recent Vision Transformer models like ViViT (83.0%) and VATT (83.6%) without any additional training data, while using 10x fewer FLOPs. Streaming MoViNet-A0 achieves 72% accuracy while using 3x fewer FLOPs than MobileNetV3-large (68%).

Papers

MoViNets

Existing Implementations

Other Information

The streaming version of this model makes it quite impressive, and it would be a valuable addition.

divyashreepathihalli commented 10 months ago

Hi @innat, thank you for this suggestion. We will keep this open, but at this point it is low priority for the team.

innat commented 10 months ago

@divyashreepathihalli Thanks for the confirmation. I pulled MoViNet out of the TF Model Garden and am maintaining it in a dedicated repo (private for now). The codebase is somewhat complex due to the large number of configurations. I will keep updating the codebase, so please let me know when keras-cv is ready to take it.

divyashreepathihalli commented 10 months ago

If you have code ready to go that works well across all backends, please feel free to open a PR. We will review it and add it.