keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

Add `MoViNet` model #2304

Open innat opened 8 months ago

innat commented 8 months ago

Short Description

MoViNets: Mobile Video Networks for Efficient Video Recognition

Mobile Video Networks (MoViNets) are efficient video classification models runnable on mobile devices. MoViNets demonstrate state-of-the-art accuracy and efficiency on several large-scale video action recognition datasets.

On Kinetics 600, MoViNet-A6 achieves 84.8% top-1 accuracy, outperforming recent Vision Transformer models like ViViT (83.0%) and VATT (83.6%) without any additional training data, while using 10x fewer FLOPs. And streaming MoViNet-A0 achieves 72% accuracy while using 3x fewer FLOPs than MobileNetV3-large (68%).
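The streaming variants rely on causal temporal convolutions with a "stream buffer" (the technique named in the MoViNets paper): each layer caches the features of its last few input frames, so the model can process one frame at a time instead of re-running over the whole clip. Below is a minimal sketch of that idea, assuming one scalar feature per frame. The real model applies it to 3D feature maps, and the names `causal_conv1d` and `StreamBufferConv1d` are hypothetical, not keras-cv or TF Model Garden APIs.

```python
from typing import List


def causal_conv1d(frames: List[float], kernel: List[float]) -> List[float]:
    """Whole-clip causal temporal convolution: output t only sees frames <= t.

    The clip is left-padded with len(kernel) - 1 zeros so no future frame leaks in.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(frames)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(frames))
    ]


class StreamBufferConv1d:
    """Frame-by-frame causal convolution that caches the last k - 1 inputs."""

    def __init__(self, kernel: List[float]) -> None:
        self.kernel = list(kernel)
        # Zero-initialised buffer plays the role of the left padding above.
        self.buffer = [0.0] * (len(kernel) - 1)

    def step(self, frame: float) -> float:
        window = self.buffer + [frame]
        out = sum(w * x for w, x in zip(self.kernel, window))
        # Keep only the most recent k - 1 frames for the next call.
        self.buffer = window[1:]
        return out


# Usage: streaming one frame at a time matches the whole-clip result.
clip = [1.0, 2.0, 3.0, 4.0]
kernel = [0.25, 0.5, 1.0]
full = causal_conv1d(clip, kernel)
conv = StreamBufferConv1d(kernel)
streamed = [conv.step(x) for x in clip]
print(full)      # [1.0, 2.5, 4.25, 6.0]
print(streamed)  # [1.0, 2.5, 4.25, 6.0]
```

Because the buffer only ever holds `k - 1` frames, per-frame memory stays constant regardless of clip length, which is what makes on-device streaming practical.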

Papers

MoViNets

Existing Implementations

Other Information

The streaming version of this model makes it quite impressive, and it would be a valuable addition.
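To make the streaming version fully causal, the paper replaces temporal global average pooling (used in the squeeze-and-excitation blocks and the classifier head) with a cumulative average over the frames seen so far. The sketch below shows that running mean, assuming one scalar feature per frame; `CumulativeAveragePool` and `cumulative_average` are hypothetical names for illustration only.

```python
from typing import List


class CumulativeAveragePool:
    """Running mean over all frames seen so far (causal stand-in for temporal GAP)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def step(self, frame_feature: float) -> float:
        # Only a running sum and a frame counter are stored between calls.
        self.total += frame_feature
        self.count += 1
        return self.total / self.count


def cumulative_average(features: List[float]) -> List[float]:
    """Apply the running mean frame by frame over a whole clip."""
    pool = CumulativeAveragePool()
    return [pool.step(f) for f in features]


print(cumulative_average([2.0, 4.0, 6.0]))  # [2.0, 3.0, 4.0]
```

The value after the last frame equals the ordinary global average over the clip, so a streaming prediction at the end of a clip can line up with the non-streaming one.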

divyashreepathihalli commented 8 months ago

Hi @innat, thank you for this suggestion. We will keep this open, but at this point it is a low priority for the team.

innat commented 8 months ago

@divyashreepathihalli Thanks for the confirmation. I extracted MoViNet from the TF Model Garden and am maintaining it in a dedicated repo (private for now). The codebase is somewhat complex due to the large number of configurations. I will keep updating it, so please let me know when keras-cv is ready to take it.

divyashreepathihalli commented 8 months ago

If you have code ready to go that works well across all backends, please feel free to open a PR. We will review it and add it.