siyuanliii / masa

Official implementation of the CVPR24 highlight paper: Matching Anything by Segmenting Anything
https://matchinganything.github.io
Apache License 2.0

Matching Anything By Segmenting Anything [CVPR24 Highlight]

[ Project Page ] [ ArXiv ]

Computer Vision Lab, ETH Zurich

Overview

This is the repository for MASA, a universal instance appearance model for matching any object in any domain. MASA can be added on top of any detection or segmentation model to help it track the objects it has detected.

Introduction

The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT). Current methods predominantly rely on labeled domain-specific video datasets, which limits the cross-domain generalization of learned similarity embeddings. We propose MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. Leveraging the rich object segmentation from the Segment Anything Model (SAM), MASA learns instance-level correspondence through exhaustive data transformations. We treat the SAM outputs as dense object region proposals and learn to match those regions from a vast image collection. We further design a universal MASA adapter which can work in tandem with foundational segmentation or detection models and enable them to track any detected objects. Those combinations present strong zero-shot tracking ability in complex domains. Extensive tests on multiple challenging MOT and MOTS benchmarks indicate that the proposed method, using only unlabeled static images, achieves even better performance than state-of-the-art methods trained with fully annotated in-domain video sequences, in zero-shot association.
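
The idea of obtaining correspondence from data transformations can be illustrated with a toy sketch. This is not the MASA training code: the box format `(x1, y1, x2, y2)`, the two transforms, the example coordinates, and all function names are assumptions made for illustration. Two differently transformed views are built from the same set of region proposals, so proposal `i` in one view is known to match proposal `i` in the other, without any tracking labels.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed format


def hflip(box: Box, image_width: float) -> Box:
    """Mirror a box horizontally inside an image of the given width."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


def scale(box: Box, factor: float) -> Box:
    """Scale box coordinates, as if the whole image were resized."""
    x1, y1, x2, y2 = box
    return (x1 * factor, y1 * factor, x2 * factor, y2 * factor)


def make_training_pair(proposals: List[Box], image_width: float):
    """Build two views of the same proposals plus the known matches.

    The proposals stand in for SAM region outputs (hypothetical values).
    """
    view_a = [hflip(b, image_width) for b in proposals]
    view_b = [scale(b, 0.5) for b in proposals]
    # Index i in view_a and index i in view_b show the same region.
    positives = [(i, i) for i in range(len(proposals))]
    return view_a, view_b, positives


proposals = [(10.0, 20.0, 50.0, 80.0), (60.0, 10.0, 90.0, 40.0)]
view_a, view_b, positives = make_training_pair(proposals, image_width=100.0)
```

An embedding model would then be trained so that regions listed in `positives` score as similar and all other pairs as dissimilar; that training step is outside the scope of this sketch.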

Results on Open-vocabulary MOT Benchmark

| Method | Base TETA | Base AssocA | Novel TETA | Novel AssocA | Model |
|---|---|---|---|---|---|
| OVTrack (CVPR23) | 35.5 | 36.9 | 27.8 | 33.6 | - |
| MASA-R50 🔥 | 46.5 | 43.0 | 41.1 | 42.7 | HF🤗 |
| MASA-Sam-vitB | 47.2 | 44.5 | 41.4 | 42.3 | HF🤗 |
| MASA-Sam-vitH | 47.5 | 45.1 | 40.5 | 40.5 | HF🤗 |
| MASA-Detic | 47.7 | 44.1 | 41.5 | 41.6 | HF🤗 |
| MASA-GroundingDINO 🔥 | 47.3 | 44.7 | 41.9 | 44.0 | HF🤗 |
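
To make the gap to the OVTrack baseline easier to read, the short script below computes each MASA variant's TETA gain on the base and novel splits. The scores are copied from the table above; the dictionary layout and the function name are illustrative only.

```python
# TETA scores copied from the results table: (base, novel).
TETA = {
    "OVTrack (CVPR23)": (35.5, 27.8),
    "MASA-R50": (46.5, 41.1),
    "MASA-Sam-vitB": (47.2, 41.4),
    "MASA-Sam-vitH": (47.5, 40.5),
    "MASA-Detic": (47.7, 41.5),
    "MASA-GroundingDINO": (47.3, 41.9),
}


def gains_over_baseline(scores, baseline="OVTrack (CVPR23)"):
    """Return {method: (base_gain, novel_gain)} relative to the baseline."""
    base_ref, novel_ref = scores[baseline]
    return {
        name: (round(base - base_ref, 1), round(novel - novel_ref, 1))
        for name, (base, novel) in scores.items()
        if name != baseline
    }


gains = gains_over_baseline(TETA)
for name, (base_gain, novel_gain) in gains.items():
    print(f"{name}: +{base_gain} base TETA, +{novel_gain} novel TETA")
```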

Model Zoo

Check out our model zoo for more detailed benchmark performance for different models.

Benchmark Testing

If you want to test our tracker on standard benchmarks, please refer to benchmark_test.md.

More results

See more results on our project page!

Installation

Please refer to INSTALL.md

Demo Run

Preparation

  1. First, create a folder named saved_models in the root directory of the project. Then, download the following models and put them in the saved_models folder.

    a). Download the MASA-GroundingDINO checkpoint and save it as saved_models/masa_models/gdino_masa.pth.

  2. (Optional) Second, download the demo videos and put them in the demo folder. We provide two short videos for testing (minions_rush_out.mp4 and giraffe_short.mp4). You can download more demo videos here.

  3. Finally, create the demo_outputs folder in the root directory of the project to save the output videos.
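
The preparation steps above can be sketched as a small helper that creates the expected folders and reports which checkpoint files are still missing. The folder and file names come from the steps above; the function name `prepare_workspace` and the constants are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List

# Folder and file names are taken from the preparation steps.
REQUIRED_DIRS = ["saved_models/masa_models", "demo", "demo_outputs"]
REQUIRED_FILES = ["saved_models/masa_models/gdino_masa.pth"]


def prepare_workspace(root: str = ".") -> List[str]:
    """Create the expected folders and return the files that are still missing."""
    root_path = Path(root)
    for rel in REQUIRED_DIRS:
        # exist_ok makes the call safe to repeat.
        (root_path / rel).mkdir(parents=True, exist_ok=True)
    return [rel for rel in REQUIRED_FILES if not (root_path / rel).is_file()]
```

Calling `prepare_workspace(".")` from the project root returns an empty list once the checkpoint has been downloaded. The helper does not download anything itself.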

Demo 1:

python demo/video_demo_with_text.py demo/minions_rush_out.mp4 --out demo_outputs/minions_rush_out_outputs.mp4 --masa_config configs/masa-gdino/masa_gdino_swinb_inference.py --masa_checkpoint saved_models/masa_models/gdino_masa.pth --texts "yellow_minions" --score-thr 0.2 --unified --show_fps

The tracker hyperparameters are defined in the corresponding config files, such as configs/masa-gdino/masa_gdino_swinb_inference.py. The current values are tuned for the demo video; adjust them to suit your own videos and needs.
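
When trying several videos or thresholds, it can be convenient to assemble the command programmatically. The sketch below only rebuilds the Demo 1 command from the flags shown above; the function name `build_text_demo_command` is hypothetical, and no flags beyond those in the command above are assumed to exist.

```python
import shlex
from typing import List

MASA_CONFIG = "configs/masa-gdino/masa_gdino_swinb_inference.py"
MASA_CHECKPOINT = "saved_models/masa_models/gdino_masa.pth"


def build_text_demo_command(video: str, out: str, texts: str,
                            score_thr: float) -> List[str]:
    """Return the text-prompted demo command as an argument list."""
    return [
        "python", "demo/video_demo_with_text.py", video,
        "--out", out,
        "--masa_config", MASA_CONFIG,
        "--masa_checkpoint", MASA_CHECKPOINT,
        "--texts", texts,
        "--score-thr", str(score_thr),
        "--unified", "--show_fps",
    ]


cmd = build_text_demo_command(
    video="demo/minions_rush_out.mp4",
    out="demo_outputs/minions_rush_out_outputs.mp4",
    texts="yellow_minions",
    score_thr=0.2,
)
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(cmd))
```

To launch the demo from Python, the list can be passed to `subprocess.run(cmd, check=True)` from the project root.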

Demo 2:

Download sora_fish_10s.mp4 and put it in the demo folder.

python demo/video_demo_with_text.py demo/sora_fish_10s.mp4 --out demo_outputs/sora_fish_10s_outputs.mp4 --masa_config configs/masa-gdino/masa_gdino_swinb_inference.py --masa_checkpoint saved_models/masa_models/gdino_masa.pth --texts "fish" --score-thr 0.1 --unified --show_fps

Demo 3 (with Mask):

a). Download the SAM-H weights and save them as saved_models/pretrain_weights/sam_vit_h_4b8939.pth.

b). Download carton_kangaroo_dance.mp4 and put it in the demo folder.

python demo/video_demo_with_text.py demo/carton_kangaroo_dance.mp4 --out demo_outputs/carton_kangaroo_dance_outputs.mp4 --masa_config configs/masa-gdino/masa_gdino_swinb_inference.py --masa_checkpoint saved_models/masa_models/gdino_masa.pth --texts "kangaroo" --score-thr 0.4 --unified --show_fps --sam_mask

Plug-and-Play MASA Tracker

You can directly use any detector along with our different MASA variants to track any object.

Demo with YOLOX detector:

Here is an example of how to use the MASA adapter with the YOLOX detector pretrained on COCO.

Download the YOLOX COCO detector weights from here and save them as saved_models/pretrain_weights/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth.

Download the MASA-R50 or MASA-GroundingDINO weights and put them in saved_models/masa_models/.

Demo 1:

Run the demo with the following command (change the config and checkpoint paths accordingly if you use a different detector or MASA model):

python demo/video_demo_with_text.py demo/giraffe_short.mp4 --out demo_outputs/giraffe_short_outputs.mp4 --det_config projects/mmdet_configs/yolox/yolox_x_8xb8-300e_coco.py --det_checkpoint saved_models/pretrain_weights/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth --masa_config configs/masa-one/masa_r50_plug_and_play.py --masa_checkpoint saved_models/masa_models/masa_r50.pth --score-thr 0.3 --show_fps
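
Because the plug-and-play command references several configs and checkpoints, a quick pre-flight check can catch a misplaced file before the demo starts. The sketch below parses the command above and lists input files that do not exist yet. The helper names and the suffix-based heuristic for spotting file arguments are assumptions for illustration, not part of the repository.

```python
import shlex
from pathlib import Path
from typing import List

# Heuristic: arguments with these suffixes are treated as input files.
PATH_SUFFIXES = (".py", ".pth", ".mp4")

YOLOX_DEMO = (
    "python demo/video_demo_with_text.py demo/giraffe_short.mp4 "
    "--out demo_outputs/giraffe_short_outputs.mp4 "
    "--det_config projects/mmdet_configs/yolox/yolox_x_8xb8-300e_coco.py "
    "--det_checkpoint saved_models/pretrain_weights/"
    "yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth "
    "--masa_config configs/masa-one/masa_r50_plug_and_play.py "
    "--masa_checkpoint saved_models/masa_models/masa_r50.pth "
    "--score-thr 0.3 --show_fps"
)


def input_paths(command: str) -> List[str]:
    """Return arguments that look like input files, skipping the --out target."""
    paths = []
    skip_next = False
    for arg in shlex.split(command):
        if skip_next:
            skip_next = False
            continue
        if arg == "--out":
            skip_next = True  # the output video is written by the demo
            continue
        if arg.endswith(PATH_SUFFIXES):
            paths.append(arg)
    return paths


def missing_inputs(command: str, root: str = ".") -> List[str]:
    """Return the input files that are not present under the given root."""
    return [p for p in input_paths(command) if not (Path(root) / p).is_file()]
```

Running `missing_inputs(YOLOX_DEMO)` from the project root returns an empty list when the video, both configs, and both checkpoints are in place.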

Demo with CO-DETR detector:

Here are examples of how to use the MASA adapter with the CO-DETR detector pretrained on COCO.

Download the CO-DETR-R50 COCO detector weights from here and save them as saved_models/pretrain_weights/co_dino_5scale_lsj_r50_3x_coco-fe5a6829.pth.

Demo 1:

Download driving_10s.mp4 and put it in the demo folder.

python demo/video_demo_with_text.py demo/driving_10s.mp4 --out demo_outputs/driving_10s_outputs.mp4 --det_config projects/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_3x_coco.py --det_checkpoint saved_models/pretrain_weights/co_dino_5scale_lsj_r50_3x_coco-fe5a6829.pth --masa_config configs/masa-one/masa_r50_plug_and_play.py --masa_checkpoint saved_models/masa_models/masa_r50.pth --score-thr 0.3 --show_fps

Demo 2:

Download zebra-drone.mp4 and put it in the demo folder.

python demo/video_demo_with_text.py demo/zebra-drone.mp4 --out demo_outputs/zebra-drone_outputs.mp4 --det_config projects/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_3x_coco.py --det_checkpoint saved_models/pretrain_weights/co_dino_5scale_lsj_r50_3x_coco-fe5a6829.pth --masa_config configs/masa-one/masa_r50_plug_and_play.py --masa_checkpoint saved_models/masa_models/masa_r50.pth --score-thr 0.2 --show_fps

Roadmaps:

Here are some of the things we are working on. Please let us know if you have any suggestions or requests:

Limitations:

MASA is a universal instance appearance model that can be added on top of any detection or segmentation model to help it track the objects it has detected. However, there are still some limitations:

Contact

For questions, please contact Siyuan Li.

Official Citation

@article{masa,
  author    = {Li, Siyuan and Ke, Lei and Danelljan, Martin and Piccinelli, Luigi and Segu, Mattia and Van Gool, Luc and Yu, Fisher},
  title     = {Matching Anything By Segmenting Anything},
  journal   = {CVPR},
  year      = {2024},
}

Acknowledgments

The authors would like to thank Bin Yan for his help and discussions. Our code is built on mmdetection, OVTrack, TETA, and yolo-world. If you find our work useful, consider checking out their work.