qianqianwang68 / omnimotion

Apache License 2.0

Tracking Everything Everywhere All at Once

PyTorch implementation of the paper Tracking Everything Everywhere All at Once, ICCV 2023.

Qianqian Wang 1,2, Yen-Yu Chang 1, Ruojin Cai 1, Zhengqi Li 2, Bharath Hariharan 1, Aleksander Holynski 2,3, Noah Snavely 1,2
1Cornell University, 2Google Research, 3UC Berkeley

Project Page | Paper | Video

Installation

The code is tested with python=3.8 and torch=1.10.0+cu111 on an A100 GPU.

git clone --recurse-submodules https://github.com/qianqianwang68/omnimotion/
cd omnimotion/
conda create -n omnimotion python=3.8
conda activate omnimotion
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install matplotlib tensorboard scipy opencv-python tqdm tensorboardX configargparse ipdb kornia imageio[ffmpeg]
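
After installing, it can help to confirm that the environment matches the versions the code was tested with. The following is a minimal sketch that uses only the Python standard library; the helper names `base_version` and `check_environment` are hypothetical and not part of this repository. It compares the installed versions against the ones listed above and ignores the local build suffix (for example `+cu111`).

```python
from importlib import metadata

# Versions this README reports as tested (build suffix such as +cu111 stripped).
TESTED = {"torch": "1.10.0", "torchvision": "0.11.0", "torchaudio": "0.10.0"}


def base_version(version):
    """Drop a local build suffix such as '+cu111' from a version string."""
    return version.split("+", 1)[0]


def check_environment(tested=TESTED):
    """Return {package: (installed_or_None, tested, matches)} for each package."""
    report = {}
    for name, wanted in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        matches = installed is not None and base_version(installed) == wanted
        report[name] = (installed, wanted, matches)
    return report


for name, (installed, wanted, ok) in check_environment().items():
    status = "ok" if ok else "differs from tested version"
    print(f"{name}: installed={installed} tested={wanted} -> {status}")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the setup differs from the tested one.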

Training

  1. Please refer to the preprocessing instructions to prepare input data for training OmniMotion. We also provide some processed data that you can download, unzip, and train on directly. (Depending on your network speed, running the preprocessing script locally may be faster than downloading the processed data.)

  2. With processed input data, run the following command to start training:

    python train.py --config configs/default.txt --data_dir {sequence_directory}

    You can view visualizations in TensorBoard by running tensorboard --logdir logs/. By default, the script trains for 100k iterations, which takes 8-9 hours on an A100 GPU and 12-13 hours on an RTX 4090.
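
Since each sequence is optimized separately and a run takes several hours, you may want to queue several sequences back to back. The sketch below builds the training command shown above for every subdirectory of a folder. The helper names and the assumed layout (one processed sequence per subdirectory of `root`) are illustrative assumptions, not part of this repository.

```python
import subprocess
from pathlib import Path


def build_train_command(data_dir, config="configs/default.txt"):
    """Build the training command from this README for one sequence directory."""
    return ["python", "train.py", "--config", config, "--data_dir", str(data_dir)]


def train_all(root, dry_run=True):
    """Train every sequence directory under `root`, one after another.

    Assumption: `root` holds one processed sequence per subdirectory.
    With dry_run=True the commands are only returned, not executed.
    """
    sequence_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    commands = [build_train_command(p) for p in sequence_dirs]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


print(build_train_command("data/butterfly"))
```

Calling `train_all(root, dry_run=False)` from the repository root would run the trainings sequentially; the default dry run only lists the commands so they can be inspected first.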

If you want to skip the optimization and see what the results/formats look like, we provide the weights for a few sequences here. You can use viz.py to visualize the correspondences produced by the models. Please refer to the next section for more details.

Visualization

The training pipeline saves visualizations (correspondences, pseudo-depth maps, etc.) to args.out_dir/vis at a fixed interval of training steps. You can also visualize grid points and trails after training by running:

python viz.py --config configs/default.txt --data_dir {sequence_directory}

Make sure expname and data_dir are specified correctly so that the model and data can be loaded. When expname is specified, the latest checkpoint matching that expname is loaded. Alternatively, you can specify ckpt_path to select a particular checkpoint.
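
To illustrate what "latest checkpoint" means, here is a small sketch of picking the highest-iteration file from a list of names. It is not the repository's loading code; it only assumes checkpoints follow the `model_<step>.pth` naming seen in the provided weights (`model_100000.pth`), and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming scheme, inferred from the provided file model_100000.pth.
CKPT_PATTERN = re.compile(r"model_(\d+)\.pth$")


def checkpoint_step(path):
    """Return the iteration number in a 'model_<step>.pth' filename, or None."""
    match = CKPT_PATTERN.search(Path(path).name)
    return int(match.group(1)) if match else None


def latest_checkpoint(paths):
    """Pick the checkpoint with the highest iteration number.

    Steps are compared as integers: sorted as plain strings,
    'model_90000.pth' would rank above 'model_100000.pth'.
    """
    numbered = [(checkpoint_step(p), p) for p in paths]
    numbered = [(step, p) for step, p in numbered if step is not None]
    if not numbered:
        return None
    return max(numbered, key=lambda item: item[0])[1]


candidates = ["model_90000.pth", "model_100000.pth", "notes.txt"]
print(latest_checkpoint(candidates))  # prints model_100000.pth
```

If you need a specific earlier checkpoint rather than the latest one, pass its path through ckpt_path as described above.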

To generate the motion trail visualization, a foreground/background segmentation mask is required. For DAVIS videos, you can use the mask annotations provided by the dataset. For custom videos that don't come with foreground segmentation masks, you can use remove.bg to remove the background of the query frame, download the masked image, and set foreground_mask_path to its path. Here is an example of the masked image for the first frame of the butterfly sequence.

python viz.py --config configs/default.txt --data_dir {sequence_directory} --foreground_mask_path {mask_file_path}

If you download the provided model weights for a sequence from here, you can visualize the correspondences by running the viz.py script with data_dir set to the unzipped directory, ckpt_path set to the path of model_100000.pth in that directory, and optionally foreground_mask_path set to the path of mask_0.png (only required for the non-DAVIS sequences butterfly, kangaroo, and swing_tire if you want to visualize their motion trails).
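
The settings described above can be assembled into a single command. The sketch below does this under two labeled assumptions: that ckpt_path is accepted on the command line as `--ckpt_path` (in the same way `--foreground_mask_path` is shown above), and that `model_100000.pth` and `mask_0.png` sit directly inside the unzipped directory. The helper names are hypothetical.

```python
from pathlib import Path


def build_viz_command(data_dir, ckpt_path=None, foreground_mask_path=None,
                      config="configs/default.txt"):
    """Build a viz.py command; optional arguments are added only when given."""
    cmd = ["python", "viz.py", "--config", config, "--data_dir", str(data_dir)]
    if ckpt_path is not None:
        # Assumption: ckpt_path is exposed as a --ckpt_path command-line flag.
        cmd += ["--ckpt_path", str(ckpt_path)]
    if foreground_mask_path is not None:
        cmd += ["--foreground_mask_path", str(foreground_mask_path)]
    return cmd


def viz_command_for_download(unzipped_dir, with_mask=False):
    """Command for a downloaded sequence.

    Assumption: model_100000.pth and mask_0.png are at the top level of
    the unzipped directory. Set with_mask=True for non-DAVIS sequences
    when motion trails are wanted.
    """
    root = Path(unzipped_dir)
    mask = root / "mask_0.png" if with_mask else None
    return build_viz_command(root, ckpt_path=root / "model_100000.pth",
                             foreground_mask_path=mask)


print(" ".join(viz_command_for_download("butterfly", with_mask=True)))
```

Leaving `with_mask` at its default omits the mask argument entirely, which corresponds to visualizing correspondences without motion trails.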

Troubleshooting

Citation

@article{wang2023omnimotion,
    title   = {Tracking Everything Everywhere All at Once},
    author  = {Wang, Qianqian and Chang, Yen-Yu and Cai, Ruojin and Li, Zhengqi and Hariharan, Bharath and Holynski, Aleksander and Snavely, Noah},
    journal = {ICCV},
    year    = {2023}
}