zibojia / COCOCO

Video-Inpaint-Anything: This is the inference code for our paper CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility.
https://zibojia.github.io

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

**Bojia Zi<sup>1</sup>, Shihao Zhao<sup>2</sup>, [Xianbiao Qi](https://scholar.google.com/citations?user=odjSydQAAAAJ&hl=en)<sup>5</sup>, Jianan Wang<sup>4</sup>, Yukai Shi<sup>3</sup>, Qianyu Chen<sup>1</sup>, Bin Liang<sup>1</sup>, Rong Xiao<sup>5</sup>, Kam-Fai Wong<sup>1</sup>, Lei Zhang<sup>4</sup>**

* denotes the corresponding author.

This is the inference code for our paper CoCoCo.

Demo comparisons (original vs. CoCoCo-inpainted videos) for the prompts "The ocean, the waves ...", "The river with ice ...", and "Meteor streaking in the sky ...".

Table of Contents

Features

Installation

Step 1. Installation Checklist

Before installing the dependencies, check the following requirements to avoid installation failures.

Step 2. Install the requirements

Once your environment is set up successfully, install the dependencies with pip:

  # Install the CoCoCo dependencies
  pip3 install -r requirements.txt
  # Compile the SAM2
  pip3 install -e .

If everything goes well, you can move on to the next steps.
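
As a quick sanity check, you can verify that CUDA is visible and the packages import cleanly. This is only a minimal sketch; it assumes PyTorch is installed and that the SAM2 package compiled by `pip3 install -e .` is importable under the name `sam2`.

  # check_env.py -- quick sanity check after installation
  import torch

  print("PyTorch version:", torch.__version__)
  print("CUDA available:", torch.cuda.is_available())
  if torch.cuda.is_available():
      print("GPU:", torch.cuda.get_device_name(0))

  try:
      # `pip3 install -e .` above compiles SAM2; the package name `sam2` is an assumption
      import sam2  # noqa: F401
      print("SAM2 import OK")
  except ImportError as exc:
      print("SAM2 is not importable:", exc)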

Usage

1. Download pretrained models.

Note that our method requires both the Stable Diffusion 1.5 inpainting weights and the CoCoCo weights.
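
For reference, here is a minimal download sketch using `huggingface_hub`. The repository IDs below are placeholders, not the official ones; use the download links provided in this repo.

  # download_weights.py -- sketch: fetch both weight sets with huggingface_hub
  from huggingface_hub import snapshot_download

  # Stable Diffusion 1.5 inpainting weights (placeholder repo id)
  snapshot_download(repo_id="<sd-1-5-inpainting-repo-id>",
                    local_dir="./stable-diffusion-v1-5-inpainting")

  # CoCoCo weights (placeholder repo id)
  snapshot_download(repo_id="<cococo-weights-repo-id>",
                    local_dir="./cococo_weights")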

2. Prepare the mask

You can obtain masks with GroundingDINO or Track-Anything, or draw them yourself.

We also release a Gradio demo that uses SAM2 to implement Video Inpainting Anything. Try our demo!

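Whichever tool you use, the validation script in the next step reads the frames and masks from `--video_path` as `images.npy` and `masks.npy`. Below is a minimal packing sketch, assuming the frames and binary masks are same-sized PNG files; the file names and array layout are illustrative, not prescribed by the repo.

  # pack_inputs.py -- sketch: pack video frames and masks into images.npy / masks.npy
  import glob
  import numpy as np
  from PIL import Image

  frames = [np.array(Image.open(p).convert("RGB")) for p in sorted(glob.glob("frames/*.png"))]
  masks = [np.array(Image.open(p).convert("L")) for p in sorted(glob.glob("masks/*.png"))]

  np.save("./images/images.npy", np.stack(frames))        # (T, H, W, 3) uint8 frames
  np.save("./images/masks.npy", (np.stack(masks) > 127))  # (T, H, W) binary masks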

3. Run our validation script.

Running this script produces the video inpainting results.

  # --guidance_scale: CFG scale; higher values give stronger text controllability
  # --video_path: directory containing the video and masks as images.npy and masks.npy
  # --model_path: path to the CoCoCo weights, e.g. ./cococo_weights
  # --pretrain_model_path: path to the pretrained Stable Diffusion inpainting model, e.g. ./stable-diffusion-v1-5-inpainting
  # --sub_folder: subfolder of the pretrained inpainting model that holds the UNet checkpoint
  python3 valid_code_release.py --config ./configs/code_release.yaml \
    --prompt "Trees. Snow mountains. best quality." \
    --negative_prompt "worst quality. bad quality." \
    --guidance_scale 10 \
    --video_path ./images/ \
    --model_path [cococo_folder_name] \
    --pretrain_model_path [sd_folder_name] \
    --sub_folder unet

4. Personalized Video Inpainting (Optional)

We provide a method that lets users compose their own personalized video inpainting model from personalized T2I checkpoints WITHOUT TRAINING. The steps are listed below (a small conversion sketch follows the list):

Convert safetensors to PyTorch weights

Merge the converted PyTorch weights into CoCoCo to create a personalized video inpainting model
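
As an illustration of the first step, here is a minimal conversion sketch using the `safetensors` library. The file names are illustrative, and the merge into CoCoCo itself is not shown here.

  # convert_personalized.py -- sketch: convert a personalized T2I .safetensors checkpoint
  # into a regular PyTorch state dict (file names are illustrative)
  import torch
  from safetensors.torch import load_file

  state_dict = load_file("personalized_t2i.safetensors")  # dict of tensor name -> torch.Tensor
  torch.save(state_dict, "personalized_t2i.bin")
  print(f"Converted {len(state_dict)} tensors")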

5. CoCoCo Inference with SAM2

TO DO


[1]. We will use a larger dataset with high-quality videos to produce a more powerful video inpainting model soon.

[2]. The training code is under preparation.

Citation


@article{Zi2024CoCoCo,
  title={CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility},
  author={Bojia Zi and Shihao Zhao and Xianbiao Qi and Jianan Wang and Yukai Shi and Qianyu Chen and Bin Liang and Kam-Fai Wong and Lei Zhang},
  journal={ArXiv},
  year={2024},
  volume={abs/2403.12035},
  url={https://arxiv.org/abs/2403.12035}
}

Acknowledgement

This code is based on AnimateDiff, Segment-Anything-2, and ProPainter.