thu-ml / controlvideo

Official implementation for "ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing"
Apache License 2.0

ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing

This is the official implementation for "ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing". The project page is available here. Code will be released soon.

Overview

ControlVideo combines three components: visual conditions applied to all frames to strengthen the guidance from the source video, key-frame attention that aligns all frames with a selected one, and temporal attention modules, each followed by a zero convolutional layer, for temporal consistency and faithfulness. The three components and the parameters fine-tuned for each were chosen through a systematic empirical study. At inference, we take the trained ControlVideo, apply DDIM inversion, and then generate the edited video from the target prompt via DDIM sampling.
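The role of the zero layer after temporal attention can be illustrated with a small sketch. This is not the released code. It is a minimal pure-Python toy that assumes one feature vector per frame at a single spatial location, a plain matrix standing in for the zero convolutional layer, and a residual connection around the block. All function names (`temporal_attention`, `temporal_block`, and the helpers) are hypothetical. The point it shows is that a zero-initialized output projection makes the newly added temporal module a no-op before fine-tuning, so the pretrained model's behavior is preserved at the start.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weight, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def identity(dim):
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def zeros(dim):
    return [[0.0] * dim for _ in range(dim)]


def temporal_attention(frames, w_q, w_k, w_v):
    """Self-attention across the frame axis.

    frames: list of F feature vectors, one per video frame, all taken
    from the same spatial location (a simplifying assumption).
    """
    dim = len(frames[0])
    queries = [matvec(w_q, f) for f in frames]
    keys = [matvec(w_k, f) for f in frames]
    values = [matvec(w_v, f) for f in frames]
    attended = []
    for q in queries:
        # Scaled dot-product score of this frame against every frame.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]
        )
    return attended


def temporal_block(frames, w_q, w_k, w_v, w_zero):
    """Temporal attention, then the zero-initialized projection, then a residual add.

    w_zero stands in for the zero convolutional layer: while it is all
    zeros, the block returns its input unchanged.
    """
    attended = temporal_attention(frames, w_q, w_k, w_v)
    return [
        [x + p for x, p in zip(frame, matvec(w_zero, att))]
        for frame, att in zip(frames, attended)
    ]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frames, 2 channels
    eye = identity(2)
    # At initialization the projection is zero, so the block is an identity map.
    print(temporal_block(frames, eye, eye, eye, zeros(2)) == frames)
    # Once the projection has non-zero weights, frames start to mix over time.
    print(temporal_block(frames, eye, eye, eye, [[0.5, 0.0], [0.0, 0.5]]))
```

In a real model the projection would be a convolution whose weights and bias start at zero and are learned during fine-tuning; the toy matrix here only mimics that initialization behavior.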

Main Results

image

To Do List