lllyasviel / ControlNet

Let us control diffusion models!

Control net for video conversion #378

Open zyfbis opened 1 year ago

zyfbis commented 1 year ago

I have observed that many videos using SD for animation (such as the Rock-Paper-Scissors one) suffer from flickering, i.e. inconsistent details between frames. I wonder whether it is possible to train a ControlNet to address this problem directly. For example:

During the training phase, the input control condition consists of k+1 consecutive images: the first k images are frames n-k to n-1 of the original video, and the last image is obtained by preprocessing frame n (with a tool such as OpenPose or HED). The training objective is to reconstruct the original frame n.

In the inference phase, the first k images of the control condition are the outputs SD has already generated for frames n-k to n-1, while the last image is the preprocessed frame n of the original video.
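A minimal sketch of how such a control condition could be assembled, assuming PyTorch and a preprocessing detector wrapped in a callable; `build_control_condition` and its argument names are hypothetical and not part of this repo:

```python
import torch

def build_control_condition(prev_frames, current_frame, preprocess, k=4):
    """Stack k previous frames with the preprocessed current frame.

    prev_frames:   list of k tensors, each (3, H, W) in [0, 1]
                   - training:  ground-truth frames n-k .. n-1 from the video
                   - inference: SD outputs already generated for frames n-k .. n-1
    current_frame: tensor (3, H, W), the original frame n
    preprocess:    callable mapping an image to its control map
                   (e.g. an OpenPose or HED detector), assumed given
    """
    assert len(prev_frames) == k
    control_map = preprocess(current_frame)  # structural condition for frame n
    # Concatenate along the channel axis -> (3 * (k + 1), H, W); the ControlNet
    # input convolution would then need 3 * (k + 1) input channels.
    return torch.cat(prev_frames + [control_map], dim=0)
```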

Alternatively, we could use a double control net, where one net is dedicated to the previous k frames and controls the temporal consistency of the details.
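For the double control net variant, one could run both nets and sum their residuals before feeding them to the frozen UNet. A rough sketch, assuming each ControlNet returns a list of per-block residual feature maps; the function, weights, and call signature below are illustrative assumptions, not this repo's API:

```python
def combine_double_controlnet(control_temporal, control_structural,
                              x_noisy, timesteps, context,
                              cond_temporal, cond_structural,
                              w_temporal=1.0, w_structural=1.0):
    """Run two ControlNets and merge their residuals.

    control_temporal:   ControlNet conditioned on the previous k frames
    control_structural: ControlNet conditioned on the OpenPose/HED map of frame n
    Both are assumed to return a list of residual feature maps, one per UNet block.
    """
    res_t = control_temporal(x_noisy, hint=cond_temporal,
                             timesteps=timesteps, context=context)
    res_s = control_structural(x_noisy, hint=cond_structural,
                               timesteps=timesteps, context=context)
    # Weighted sum of the per-block residuals, then passed to the frozen UNet.
    return [w_temporal * rt + w_structural * rs for rt, rs in zip(res_t, res_s)]
```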

I hope that this approach can help resolve the flickering issue in SD-generated videos.

lorenmt commented 1 year ago

Check out this work: https://arxiv.org/abs/2304.08818

VladAndronik commented 1 year ago

@lllyasviel Have you tried training the ControlNet on some form of optical flow for video2video applications? There was a discussion where you asked which models are most wanted for the next version, and many people suggested this one.

geroldmeisinger commented 12 months ago

VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet - uses optical flow from videos to generate inpainting masks for frame prediction
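A rough sketch of the general idea only (not the paper's actual pipeline): warp the previously generated frame along the optical flow and mark pixels where forward and backward flows disagree as the region to inpaint. It uses OpenCV's Farneback flow; the threshold and function name are illustrative assumptions:

```python
import cv2
import numpy as np

def flow_inpainting_mask(prev_frame, next_frame, err_thresh=1.0):
    """Warp prev_frame toward next_frame via dense optical flow and
    mark pixels with inconsistent motion as the region to inpaint."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Forward (prev -> next) and backward (next -> prev) Farneback flow.
    fwd = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                       0.5, 3, 15, 3, 5, 1.2, 0)
    bwd = cv2.calcOpticalFlowFarneback(next_gray, prev_gray, None,
                                       0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + bwd[..., 0]).astype(np.float32)
    map_y = (grid_y + bwd[..., 1]).astype(np.float32)
    # Predict frame n by sampling frame n-1 along the backward flow.
    warped = cv2.remap(prev_frame, map_x, map_y, cv2.INTER_LINEAR)
    # Forward flow sampled at the matched locations; large forward/backward
    # disagreement indicates occlusion or unreliable motion -> inpaint there.
    fwd_at = cv2.remap(fwd, map_x, map_y, cv2.INTER_LINEAR)
    err = np.linalg.norm(fwd_at + bwd, axis=-1)
    mask = (err > err_thresh).astype(np.uint8) * 255
    return warped, mask
```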