jy0205 / Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
https://pyramid-flow.github.io/
MIT License

is there any quick implement for video to video task based on Pyramid-Flow? for example video editing #171

Open huiyan1804 opened 1 week ago

huiyan1804 commented 1 week ago

t2v and i2v both work well, thanks for your work! Is there any chance of implementing Pyramid-Flow for the v2v task?

feifeiobama commented 1 week ago

There is a related pull request movie_editor.py in https://github.com/jy0205/Pyramid-Flow/pull/112 for the video extension task. Although it does not implement video-to-video generation as expected, we believe the task itself should be achievable within our framework.

huiyan1804 commented 1 week ago

> There is a related pull request movie_editor.py in #112 for the video extension task. Although it does not implement video-to-video generation as expected, we believe the task itself should be achievable within our framework.

I'm thinking of using the same idea as img2img for video2video. First, I encode the frames into latent space and add noise according to the scheduler's timestep, with 8 frames per unit, matching the shape of the latents being denoised. I've tried directly replacing the noisy latents and adding onto them, but either way the results are bad. I find it difficult to handle the jump points between the different resolutions.

Img2img usually needs a denoising strength in [0, 1]. In your original pipeline, starting from pure noise corresponds to strength = 1, which means the input video is not referenced at all. When I set strength < 1, for example 0.4, denoising only runs over the last 40% of the timesteps. If I apply this to all 3 stages, it cuts off the continuity between stages; and if I only use strength 0.4 in the first low-res stage and strength 1 in the remaining 2 stages, the influence of the input video is too weak. Any good ideas?

feifeiobama commented 1 week ago

To apply our framework to video-to-video generation, you need to downsample and renoise the latent to match the training latent at a certain timestep, and then start inference from that timestep.