zhang-zx / AVID

This repository contains the code for AVID: Any-Length Video Inpainting with Diffusion Model.
https://zhang-zx.github.io/AVID/
MIT License

Video length limitation? #6

Open lumiaomiao opened 1 month ago

lumiaomiao commented 1 month ago

Thanks for the great work!

  1. Did you test the limit on video length? In the inference phase, to use Middle-frame attention guidance, all clips of a long video need to be denoised together so that V[N'/2] can be generated in time. So is there a practical limit on video length?

  2. Another question about Middle-frame attention guidance: when you choose V[N'/2] as the key frame, it becomes invalid if the scene changes after V[N'/2], right?

  3. Is there any update on the code release?

zhang-zx commented 1 month ago

Thanks for your interest in our work and sorry for the late reply.

As we mentioned in the paper, at each denoising step we first denoise the clip containing the middle frame and then use its attention to guide the generation of the other clips. The maximum number of frames we tried was 256, as shown in the any-length text-to-video generation section of the supplementary material. However, we did not try this on the video inpainting task. This is mainly because our method cannot handle inconsistency (e.g., an object moving out of and back into the frame) well. We use the attention of the middle frame to maintain the identity of the generated object, so the guidance becomes invalid if that object disappears (the "scene changed" part of your question, if I understand correctly).
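To make the mechanism concrete, here is a minimal toy sketch of that guidance scheme in numpy. All names (`denoise_step`, `proj`, the clip/feature shapes) are hypothetical illustrations, not the AVID implementation: the clip containing the middle frame is processed first, its attention keys/values are cached, and every other clip's self-attention also attends to those cached keys/values to keep the generated identity consistent.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def denoise_step(clips, mid_idx, proj):
    """One toy denoising step over all clips of a long video.

    clips:   list of (T, d) arrays of per-frame features (stand-ins
             for latents inside a diffusion UNet's attention layer).
    mid_idx: index of the clip containing the middle frame V[N'/2].
    proj:    (d, d) matrix standing in for the attention projections.
    """
    # 1. Denoise the middle clip first and cache its keys/values.
    mid = clips[mid_idx]
    k_mid, v_mid = mid @ proj, mid
    out = [None] * len(clips)
    out[mid_idx] = attention(mid @ proj, k_mid, v_mid)

    # 2. Every other clip attends to its own frames *and* the middle
    #    clip's cached keys/values (concatenated along the key axis),
    #    which is what ties the clips' identities together.
    for i, clip in enumerate(clips):
        if i == mid_idx:
            continue
        q, k, v = clip @ proj, clip @ proj, clip
        out[i] = attention(q, np.concatenate([k, k_mid]),
                              np.concatenate([v, v_mid]))
    return out

# Five clips of 4 frames with 8-dim features; identity projection.
clips = [np.random.default_rng(i).standard_normal((4, 8)) for i in range(5)]
out = denoise_step(clips, mid_idx=len(clips) // 2, proj=np.eye(8))
print(len(out), out[0].shape)
```

Note that this also illustrates the memory constraint raised in question 1: because the middle clip's keys/values must exist before any other clip can finish its step, all clips of a long video are denoised together within each step.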

Unfortunately, I haven't heard any update on the legal review process.

Please don't hesitate to let us know if you have any further questions.