VideoVAE Inference with video length more than 17 frames

kuzhamuratov commented 2 weeks ago

Hello, thank you for your great work! I am testing the VideoVAE reconstruction script on long videos (> 17 frames). With 17 frames everything works fine, but if I change the number of frames to 33 or more in configs/vae/inference/video.py, strong visual artifacts appear. Can you help me add an enable_tiling (sliding window) feature? Thank you.

https://github.com/hpcaitech/Open-Sora/assets/44082020/7b33d8d2-ccef-4c10-a194-354cf40c6ba2

https://github.com/hpcaitech/Open-Sora/assets/44082020/626dc926-5926-46c7-8937-31da39125636

zhengzangw commented 2 weeks ago

This is consistent with our training: as we write here, we only train the VAE on videos with fewer than 34 frames. During training, we only use num_frames=17, with tiling for larger frame counts. I think our code already supports this through the micro_frame_size=17 setting here:

https://github.com/hpcaitech/Open-Sora/blob/9c4444207f18e6cf851e8cbac689f32bef762075/configs/opensora-v1-2/inference/sample.py#L26
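For readers unfamiliar with the idea, here is a minimal sketch of what temporal tiling with micro_frame_size=17 could look like. It is not the Open-Sora implementation: chunk_bounds and process_in_chunks are hypothetical helper names, and the per-chunk function stands in for the VAE encode or decode call, which is an assumption about how the model would be invoked. The point is only that a long clip is split along the time axis into pieces no longer than the frame count the VAE saw in training.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunk_bounds(num_frames: int, micro_frame_size: int = 17) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices for consecutive, non-overlapping chunks.

    Hypothetical helper: every chunk has at most `micro_frame_size` frames,
    and the last chunk may be shorter.
    """
    if num_frames < 0 or micro_frame_size <= 0:
        raise ValueError("num_frames must be >= 0 and micro_frame_size > 0")
    return [
        (start, min(start + micro_frame_size, num_frames))
        for start in range(0, num_frames, micro_frame_size)
    ]


def process_in_chunks(
    frames: Sequence[T],
    fn: Callable[[Sequence[T]], Sequence[R]],
    micro_frame_size: int = 17,
) -> List[R]:
    """Apply `fn` to each temporal chunk and concatenate the results in order.

    `fn` is a stand-in for a per-chunk VAE call (an assumption, not a real API).
    """
    out: List[R] = []
    for start, end in chunk_bounds(len(frames), micro_frame_size):
        out.extend(fn(frames[start:end]))
    return out


if __name__ == "__main__":
    # A 33-frame clip is handled as one 17-frame chunk plus one 16-frame chunk.
    print(chunk_bounds(33))  # [(0, 17), (17, 33)]
```

With this scheme the model never receives more than 17 frames at once, which matches the training setup described above. Whether chunks should overlap or be blended at the boundaries is a separate design choice that this sketch does not address.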

kuzhamuratov commented 2 weeks ago

Thank you for the quick reply, I see now!


VideoVAE Inference with video length more than 17 frames #490