wenqsun / DimensionX

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Apache License 2.0
913 stars, 54 forks

CogVideoX1.5-5B-I2V: The problem of consistency in video generation at any resolution #24

Open Alfonsobang opened 3 days ago

Alfonsobang commented 3 days ago

First of all, the work you are doing is fantastic: highly practical and far-reaching. Here is my question. I used this beta version of the ComfyUI node: https://github.com/kijai/ComfyUI-CogVideoXWrapper/tree/1.5_test

With the following two models, I get very different results for the same image:

THUDM/CogVideoX-5b-I2V: used with the LoRA orbit_left_lora_weights.safetensors at a fixed resolution (720×480), the consistency of the video is maintained very well.

kijai/CogVideoX-5b-1.5-I2V: used with the same LoRA, orbit_left_lora_weights.safetensors, the consistency of the video is very poor.

It looks like the version update has caused a mismatch with the parameters this LoRA was trained for. Is there a solution? Thank you for your help.

chenshuo20 commented 2 days ago

Hi, thank you for your interest! The issue mainly arises because we trained our Orbit LoRA on CogVideoX-5b-I2V instead of CogVideoX-5b-1.5-I2V. Directly transferring it may indeed cause some issues. We are considering training future versions of Orbit LoRAs using CogVideoX-5b-1.5-I2V as the base model.
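To illustrate why a LoRA trained on one base model may not transfer cleanly to another, here is a toy sketch. It assumes the usual LoRA formulation, where a low-rank update is added to the base weights; the matrices and the names `base_v1`, `base_v15`, and `apply_lora` are made up for illustration and have nothing to do with the real CogVideoX weights or our training code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(base, lora_b, lora_a, scale=1.0):
    """Return base + scale * (lora_b @ lora_a), the standard additive LoRA merge."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Toy 2x2 "weights" standing in for two different base checkpoints (made-up numbers).
base_v1 = [[1.0, 0.0], [0.0, 1.0]]
base_v15 = [[0.5, 0.2], [-0.3, 1.4]]

# Rank-1 LoRA factors, imagined as having been trained against base_v1.
lora_b = [[1.0], [0.0]]
lora_a = [[0.0, 2.0]]

merged_v1 = apply_lora(base_v1, lora_b, lora_a)
merged_v15 = apply_lora(base_v15, lora_b, lora_a)

print(merged_v1)
print(merged_v15)
```

The same delta is added in both cases, but it was only ever optimized relative to the first base. On a different base the merged weights are different, so the behavior the LoRA was trained to produce is not guaranteed.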

cheezecrisp commented 1 day ago

That is great to look forward to. By the way, are there any plans to train zoom-in or zoom-out LoRAs?

chenshuo20 commented 1 day ago

Certainly! This is already on our to-do list, and we'll be releasing them soon.

kijai commented 1 day ago

I would like to add that the orbit LoRAs also work fantastically well with the CogVideoX-Fun 5b models, which can be used as an alternative to 1.5 if you want other resolutions or aspect ratios.

On 1.5 the LoRAs do work, but the effect is much weaker. It is generally still good enough if you also prompt for the orbit.
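For quick reference, the compatibility reported in this thread can be written down as a small lookup. This is only a sketch of what people observed here, not an official table; the status labels and the helper `orbit_lora_advice` are hypothetical, and "CogVideoX-Fun 5b" is used as a family label, not an exact repository id.

```python
# Orbit LoRA behavior per base model, as reported in this thread (not an official table).
ORBIT_LORA_COMPAT = {
    "THUDM/CogVideoX-5b-I2V": "trained-on",    # base model the LoRA was trained on
    "CogVideoX-Fun 5b": "works-well",          # reported to work very well
    "kijai/CogVideoX-5b-1.5-I2V": "weak",      # works, but with a much weaker effect
}

ADVICE = {
    "trained-on": "Use as-is at the fixed 720x480 resolution.",
    "works-well": "Use as-is; suitable for other resolutions and aspect ratios.",
    "weak": "Expect a weaker effect; also describe the orbit in the prompt.",
    "unknown": "Not discussed in this thread; test before relying on it.",
}


def orbit_lora_advice(base_model: str) -> str:
    """Return the thread's advice for using the orbit LoRAs with a base model."""
    status = ORBIT_LORA_COMPAT.get(base_model, "unknown")
    return ADVICE[status]


print(orbit_lora_advice("kijai/CogVideoX-5b-1.5-I2V"))
```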