Open immortalCO opened 1 month ago
Hey, thanks for reporting! We've come across this issue as well. This comes from maintaining 1:1 implementations with the original CogVideo code base.
See this and this. I think @yiyixuxu was looking into this.
I think what you mention is correct and creates the intended cosine guidance schedule. cc @zRzRzRzRzRzRzR as well for verifying this
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Gentle ping to @yiyixuxu. I think we should fix this issue in our pipelines, even if the fix is incompatible with the original implementation.
Describe the bug
As shown at pipeline_cogvideox_image2video.py L778, pipeline_cogvideox_video2video.py L778, and pipeline_cogvideox.py L697, the dynamic CFG is calculated in this way:
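The snippet itself is missing from this copy of the issue. Judging from the fragments quoted in the points that follow, the computation is presumably along these lines. This is a reconstruction, not a verbatim quote of the pipeline: the wrapper name `dynamic_cfg_as_described` and the value `guidance_scale = 6.0` are only illustrative.

```python
import math


def dynamic_cfg_as_described(t: float, num_inference_steps: int, guidance_scale: float) -> float:
    """Dynamic CFG as described in this issue (reconstructed, hypothetical wrapper)."""
    # The raw timestep t is compared against the number of inference steps.
    ratio = (num_inference_steps - t) / num_inference_steps
    return 1 + guidance_scale * (1 - math.cos(math.pi * ratio ** 5.0)) / 2


num_inference_steps = 50  # default mentioned in the issue
sample_timesteps = (999, 500, 50, 1)  # timesteps lie between 1 and 999

# Ratio that is fed into the power and the cosine, per timestep.
ratios = {t: (num_inference_steps - t) / num_inference_steps for t in sample_timesteps}
# Resulting guidance scale, per timestep (guidance_scale=6.0 is illustrative).
scales = {t: dynamic_cfg_as_described(t, num_inference_steps, 6.0) for t in sample_timesteps}

for t in sample_timesteps:
    print(f"t={t:4d}  ratio={ratios[t]:8.2f}  scale={scales[t]:.4f}")
```

Printing `ratios` makes it easy to see at which timesteps the ratio leaves the interval [0, 1].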
However:
- `num_inference_steps` is the number of inference denoising steps, which defaults to 50.
- `t.item()` is the denoising timestep, which ranges from 1 to 999.
- `(num_inference_steps - t.item()) / num_inference_steps` therefore does not go from 1 to 0, but in fact becomes negative very quickly. After the `** 5.0`, the `math.cos` will fluctuate severely.

I wonder: is this really the desired behavior of the CogVideoX pipeline? Shouldn't it be one of the following:
Both implementations will make the dynamic CFG like a cosine annealing.
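The two proposed snippets did not survive in this copy of the issue. As a rough sketch of what a cosine-annealing-like schedule could look like, the code below keeps the argument of the cosine inside [0, 1] in two ways: by step index, or by a normalised timestep. Both variants, the helper `dynamic_cfg`, and the constant `num_train_timesteps = 1000` are assumptions made for illustration and may differ from what was originally proposed.

```python
import math


def dynamic_cfg(progress: float, guidance_scale: float, exponent: float = 5.0) -> float:
    """Cosine-shaped guidance scale for a progress value in [0, 1] (hypothetical helper)."""
    return 1 + guidance_scale * (1 - math.cos(math.pi * progress ** exponent)) / 2


num_inference_steps = 50
num_train_timesteps = 1000  # assumption: timesteps are normalised by the training range
guidance_scale = 6.0        # illustrative value

# Variant A (assumed): use the loop index i instead of the raw timestep.
by_index = [
    dynamic_cfg((num_inference_steps - i) / num_inference_steps, guidance_scale)
    for i in range(num_inference_steps)
]

# Variant B (assumed): keep the timestep, but divide by the training range.
timesteps = [999, 750, 500, 250, 1]
by_timestep = [dynamic_cfg(t / num_train_timesteps, guidance_scale) for t in timesteps]

print(by_index[0], by_index[-1])
print(by_timestep[0], by_timestep[-1])
```

In both variants the progress value runs from about 1 down to about 0, so the scale decays smoothly instead of oscillating. Which normalisation matches the intended schedule is for the maintainers to confirm.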
Also, I think `1 + guidance_scale * (...)` here should be `1 + (guidance_scale - 1) * (...)`, otherwise its value will be 1 ~ 1 + CFG instead of 1 ~ CFG.

Please check it and fix it if it is really a bug, thank you very much.
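To illustrate the range remark: the cosine factor `(1 - cos(...)) / 2` lies in [0, 1], so the two expressions differ only in their upper end. The sketch below evaluates both at the extremes; `cosine_factor` is a hypothetical helper and `guidance_scale = 6.0` is an illustrative value.

```python
import math


def cosine_factor(progress: float, exponent: float = 5.0) -> float:
    """The (1 - cos(pi * progress ** exponent)) / 2 term, for progress in [0, 1]."""
    return (1 - math.cos(math.pi * progress ** exponent)) / 2


guidance_scale = 6.0  # illustrative value

# Evaluate both formulas at the two ends of the schedule.
current = [1 + guidance_scale * cosine_factor(p) for p in (0.0, 1.0)]
proposed = [1 + (guidance_scale - 1) * cosine_factor(p) for p in (0.0, 1.0)]

print(current)   # [1.0, 7.0]: spans 1 to 1 + guidance_scale
print(proposed)  # [1.0, 6.0]: spans 1 to guidance_scale
```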
Reproduction
Logs
System Info
This is a bug in the code and does not depend on the system.
Who can help?
@DN6 @a-r-r-o-w @zRzRzRzRzRzRzR