huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Unable to Retrieve Intermediate Gradients with CogVideoXPipeline #9698

Closed: lovelyczli closed this issue 5 days ago

lovelyczli commented 2 weeks ago

Describe the bug

When generating videos with CogVideoXPipeline, we need access to the gradients of intermediate tensors. However, we do not require any additional training or parameter updates to the model.

We tried using register_forward_hook to capture the gradients, but this approach failed because the CogVideoXPipeline disables gradient calculations. Specifically, in pipelines/cogvideo/pipeline_cogvideox.py at line 478, gradient tracking is turned off with @torch.no_grad().
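For reference, a sketch of the kind of hook we tried (the hooked submodule name is an assumption; any submodule whose output is a plain tensor shows the same failure):

```python
def forward_hook(module, inputs, output):
    # Under @torch.no_grad() the activation has requires_grad=False, so
    # register_hook raises: "cannot register a hook on a tensor that
    # doesn't require gradient".
    output.register_hook(lambda grad: print(grad.shape))

# proj_out is one submodule whose output is a plain tensor (name assumed);
# pipe is constructed as in the Reproduction section below.
pipe.transformer.proj_out.register_forward_hook(forward_hook)
```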

How can we resolve this issue and retrieve the gradients without modifying the model’s parameters or performing extra training?

Reproduction

Sample Code

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16,
)

video = pipe(
    prompt=prompt,  # prompt defined elsewhere
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]
```

Pipeline Code Reference

pipelines/cogvideo/pipeline_cogvideox.py at line 478:

```python
@torch.no_grad()
@replace_example_docstring(EXAMPLE_DOC_STRING)
def __call__(
    self,
    prompt: Optional[Union[str, List[str]]] = None,
    negative_prompt: Optional[Union[str, List[str]]] = None,
    height: int = 480,
    width: int = 720,
    ...
```

Logs

No response

System Info

Diffusers version: 0.30.3

Who can help?

No response

a-r-r-o-w commented 2 weeks ago

The pipelines should not be used for training. They are only meant for inference, so gradient tracking is disabled unless you modify the code to suit your needs. Instead, you will have to load each modeling component separately and write your own training loop. You can see an example of training here.
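Roughly, a minimal sketch of loading the components individually (the subfolder names are assumptions based on the standard layout of the THUDM/CogVideoX-2b repo):

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel
from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel

model_id = "THUDM/CogVideoX-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = T5EncoderModel.from_pretrained(
    model_id, subfolder="text_encoder", torch_dtype=torch.float16
)
transformer = CogVideoXTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.float16
)
vae = AutoencoderKLCogVideoX.from_pretrained(
    model_id, subfolder="vae", torch_dtype=torch.float16
)

# No parameter updates are needed, so freeze everything; gradients can
# still flow to activations and inputs in your own denoising loop.
for module in (text_encoder, transformer, vae):
    module.requires_grad_(False)
```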

lovelyczli commented 2 weeks ago

@a-r-r-o-w Thank you for your prompt reply and the training code. I noticed that the provided training code requires independent modules, including T5EncoderModel, CogVideoXTransformer3DModel, and AutoencoderKLCogVideoX.

This approach seems somewhat cumbersome, as our requirement does not involve training or updating model parameters; we only need to access the gradients.

Would simply removing the torch.no_grad() decorator from lines 478-485 in the local pipeline_cogvideox.py resolve the issue efficiently?

Thank you very much!

a-r-r-o-w commented 2 weeks ago

Yes, removing the torch.no_grad() decorator would make it possible to access gradients. Note that the models are in .eval() mode by default, so layers like dropout will not take effect.
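As a self-contained illustration of the hook pattern (a toy module stands in for the transformer; once the decorator is removed and you backpropagate a scalar computed from the pipeline's tensors, the same idea applies):

```python
import torch
import torch.nn as nn

captured = {}

def forward_hook(module, inputs, output):
    # Attach a grad hook so autograd hands us d(loss)/d(activation)
    # when backward() runs.
    output.register_hook(lambda grad: captured.update(hidden=grad.detach()))

# Toy stand-in for a transformer block (illustrative only).
net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
net[0].register_forward_hook(forward_hook)
net.requires_grad_(False)  # no parameter updates, mirroring this issue's setup

x = torch.randn(4, 8, requires_grad=True)  # gradients flow via the input
loss = net(x).sum()
loss.backward()
print(captured["hidden"].shape)  # torch.Size([4, 8])
```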

a-r-r-o-w commented 5 days ago

Hi @lovelyczli, I believe the comment above answers this, so I'm marking it as closed. Please feel free to re-open if there's anything else we can help with.