huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

SD3 and Gradient checkpointing gives error and crashes #8503

Open bluvoll opened 1 month ago

bluvoll commented 1 month ago

Describe the bug

Enabling --gradient_checkpointing in either the LoRA or DreamBooth (DB) scripts for SD3 raises TypeError: layer_norm(): argument 'input' (position 1) must be Tensor, not tuple, and the run crashes. Without the flag, LoRA training runs fine at about 20 GB of VRAM with batch size 1 and AdamW8bit.

Reproduction

Add --gradient_checkpointing to training parameters.
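
For readers without the training setup, here is a minimal sketch of one way this class of error can arise. It assumes the cause is a checkpointed block that returns a tuple whose result is assigned to a single variable without unpacking. It is not the code from the training scripts or from any fix; ToyJointBlock and run are hypothetical names used only for illustration.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


class ToyJointBlock(torch.nn.Module):
    """Hypothetical stand-in for a block that returns TWO tensors."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, hidden, context):
        # Returns a (context, hidden) tuple, not a single tensor.
        return self.proj(context), self.proj(hidden)


def run(unpack: bool) -> torch.Tensor:
    dim = 8
    block = ToyJointBlock(dim)
    hidden = torch.randn(2, 4, dim, requires_grad=True)
    context = torch.randn(2, 3, dim)

    # checkpoint() returns whatever the wrapped module returns: here, a tuple.
    out = checkpoint(block, hidden, context, use_reentrant=False)

    if unpack:
        context, hidden = out  # unpack both outputs
    else:
        hidden = out  # assumed bug pattern: `hidden` is now a tuple

    # With a tuple input, layer_norm raises TypeError.
    return F.layer_norm(hidden, (dim,))


try:
    run(unpack=False)
except TypeError as err:
    print("fails as in the report:", err)

print("unpacked output shape:", tuple(run(unpack=True).shape))
```

If the real failure follows this pattern, the non-checkpointed path would work simply because it unpacks the block's outputs, which would explain why training only breaks with the flag enabled.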

Logs

No response

System Info

Who can help?

No response

bghira commented 1 month ago

i wish i'd looked sooner, haha. i was hunting this one down.

bghira commented 1 month ago

@sayakpaul @DN6 i can confirm this one

rockerBOO commented 1 month ago

Can confirm this error happens with --gradient_checkpointing during LoRA training.

diffusers 0.29.0

Carolinabanana commented 1 month ago

I have fixed this here: https://github.com/huggingface/diffusers/pull/8542

DN6 commented 2 weeks ago

Since #8542 was merged, can we close this?