SD3 and Gradient checkpointing gives error and crashes

bluvoll commented 1 month ago

Describe the bug

Describe the bug

Activating --gradient_checkpointing in either Lora or DB scripts for SD3 causes: TypeError: layer_norm(): argument 'input' (position 1) must be Tensor, not tuple, which crashes the run, without it, LoRA runs fine at about 20GB vram usage batch size 1 with AdamW8bit

imagen

Reproduction

Add --gradient_checkpointing to training parameters.

Logs

No response

System Info

🤗 Diffusers version: 0.29.0.dev0
Platform: Windows-10-10.0.19045-SP0
Running on a notebook?: No
Running on Google Colab?: No
Python version: 3.10.11
PyTorch version (GPU?): 2.2.1+cu118 (True)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Huggingface_hub version: 0.23.3
Transformers version: 4.41.2
Accelerate version: 0.31.0
PEFT version: 0.11.1
Bitsandbytes version: 0.43.0
Safetensors version: 0.4.2
xFormers version: not installed
Accelerator: NVIDIA GeForce RTX 3090, 24576 MiB NVIDIA GeForce RTX 4090, 24564 MiB VRAM
Using GPU in script?: RTX 4090
Using distributed or parallel set-up in script?: No DDP or similar parallel setups.