riflemanl opened 1 week ago
I don't think the Flux DreamBooth training scripts are memory-optimized out of the box. You could try running them with DeepSpeed and enabling gradient checkpointing, which should lower the memory requirements considerably. For serious training experiments, we recommend something like SimpleTuner, which uses diffusers as a backend, supports many important training-related components out of the box, and is memory-efficient.
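As a rough sketch of the suggestion above (exact flags depend on your diffusers checkout; the script name and arguments shown are the ones documented in examples/dreambooth/README_flux.md, and the DeepSpeed settings are illustrative assumptions, not a tested recipe):

```shell
# 1) Configure accelerate to use DeepSpeed ZeRO stage 2 with CPU offload
#    (interactive prompts; choose DeepSpeed when asked for the distributed type)
accelerate config

# 2) Launch the Flux DreamBooth LoRA script with gradient checkpointing
#    and other common memory savers. Paths and the instance prompt are
#    placeholders -- substitute your own.
accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --instance_data_dir="./my_images" \
  --output_dir="./flux-lora-out" \
  --instance_prompt="a photo of sks dog" \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --gradient_accumulation_steps=4 \
  --train_batch_size=1 \
  --use_8bit_adam \
  --resolution=512
```

Even with these settings, a 24 GB card may be tight for the full Flux transformer; quantizing the base model or offloading optimizer states via DeepSpeed is usually also needed.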
Describe the bug
Followed the examples/dreambooth/README_flux.md guide for setup and training, and got a CUDA OOM error on a 3090 Ti with 24 GB.
Reproduction
PC with 256 GB RAM, 3090 Ti (24 GB VRAM), torch 2.4.1 + CUDA 12.1, export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, accelerate==1.0.1, transformers==4.45.2
Logs
System Info
Diffusers is the latest main-branch code as of today, 2024-10-21, because the previous release tag does not yet support DreamBooth Flux LoRA training.
Who can help?
No response