SD3 DreamBooth training - caching of embeddings and VAE representations

GavChap commented 5 months ago

Is your feature request related to a problem? Please describe. The current DreamBooth training scripts, for both LoRA training and full fine-tuning, are very memory intensive.

Describe the solution you'd like. I would like the option to pre-cache the VAE representations of the images and the text encoder representations so that training could be done without the text encoders being in VRAM.
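To make the request concrete, here is a minimal sketch of the caching pattern, not the diffusers implementation. It assumes a dataset of (image path, caption) pairs; `encode_image` and `encode_text` are hypothetical stand-ins for the VAE encoder and the text encoders, and `precompute_cache`, `load_cached`, and `cache_key` are illustrative names, not existing APIs.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

Sample = Tuple[str, str]  # (image_path, caption); hypothetical dataset layout


def cache_key(image_path: str, caption: str) -> str:
    """Stable file name derived from the sample's image path and caption."""
    return hashlib.sha256(f"{image_path}\n{caption}".encode("utf-8")).hexdigest()


def precompute_cache(
    samples: Iterable[Sample],
    encode_image: Callable[[str], object],  # stand-in for the VAE encoder
    encode_text: Callable[[str], object],   # stand-in for the text encoders
    cache_dir: Path,
) -> List[Path]:
    """Run both encoders once per sample and write the results to disk."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path, caption in samples:
        out = cache_dir / f"{cache_key(image_path, caption)}.pkl"
        if not out.exists():  # skip samples cached by an earlier run
            record = {
                "latents": encode_image(image_path),
                "prompt_embeds": encode_text(caption),
            }
            with out.open("wb") as f:
                pickle.dump(record, f)
        written.append(out)
    return written


def load_cached(path: Path) -> dict:
    """Read one cached record; no encoder is needed at this point."""
    with path.open("rb") as f:
        return pickle.load(f)
```

Once `precompute_cache` has run, the encoders could be released and the training loop would read only from `load_cached`. In a real training script the callables would wrap the actual models, and the tensors would more likely be written with `torch.save` than with `pickle`; both points are assumptions here, not something the existing scripts are confirmed to do.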

Describe alternatives you've considered. I'm not aware of any other way to reduce the VRAM requirement for training.

AmericanPresidentJimmyCarter commented 5 months ago

This is already supported in SimpleTuner, which uses diffusers:

https://github.com/bghira/SimpleTuner

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

huggingface/diffusers issue #8540