dora-lemon closed this issue 1 year ago
I solved the problem by using 8-bit Adam. Are there any side effects?
I am following the official tutorial.
It mentions "Diffusers now provides a LoRA fine-tuning script that can run in as low as 11 GB of GPU RAM without resorting to tricks such as 8-bit optimizers".
I have an RTX 3080 16 GB card. I use the default settings just like in the tutorial: batch size of 1, fp16, 4 validation images. When the validation loop runs, I get a CUDA OOM error. I see in the script that during validation the model being trained is kept in GPU memory while, at the same time, the script tries to load a new pipeline.
I am wondering: if the script fails with 16 GB, then how is it possible to train with the mentioned 11 GB?
Does anyone have a solution for this?
You're using the LoRA variant, right?
But the snippet you provided in the description is train_text_to_image.py, which is the non-LoRA script. Just making sure that was not a mistake.
I have an RTX 3080 16 GB card
I just tested it on a Tesla T4 and it worked. Could you provide a snapshot of what you get after running diffusers-cli env?
The script I referred to is train_text_to_image_lora.py, referenced in the Hugging Face LoRA tutorial.
Btw I managed to solve the OOM issue with xformers...
Okay then. The reason I mentioned it is that you stated:
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
...
and not train_text_to_image_lora.py.
@sayakpaul thanks for your comment, but @hkristof03 just posted his problem on my issue. It may be the same topic, but he did not use the script I listed above.
@dora-lemon I ran into the same problem as you. Would you kindly tell me how to switch Adam to 8-bit?
@sayakpaul thanks for your comment, but @hkristof03 just posted his problem on my issue. It may be the same topic, but he did not use the script I listed above.
My reply still remains the same, though. Did you try the train_text_to_image_lora.py script and not the train_text_to_image.py script?
If the LoRA script is failing, then please consider enabling xformers with --enable_xformers_memory_efficient_attention. Learn more here:
https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-xformers
Does this help?
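For reference, a minimal sketch of such a launch command. The model name, dataset name, and output directory below are placeholders, not values from this thread; the flags themselves come from the diffusers text_to_image LoRA example:

```shell
# Hypothetical LoRA fine-tuning launch with memory-efficient attention enabled.
# Placeholder model/dataset/output values -- substitute your own.
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --dataset_name="lambdalabs/pokemon-blip-captions" \
  --resolution=512 \
  --train_batch_size=1 \
  --enable_xformers_memory_efficient_attention \
  --output_dir="sd-lora-out"
```

Note that the xformers package must be installed for the flag to take effect.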
@kaimingd just add --use_8bit_adam to your launch command.
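In context, that flag goes on the same launch command. A sketch with placeholder values (the flag requires the bitsandbytes package to be installed):

```shell
# bitsandbytes provides the 8-bit Adam optimizer used by --use_8bit_adam.
pip install bitsandbytes

# Hypothetical launch; model and output paths are placeholders.
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_batch_size=1 \
  --use_8bit_adam \
  --output_dir="sd-out"
```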
@dora-lemon Thanks a lot. It really worked!
Closing the issue then :)
Please feel free to reopen in case of problems.
This is my bash script; I just added --enable_xformers_memory_efficient_attention compared to the original.
As stated in the docs, it should be possible to fine-tune on a single 24 GB 3090, but I still get CUDA out of memory.