You can add the `--gradient_checkpointing` and `--use_8bit_adam` flags. Also, you can try `--gradient_accumulation_steps=8`. If it still gives OOM, you can replace `bnb.optim.AdamW8bit` with `bnb.optim.PagedAdamW8bit` when using `--use_8bit_adam`; but this might be slower.
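For reference, a minimal sketch of that optimizer swap, assuming bitsandbytes is installed and a CUDA device is available (the `Linear` module is just a stand-in for the real model):

import torch
import bitsandbytes as bnb

model = torch.nn.Linear(512, 512).cuda()  # stand-in for the SD3 transformer

# AdamW8bit keeps its 8-bit optimizer state resident on the GPU;
# PagedAdamW8bit can page that state out to CPU RAM under memory pressure,
# trading some speed for fewer OOMs.
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=1e-4)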
It's better to train a LoRA on 24 GB VRAM with this script, or use something else that does pre-processing, e.g. SimpleTuner.
I also encountered this problem. In fact, the OOM was not during training, but during validation.
accelerate launch train_dreambooth_sd3.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision="bf16" \
--instance_prompt="a photo of sks dog" \
--resolution=768 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=25 \
--seed="42" \
--gradient_checkpointing \
--use_8bit_adam

Using the params above, I still get OOM. Is there any other param I can adjust? xformers? ^^
@inspire-boy Did you encounter OOM during training or during validation?
> It's better to train a LoRA on 24 GB VRAM with this script, or use something else that does pre-processing, e.g. SimpleTuner.
I use a 4090 with 24 GB.
> @inspire-boy Did you encounter OOM during training or during validation?
Right at the beginning: `Steps: 0%| | 0/500 [00:08<?, ?it/s]`
Did you try PagedAdamW8bit?
accelerate launch train_dreambooth_lora_sd3.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision="fp16" \
--instance_prompt="a photo of sks dog" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-5 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=25 \
--gradient_checkpointing \
--use_8bit_adam
In train_dreambooth_lora_sd3.py, line 1239:

if args.optimizer.lower() == "adamw":
    ...
    optimizer_class = bnb.optim.PagedAdamW8bit  # changed from bnb.optim.AdamW8bit
else:
    optimizer_class = torch.optim.AdamW

I edited this Python code, but it still OOMs...
> Did you try PagedAdamW8bit?
Does it support xformers?
Currently, SD3 doesn't support xformers: https://github.com/huggingface/diffusers/issues/8535 Btw, what is your PyTorch version? Is it one of the latest versions?
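A quick way to check, as a sketch: on PyTorch 2.x, diffusers runs SD3's attention through the built-in scaled-dot-product attention (SDPA), which already gives xformers-like memory savings.

import torch

# Print the version and whether the memory-efficient SDPA kernel is available.
print(torch.__version__)
print(hasattr(torch.nn.functional, "scaled_dot_product_attention"))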
@tolgacangoz Ubuntu 22.04.4 LTS, RTX 4090 (24 GB VRAM), 42 GB RAM, Python 3.10.14
nvidia-smi
Sat Jun 15 02:38:33 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67 Driver Version: 550.67 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:1A:00.0 Off | Off |
| 30% 24C P8 23W / 450W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
I feel it happened at this output:
Generating 4 images with prompt: A photo of sks dog in a bucket.
Traceback (most recent call last):
File "/data/diffusers/examples/dreambooth/train_dreambooth_lora_sd3.py", line 1665, in
Yeah, with the validation prompt, for the moment you'll need at least 31 GB of VRAM. With something like this:
accelerate launch examples/dreambooth/train_dreambooth_lora_sd3.py \
--pretrained_model_name_or_path="models/stable_diffusion_3_medium/" \
--instance_data_dir="./datasets/dog" \
--output_dir="./outputs/lora/dog/" \
--mixed_precision="fp16" \
--instance_prompt="a photo of sks dog" \
--resolution=1024 --train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--optimizer="AdamW" \
--use_8bit_adam \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--seed="42"
you can train with 21GB VRAM.
Also, I noticed that you're using `--gradient_accumulation_steps=4` but not `--gradient_checkpointing`. You will still get the OOM with it, though.
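For context, `--gradient_checkpointing` corresponds to enabling activation checkpointing on the transformer, which recomputes activations during the backward pass instead of storing them, trading compute for memory. A minimal sketch, assuming the public SD3 medium checkpoint:

import torch
from diffusers import SD3Transformer2DModel

transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    subfolder="transformer",
    torch_dtype=torch.float16,
)
# Recompute activations in backward instead of caching them in forward.
transformer.enable_gradient_checkpointing()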
> yeah, with the validation prompt for the moment, you'll need at least 31GB of VRAM [...] you can train with 21GB VRAM.
It worked without validation prompts! I got "pytorch_lora_weights.safetensors". By the way, is there any way to optimize validation to reduce video memory? I tried 223 but still get OOM. And can this LoRA be used in the officially provided ComfyUI workflow? I noticed the Load LoRA node can't find it. Very thankful.
got prompt
lora key not loaded: transformer.transformer_blocks.0.attn.to_k.lora_A.weight
lora key not loaded: transformer.transformer_blocks.0.attn.to_k.lora_B.weight
lora key not loaded: transformer.transformer_blocks.0.attn.to_out.0.lora_A.weight
lora key not loaded: transformer.transformer_blocks.0.attn.to_out.0.lora_B.weight
lora key not loaded: transformer.transformer_blocks.0.attn.to_q.lora_A.weight
lora key not loaded: transformer.transformer_blocks.0.attn.to_q.lora_B.weight
lora key not loaded: transformer.transformer_blocks.0.attn.to_v.lora_A.weight
lora key not loaded: transformer.transformer_blocks.0.attn.to_v.lora_B.weight
[...the same eight "lora key not loaded" lines repeat for transformer_blocks 1 through 23...]
Requested to load SD3
Loading 1 new model
100%|███████████
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "/data/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
)
lora_id = "/data/diffusers/examples/dreambooth/trained-sd3/checkpoint-500"
pipe.load_lora_weights(lora_id)
pipe = pipe.to("cuda")
pipe.enable_model_cpu_offload()  # note: this expects the pipeline on CPU, so the .to("cuda") above is redundant

image = pipe(
    "A photo of sks dog inside a bottle containing a galaxy. The bottle printed text: SD3 Lora Dog",
    negative_prompt="low quality",
    num_inference_steps=30,
    guidance_scale=6.0,
    generator=torch.manual_seed(42),
).images[0]
image
prompt: A photo of sks dog in a bucket, printed text that says: SD3 Lora Dog
Glad you got it working. I don't know why you got the layers error; for me it worked from the beginning, but I did use the code from some PRs that fix the loss and the gradient accumulation.
IMO the VRAM usage with validation shouldn't be that high, but this is just the first iteration of the training scripts; as you can see, we're still updating it with the help of the community.
Also, ComfyUI added a commit that enables loading the diffusers SD3 LoRA, so it should work there too.
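For anyone still hitting the validation OOM, a minimal sketch of a leaner standalone validation pass (this is not the script's actual validation code; the path, prompt, and filename are placeholders):

import gc

import torch
from diffusers import StableDiffusion3Pipeline

def run_validation(model_path: str, prompt: str) -> None:
    pipe = StableDiffusion3Pipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()  # keep only the active submodule on the GPU
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save("validation.png")
    # Drop the pipeline and return cached blocks to the driver before training resumes.
    del pipe
    gc.collect()
    torch.cuda.empty_cache()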
Just ignore the errors; it's a ComfyUI LoRA loader issue. I will update the code. Thank you very much; your work is timely and valuable.
Hey, are you still able to get the script running? I keep having errors about the text encoder loading (I made a separate issue). The DreamBooth training works, but the LoRA training just won't run at all for me.
I currently have four 4090 GPUs, each with 24 GB of memory. Can full fine-tuning (train_dreambooth_sd3.py, or train_dreambooth_lora_sd3.py, even without quantization tricks) be distributed so this script can run?
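Worth noting: plain data parallelism replicates the whole model on every card, so four 24 GB GPUs don't reduce per-GPU memory; you'd need parameter/optimizer sharding (FSDP or DeepSpeed, both supported by accelerate). A sketch of how one could wire that up with accelerate's FSDP plugin; this is an assumption, not something the script configures for you:

from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Shard parameters, gradients, and optimizer state across processes so no
# single GPU holds a full model copy; launch with
# `accelerate launch --num_processes=4 ...` after answering the FSDP
# questions in `accelerate config`.
fsdp_plugin = FullyShardedDataParallelPlugin()
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)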
> Hey, are you still able to get the script running? I keep having errors about the text encoder loading (I made a separate issue). The DreamBooth training works, but the LoRA training just won't run at all for me.
There are many possible causes. Please list your environment and the errors, and wait for someone who knows about it.
Seems like the issue is solved. For text encoder training, we have this opened already: https://github.com/huggingface/diffusers/issues/8726
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 48.00 MiB. GPU 0 has a total capacty of 23.64 GiB of which 9.69 MiB is free. Process 20296 has 23.63 GiB memory in use. Of the allocated memory 22.81 GiB is allocated by PyTorch, and 362.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation
my params:

accelerate launch train_dreambooth_sd3.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision="fp16" \
--instance_prompt="a photo of sks dog" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=25 \
--seed="42"
It gives OOM. Could someone give a low-VRAM example (accelerate choices included)? Thanks!
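In the meantime, per the earlier replies, the low-VRAM route is train_dreambooth_lora_sd3.py with --gradient_checkpointing and --use_8bit_adam and without --validation_prompt. One more knob the error message above itself points at is the allocator config; a minimal sketch, assuming it is set before CUDA is initialized (the 512 value is just an illustrative choice):

import os

# Must be set before torch initializes CUDA; equivalently, export it in the shell.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # imported after the env var so the allocator picks it up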