Hi @unrue, I have the same problem since yesterday. I ran the same command on Google Colab, with the same parameters and images as 2 days ago, but got very poor quality in the generated images. No improvement when changing params and images.
CUDA 12, Diffusers 0.21.4 with autotrain-advanced@main, Stable Diffusion XL 1.0
Hi,
I don't understand how you solved it. I don't use train_dreambooth_lora_sdxl.py.
If you look into the Colab Notebook, that should help answer the question. Let me know if that doesn't.
I'm trying your Colab, but do I have to insert my Hugging Face token? Is there another way? I don't want to insert my token in a Colab.
You can disable push_to_hub. The HF token is needed to sync with the HF Hub.
OK, running your Colab I'm experiencing an out-of-memory error:
```
  File "/content/diffusers/examples/dreambooth/diffusers/src/diffusers/models/resnet.py", line 755, in forward
    output_tensor = (input_tensor + hidden_states) / self.output_scale_factor
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 598.81 MiB is free. Process 292077 has 14.16 GiB memory in use. Of the allocated memory 12.34 GiB is allocated by PyTorch, and 522.16 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Steps: 0% 2/500 [04:59<20:42:14, 149.67s/it, loss=0.331, lr=0.0001]
```
Trying locally, on a V100 GPU with 32 GB of VRAM, I get out of memory too:
```
  File "*/image2image_env/diffusers/src/diffusers/models/attention_processor.py", line 743, in __call__
    attention_probs = attn.get_attention_scores(query, key, attention_mask)
  File "*/image2image_env/diffusers/src/diffusers/models/attention_processor.py", line 598, in get_attention_scores
    attention_scores = torch.baddbmm(
RuntimeError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 31.75 GiB total capacity; 27.43 GiB already allocated; 958.50 MiB free; 29.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
I tried removing `--gradient_accumulation_steps=2` and `--gradient_checkpointing`, and a lower resolution, with no effect. How is it possible that 32 GB of VRAM are not enough? (The VRAM is available, I checked.)
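For reference, the allocator setting that the error message itself points to can be applied before torch initializes CUDA; a minimal sketch (the 512 MiB value below is only an illustrative guess, not a verified fix):

```python
# Must run before torch initializes CUDA, e.g. at the very top of the training
# script or notebook. The value 512 is illustrative, not a verified fix.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # import (and any CUDA work) only after the variable is set
```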
Another question: during training I see:

`Generating 4 images with prompt: A photo of sks dog in a bucket.`

Where are these images saved so they can be displayed during the training phase? I don't see any images saved.
Sorry, but I am unable to reproduce the issue you're facing.
I reran the Colab Notebook (https://colab.research.google.com/gist/sayakpaul/13864eb0427bef50f5e95f08b60a03a3/scratchpad.ipynb) and didn't have to change any of the command-line arguments:
```bash
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path={MODEL_NAME} \
  --instance_data_dir={INSTANCE_DIR} \
  --pretrained_vae_model_name_or_path={VAE_PATH} \
  --output_dir={OUTPUT_DIR} \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=2 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --checkpointing_steps=717 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
```
Since you don't want to use the HF Hub, you can omit `push_to_hub`.
> I tried removing `--gradient_accumulation_steps=2` and `--gradient_checkpointing`, and a lower resolution, with no effect. How is it possible that 32 GB of VRAM are not enough? (The VRAM is available, I checked.)
I don't have any reasonable explanation for this, sadly, other than suggesting you double-check your dev setup (drivers, versions, etc.).
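As a quick sanity check of the environment, something like this (a minimal sketch, nothing specific to the training script) can confirm the torch/CUDA versions and the GPU actually visible to PyTorch:

```python
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("device:", props.name, "| total memory (GiB):", props.total_memory / 1024**3)
```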
> Where are these images saved so they can be displayed during the training phase? I don't see any images saved.
The images are logged to Weights and Biases or TensorBoard: https://github.com/huggingface/diffusers/blob/442017ccc877279bcf24fbe92f92d3d0def191b6/examples/dreambooth/train_dreambooth_lora_sdxl.py#L1341
Local saving is only done here: https://github.com/huggingface/diffusers/blob/442017ccc877279bcf24fbe92f92d3d0def191b6/examples/dreambooth/train_dreambooth_lora_sdxl.py#L71
@unrue, a gentle reminder that it's important to file precise issues. An issue titled "Poor quality results" is not very explicit and doesn't help anybody who rediscovers this issue later.
It would be amazing if you could take a look at https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi,
I'm following this guide on a single V100 GPU:
https://huggingface.co/docs/diffusers/main/en/training/dreambooth#performing-inference-using-a-saved-checkpoint
using the dog dataset and fine-tuning the text encoder. The "A dog in a bucket" inference case returns very low-quality results, very far from the expected result. Any idea why? I used the same training parameters.
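For reference, this is roughly how I load the saved pipeline for inference (a minimal sketch; the output directory path, prompt, and sampler settings are placeholders, not exactly what the guide prescribes):

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder: the --output_dir written by the DreamBooth training script
pipe = DiffusionPipeline.from_pretrained(
    "path/to/dreambooth-output", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "A dog in a bucket",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("dog_in_bucket.png")
```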
My configuration:
PyTorch 1.10.0, Diffusers 0.22.0.dev0, Accelerate 0.20.3
Thanks.