huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Poor quality results #5560

Closed · unrue closed this issue 11 months ago

unrue commented 1 year ago

Hi,

I'm following this guide on a single V100 GPU:

https://huggingface.co/docs/diffusers/main/en/training/dreambooth#performing-inference-using-a-saved-checkpoint

I'm using the dog dataset and finetuning the text encoder. For the "A dog in a bucket" inference case, I get very low quality results, very far from the expected result. Any idea why? I used the same training parameters as the guide.
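For reference, inference from a saved checkpoint per the linked guide looks roughly like this (a sketch; the checkpoint path and generation settings are placeholders, not my exact values):

```python
import torch
from diffusers import DiffusionPipeline

# Sketch of inference from a saved DreamBooth checkpoint; "./dreambooth-dog"
# is a placeholder for the training --output_dir.
pipe = DiffusionPipeline.from_pretrained("./dreambooth-dog", torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe("A photo of sks dog in a bucket", num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket.png")
```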

(attached image: dog-bucket)

My configurations:

- PyTorch 1.10.0
- Diffusers 0.22.0.dev0
- Accelerate 0.20.3

Thanks.

alilouTina commented 1 year ago

Hi @unrue, I have had the same problem since yesterday. I ran the same command on Google Colab with the same parameters and images as two days ago, but got very poor quality generated images. Changing the parameters and images made no difference.

- CUDA 12
- Diffusers 0.21.4 with autotrain-advanced@main
- Stable Diffusion XL 1.0

sayakpaul commented 1 year ago

See https://github.com/huggingface/diffusers/issues/5004#issuecomment-1780909598

unrue commented 1 year ago

Hi,

I don't understand how you solved it. I don't use train_dreambooth_lora_sdxl.py.

sayakpaul commented 1 year ago

If you look into the Colab Notebook, that should help answer the question. Let me know if that doesn't.

unrue commented 1 year ago

I'm trying your Colab, but do I have to insert my Hugging Face token? Is there another way? I don't want to insert my token in a Colab.

sayakpaul commented 1 year ago

You can disable `push_to_hub`. The HF token is only needed to sync with the HF Hub.
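If you do want to keep the Hub sync, you can authenticate interactively instead of pasting the raw token into a cell; a minimal sketch using huggingface_hub:

```python
# Minimal sketch: log in via an interactive prompt so the token never
# appears in the notebook's cells or output.
from huggingface_hub import notebook_login

notebook_login()
```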

unrue commented 1 year ago

OK, running your Colab I get an out-of-memory error:

```
  File "/content/diffusers/examples/dreambooth/diffusers/src/diffusers/models/resnet.py", line 755, in forward
    output_tensor = (input_tensor + hidden_states) / self.output_scale_factor
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 598.81 MiB is free. Process 292077 has 14.16 GiB memory in use. Of the allocated memory 12.34 GiB is allocated by PyTorch, and 522.16 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Steps:   0% 2/500 [04:59<20:42:14, 149.67s/it, loss=0.331, lr=0.0001]
```

Trying locally on a V100 GPU with 32 GB of VRAM, I get an out-of-memory error too:

```
  File "*/image2image_env/diffusers/src/diffusers/models/attention_processor.py", line 743, in __call__
    attention_probs = attn.get_attention_scores(query, key, attention_mask)
  File "*/image2image_env/diffusers/src/diffusers/models/attention_processor.py", line 598, in get_attention_scores
    attention_scores = torch.baddbmm(
RuntimeError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 31.75 GiB total capacity; 27.43 GiB already allocated; 958.50 MiB free; 29.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

I tried removing --gradient_accumulation_steps=2 and --gradient_checkpointing and using a lower resolution, with no effect. How is it possible that 32 GB of VRAM are not enough? (The VRAM is available, I checked.)
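For reference, the allocator tweak the traceback suggests would look roughly like this (a sketch; the 512 MiB cap is an arbitrary example), together with a quick check of what the process actually sees on the GPU:

```python
import os

# Must be set before CUDA is initialized; 512 is just an example cap.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch

print(torch.cuda.get_device_name(0))
print(f"total:     {torch.cuda.get_device_properties(0).total_memory / 2**30:.2f} GiB")
print(f"allocated: {torch.cuda.memory_allocated(0) / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 2**30:.2f} GiB")
```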

Another question: during training I see:

Generating 4 images with prompt: A photo of sks dog in a bucket.

Where are these images saved so they can be displayed during the training phase? I don't see any images saved.

sayakpaul commented 1 year ago

Sorry, but I am unable to reproduce the issue you're facing.

I reran the Colab Notebook (https://colab.research.google.com/gist/sayakpaul/13864eb0427bef50f5e95f08b60a03a3/scratchpad.ipynb) and didn't have to change any of the command-line arguments:

```bash
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path={MODEL_NAME}  \
  --instance_data_dir={INSTANCE_DIR} \
  --pretrained_vae_model_name_or_path={VAE_PATH} \
  --output_dir={OUTPUT_DIR} \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=2 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --checkpointing_steps=717 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
```

Since you don't want to use the HF Hub, you can simply omit `--push_to_hub`.
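Loading the trained LoRA locally then looks roughly like this (a sketch; "OUTPUT_DIR" stands for the --output_dir from the command above):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Sketch of local inference with the trained LoRA weights; "OUTPUT_DIR"
# is a placeholder for the training --output_dir.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("OUTPUT_DIR")

image = pipe("A photo of sks dog in a bucket").images[0]
image.save("sks_dog_bucket.png")
```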

> I tried removing --gradient_accumulation_steps=2 and --gradient_checkpointing and using a lower resolution, with no effect. How is it possible that 32 GB of VRAM are not enough? (The VRAM is available, I checked.)

I don't have any reasonable explanation for this, sadly, other than checking in on your dev setup (drivers, versions, etc.).
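A quick way to compare the two setups is to print the relevant versions (a sketch, assuming the usual packages are importable):

```python
import torch, diffusers, accelerate, transformers

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("diffusers:", diffusers.__version__)
print("accelerate:", accelerate.__version__)
print("transformers:", transformers.__version__)
print("device:", torch.cuda.get_device_name(0))
```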

> Where are these images saved so they can be displayed during the training phase? I don't see any images saved.

The images are logged to Weights and Biases or TensorBoard: https://github.com/huggingface/diffusers/blob/442017ccc877279bcf24fbe92f92d3d0def191b6/examples/dreambooth/train_dreambooth_lora_sdxl.py#L1341

Local saving is only done here: https://github.com/huggingface/diffusers/blob/442017ccc877279bcf24fbe92f92d3d0def191b6/examples/dreambooth/train_dreambooth_lora_sdxl.py#L71
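If it helps, the mechanism is just the tracker API; a self-contained sketch of the TensorBoard path (the log directory, tag, and dummy images are illustrative, not what the script writes):

```python
import numpy as np
from PIL import Image
from torch.utils.tensorboard import SummaryWriter

# Stand-ins for the four validation images the script generates.
images = [Image.new("RGB", (256, 256), color=(i * 60, 80, 120)) for i in range(4)]

# Hand the images to the tracker instead of writing image files to disk.
writer = SummaryWriter(log_dir="lora-trained-xl/logs/validation-demo")
np_images = np.stack([np.asarray(img) for img in images])
writer.add_images("validation", np_images, global_step=0, dataformats="NHWC")
writer.close()

# View with: tensorboard --logdir lora-trained-xl/logs
```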

patrickvonplaten commented 1 year ago

@unrue, a gentle reminder that it's important to file precise issues. An issue titled "Poor quality results" is not very descriptive and doesn't help anybody who rediscovers it later.

It would be amazing if you could take a look at https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.