huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

StableDiffusionXL not working for Dreambooth #4117

Closed mauricio-repetto closed 1 year ago

mauricio-repetto commented 1 year ago

Describe the bug

Hi,

Is this model available for DreamBooth? I'm not sure whether to file this as an issue or a feature request, but I tried running the regular script with the XL model to see how good it is compared with the other SD versions, and I'm getting an error :'(

I know that XL's architecture may differ from previous SD versions, so maybe the script does not support it yet.

Thanks!

Reproduction

!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-0.9" \
  --instance_data_dir="./pneumoconiosis" \
  --class_data_dir="./data/xray" \
  --output_dir="./diffusion/pneumoconiosis/models/stable-diffusion-xl-pneumoconiosis-finetuned" \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="image of a pneumoconiosis xray" \
  --class_prompt="image of a xray" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=11400 \
  --checkpointing_steps=4000 \
  --num_validation_images=4 \
  --report_to="wandb" \
  --seed=1337

Logs

2023-07-16 01:08:42.735258: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-07-16 01:08:48.477573: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
07/16/2023 01:08:51 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

Keyword arguments {'safety_checker': None} are not expected by StableDiffusionXLPipeline and will be ignored.
Loading pipeline components...:   0% 0/7 [00:00<?, ?it/s]
Loaded text_encoder_2 as CLIPTextModelWithProjection from `text_encoder_2` subfolder of stabilityai/stable-diffusion-xl-base-0.9.
Loading pipeline components...:  14% 1/7 [00:01<00:09,  1.61s/it]
{'force_upcast'} was not found in config. Values will be initialized to default values.
Loaded vae as AutoencoderKL from `vae` subfolder of stabilityai/stable-diffusion-xl-base-0.9.
Loading pipeline components...:  29% 2/7 [00:01<00:04,  1.24it/s]
Loaded text_encoder as CLIPTextModel from `text_encoder` subfolder of stabilityai/stable-diffusion-xl-base-0.9.
Loading pipeline components...:  43% 3/7 [00:02<00:02,  1.47it/s]
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of stabilityai/stable-diffusion-xl-base-0.9.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of stabilityai/stable-diffusion-xl-base-0.9.
Loading pipeline components...:  71% 5/7 [00:02<00:00,  3.02it/s]
Loaded unet as UNet2DConditionModel from `unet` subfolder of stabilityai/stable-diffusion-xl-base-0.9.
Loading pipeline components...:  86% 6/7 [00:07<00:01,  1.72s/it]
Loaded scheduler as EulerDiscreteScheduler from `scheduler` subfolder of stabilityai/stable-diffusion-xl-base-0.9.
Loading pipeline components...: 100% 7/7 [00:07<00:00,  1.10s/it]
07/16/2023 01:08:59 - INFO - __main__ - Number of class images to sample: 200.
Generating class images: 100% 50/50 [19:51<00:00, 23.83s/it]
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'variance_type'} was not found in config. Values will be initialized to default values.
{'force_upcast'} was not found in config. Values will be initialized to default values.
wandb: Currently logged in as: amd-repetto (only-my-team). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in ./wandb/run-20230716_012903-r6j5zx8r
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run azure-shadow-26
wandb: ⭐️ View project at https://wandb.ai/only-my-team/dreambooth
wandb: 🚀 View run at https://wandb.ai/only-my-team/dreambooth/runs/r6j5zx8r
07/16/2023 01:29:03 - INFO - __main__ - ***** Running training *****
07/16/2023 01:29:03 - INFO - __main__ -   Num examples = 200
07/16/2023 01:29:03 - INFO - __main__ -   Num batches each epoch = 200
07/16/2023 01:29:03 - INFO - __main__ -   Num Epochs = 114
07/16/2023 01:29:03 - INFO - __main__ -   Instantaneous batch size per device = 1
07/16/2023 01:29:03 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 2
07/16/2023 01:29:03 - INFO - __main__ -   Gradient Accumulation steps = 2
07/16/2023 01:29:03 - INFO - __main__ -   Total optimization steps = 11400
Steps:   0% 0/11400 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/train_dreambooth.py", line 1375, in <module>
    main(args)
  File "/content/train_dreambooth.py", line 1223, in main
    model_pred = unet(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 581, in forward
    return model_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 569, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/unet_2d_condition.py", line 839, in forward
    if "text_embeds" not in added_cond_kwargs:
TypeError: argument of type 'NoneType' is not iterable
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run azure-shadow-26 at: https://wandb.ai/only-my-team/dreambooth/runs/r6j5zx8r
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230716_012903-r6j5zx8r/logs
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 979, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-0.9', '--instance_data_dir=/content/pneumoconiosis_resized/train/1', '--class_data_dir=/content/data/xray', '--output_dir=/content/drive/MyDrive/ORT/Master/Codes/diffusion/pneumoconiosis/models/stable-diffusion-xl-pneumoconiosis-finetuned', '--train_text_encoder', '--mixed_precision=fp16', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=image of a pneumoconiosis xray', '--class_prompt=image of a xray', '--resolution=768', '--train_batch_size=1', '--gradient_accumulation_steps=2', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=11400', '--checkpointing_steps=4000', '--num_validation_images=4', '--report_to=wandb', '--seed=1337']' returned non-zero exit status 1.

System Info

Who can help?

@patrickvonplaten @sayakpaul @williamberman

BitPhinix commented 1 year ago

Training the text encoder is not yet supported; try removing "--train_text_encoder".

BitPhinix commented 1 year ago

Oh, and you have to use the dedicated SDXL training scripts, as the architecture differs slightly between XL and previous versions.
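To make the architectural difference concrete: the traceback ends at `if "text_embeds" not in added_cond_kwargs`, because the SDXL UNet expects extra conditioning that the SD 1.x/2.x `train_dreambooth.py` never passes, so `added_cond_kwargs` arrives as None. The stdlib-only sketch below is an illustration, not the diffusers code. It assumes, based on the SDXL pipeline as I understand it, that the UNet reads a `text_embeds` entry (the pooled output of the second text encoder, `text_encoder_2` in the log) and a `time_ids` entry built from the original size, the crop top-left corner and the target size. Plain lists stand in for the tensors a real script would use, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def compute_time_ids(
    original_size: Tuple[int, int],
    crops_coords_top_left: Tuple[int, int],
    target_size: Tuple[int, int],
) -> List[int]:
    """Flatten SDXL's size/crop conditioning into six numbers.

    A real training script would turn this into a tensor per batch item.
    """
    return list(original_size + crops_coords_top_left + target_size)


def build_added_cond_kwargs(
    pooled_text_embeds: Sequence[float], time_ids: List[int]
) -> Dict[str, list]:
    """Hypothetical helper: the two extra inputs assumed to be read by the SDXL UNet."""
    return {"text_embeds": list(pooled_text_embeds), "time_ids": time_ids}


def check_added_cond_kwargs(added_cond_kwargs: Optional[dict]) -> None:
    """Reproduce the membership test from the traceback.

    With None, the `in` operator itself raises a TypeError
    (in Python 3.10: "argument of type 'NoneType' is not iterable").
    """
    for key in ("text_embeds", "time_ids"):
        if key not in added_cond_kwargs:
            raise ValueError(f"added_cond_kwargs is missing {key!r}")


time_ids = compute_time_ids((1024, 1024), (0, 0), (1024, 1024))
print(time_ids)  # [1024, 1024, 0, 0, 1024, 1024]
check_added_cond_kwargs(build_added_cond_kwargs([0.1, 0.2], time_ids))  # passes

try:
    check_added_cond_kwargs(None)  # what the SD 1.x/2.x script effectively passes
except TypeError as err:
    print(err)
```

This is why the fix is a different script rather than a flag: the SDXL scripts compute these extra inputs for every batch, while the older script has no notion of them.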

mauricio-repetto commented 1 year ago

Thanks @BitPhinix! I hadn't noticed the SDXL scripts, and yes, I was afraid of an issue related to the architectural differences. But I see that it's only available for LoRA. Is that because of the computational power needed to run a full DreamBooth fine-tuning? Like, "we won't give you a regular script for XL because you, simple mortal, won't stand a chance of running it anyway" :P

sayakpaul commented 1 year ago

Hi,

Yeah, we decided to only add DreamBooth LoRA support for SDXL because of the computational constraints the community might face, and we empathize with that.
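As a rough illustration of the computational argument, the sketch below counts trainable parameters for one square linear layer, comparing a full update of the weight with a LoRA update of rank r (A is r × d_in, B is d_out × r). The width 1280 and rank 4 are hypothetical example values, not settings taken from the SDXL script, and the Adam figure counts only the two fp32 moment buffers kept per trainable parameter (no weights, gradients or activations).

```python
def full_update_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when a d_out x d_in weight is fine-tuned directly."""
    return d_in * d_out


def lora_update_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a LoRA update B @ A (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


def adam_state_bytes(num_params: int, bytes_per_value: int = 4) -> int:
    """Memory for Adam's two moment estimates per trainable parameter (fp32 assumed)."""
    return 2 * num_params * bytes_per_value


width, rank = 1280, 4  # hypothetical layer width and LoRA rank
full = full_update_params(width, width)
lora = lora_update_params(width, width, rank)
print(full, lora, full // lora)  # 1638400 10240 160
print(adam_state_bytes(full), adam_state_bytes(lora))  # 13107200 81920
```

Summed over every adapted projection in the UNet, a saving of this order is the kind that lets LoRA training fit on GPUs where full fine-tuning of SDXL would not.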

Our instructions are here: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sdxl.md

We are also adding support for training the text encoder: https://github.com/huggingface/diffusers/pull/4097

mauricio-repetto commented 1 year ago

Awesome, thanks paul.

genesiscz commented 1 year ago

Our instructions are here: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sdxl.md

I'm so sorry for kind of hijacking this, but I don't want to create a separate issue, as I'm sure it's something pretty straightforward. If I train an SDXL LoRA using train_dreambooth_lora_sdxl.py and it outputs a .bin file, how am I supposed to convert it to the .safetensors format so I can load it with pipe.load_lora_weights("./loras", weight_name="lora.safetensors")? Is there a script somewhere that I missed?

Also, is a LoRA trained with DreamBooth like this supposed to work in ComfyUI?

And what "style" of LoRA does the DreamBooth training create?

mauricio-repetto commented 1 year ago

Hi @genesiscz, why convert it if you're using diffusers? Just load the file directly:

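import torch
from diffusers import DiffusionPipeline

lora_model_id = "/content/bin_folder/"  # path to the folder containing your .bin file
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"

# Load the SDXL base model in fp16 and move it to the GPU
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
# Load the DreamBooth LoRA weights into the pipeline
pipe.load_lora_weights(lora_model_id)
...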

I don't use ComfyUI so I cannot answer that question.

Regarding the style, I think it should mostly depend on the prompt you use. Remember that with DreamBooth you are teaching the model a new concept without retraining everything from scratch (and with LoRA you do it very efficiently, which helps if you are low on resources), so the result is just another regular diffusion model with additional concepts. With XL you also have the refiner model, which may give you some additional improvements (or not).

Hope it helps!

genesiscz commented 1 year ago

You're right, I can do that. But to use it in A1111 or ComfyUI, I need it in safetensors format.

By style I meant things like Kohya, LoHa, LyCORIS; I guess each is a different algorithm/structure of LoRA (I'm really a newbie in this area). I want to know what "style" (or whatever it's called) the LoRAs trained with diffusers are, so that I can ask properly on the ComfyUI side.

Thanks!

mauricio-repetto commented 1 year ago

Oh, I see now. Yes, in that case I guess there should be a workaround, but I don't know it.

Regarding the "style" (I think it is more of an implementation detail): yes, those are different approaches based on LoRA. I'm not familiar with the details of the diffusers implementation, but I would guess it's not far from the original paper.
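For reference, the original LoRA paper (Hu et al., 2021) freezes a pretrained weight W_0 and trains only a low-rank update, with A initialized randomly and B initialized to zero so training starts exactly from the base model. This is the paper's formulation; whether diffusers adds anything beyond it (for example, how α is chosen) is not confirmed in this thread.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only A and B receive gradients, so each adapted layer trains r(d + k) parameters instead of dk.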

sayakpaul commented 1 year ago

Hi.

I prepared a Colab Notebook here that shows how to convert a .bin file to the .safetensors format.

sayakpaul commented 1 year ago

By style I meant things like Kohya, LoHa, LyCORIS; I guess each is a different algorithm/structure of LoRA (I'm really a newbie in this area). I want to know what "style" (or whatever it's called) the LoRAs trained with diffusers are, so that I can ask properly on the ComfyUI side.

As far as I know, LyCORIS is a repository that provides adapter utilities beyond the conventional methods. LoCon is LoRA extended to convolution layers. LoHa is a separate method.
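To illustrate how LoHa differs from plain LoRA: as I understand it, LoHa (borrowed from FedPara) builds the update as the Hadamard (element-wise) product of two low-rank products. This is a summary under that assumption, not the LyCORIS implementation, and scaling factors are omitted.

```latex
\Delta W = (B_1 A_1) \odot (B_2 A_2),
\qquad B_i \in \mathbb{R}^{d \times r},\;
A_i \in \mathbb{R}^{r \times k},
\qquad \operatorname{rank}(\Delta W) \le r^2
```

It trains 2r(d + k) parameters, twice LoRA's r(d + k) at the same r, but because rank(X ⊙ Y) ≤ rank(X) · rank(Y) the update can reach rank r² instead of r.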

Kohya provides useful trainers for training adapter-based models for Stable Diffusion (again as far as I know).

By LyCORIS or Kohya "styled" checkpoints, I essentially mean that the checkpoints are formatted a little differently from what we have in diffusers. I hope this clarifies your doubt.
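One way to see that formatting difference in practice is to look at the key names in a file. The heuristic below is an assumption based on commonly seen Kohya-style files (flat lora_unet_* / lora_te*_ prefixes with .lora_down, .lora_up and .alpha suffixes), not an official specification, and the example key names are illustrative.

```python
from typing import Iterable

# Prefixes/suffixes commonly seen in Kohya-style LoRA files (assumed, not a spec).
KOHYA_PREFIXES = ("lora_unet_", "lora_te_", "lora_te1_", "lora_te2_")
KOHYA_SUFFIXES = (".lora_down.weight", ".lora_up.weight", ".alpha")


def guess_lora_key_style(keys: Iterable[str]) -> str:
    """Return "kohya" if every key looks Kohya-style, otherwise "other" (e.g. diffusers)."""
    keys = list(keys)
    if keys and all(
        k.startswith(KOHYA_PREFIXES) and k.endswith(KOHYA_SUFFIXES) for k in keys
    ):
        return "kohya"
    return "other"


diffusers_like = [
    "unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor.to_q_lora.down.weight",
]
kohya_like = [
    "lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_q.lora_down.weight",
    "lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight",
    "lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_q.alpha",
]
print(guess_lora_key_style(diffusers_like))  # other
print(guess_lora_key_style(kohya_like))  # kohya
```

For a real .safetensors file, the key names can be listed without loading the tensors via safetensors.safe_open(path, framework="pt") and its keys() method.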