huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

LoRA from train_dreambooth_lora_sdxl.py is not working in A1111 anymore #6894

Closed patryk-bartkowiak-nitid closed 5 months ago

patryk-bartkowiak-nitid commented 7 months ago

Describe the bug

I have been using train_dreambooth_lora_sdxl.py and convert_diffusers_sdxl_lora_to_webui.py to train a LoRA for a specific character. It was working until about a week ago. I am using the same base model and the same data.

I noticed that all my previous LoRA files were 29967176 bytes, while the new ones are 29889672 bytes and have fewer keys in the dict when I load them as plain .safetensors files.
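The size gap (29967176 - 29889672 = 77504 bytes) and the key count can be inspected without loading any tensors. The following is a minimal sketch, assuming the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names and file paths are hypothetical and not part of diffusers:

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as {tensor_name: info}."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def compare_key_sets(old_path, new_path):
    """Report which tensor names exist in only one of the two files."""
    old_keys = set(read_safetensors_header(old_path))
    new_keys = set(read_safetensors_header(new_path))
    return {
        "only_in_old": sorted(old_keys - new_keys),
        "only_in_new": sorted(new_keys - old_keys),
        "shared": len(old_keys & new_keys),
    }
```

Calling `compare_key_sets("old_lora.safetensors", "new_lora.safetensors")` (hypothetical paths) would list the key names that differ between an old and a new file.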

Inference with the new LoRA works fine when I follow the inference guide in the README:

import torch
from diffusers import DiffusionPipeline

pretrained_model = "./pretrained_models/dreamshaper-xl"
lora_weights = "./outputs/dreamshaper-xl_claire/checkpoint-2000/"

prompt = "photo of wff woman, sitting in train"
negative_prompt = "text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated"

pipe = DiffusionPipeline.from_pretrained(pretrained_model, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.load_lora_weights(lora_weights)

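# `seed` is not a pipeline argument; a seeded generator makes the run reproducible.
generator = torch.Generator(device="cuda").manual_seed(420)

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    generator=generator,
).images[0]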

image.save("lora_inference.png")

But after I convert it and load it into A1111 (it loads correctly), it doesn't work anymore. It looks like it only adds some noise to the output.

I already tried checking out previous commits of diffusers, torch, and torchvision, but nothing helps. I am still not able to use the LoRA in A1111.

Reproduction

Code to train LoRA:

export MODEL_NAME="pretrained_models/dreamshaper-xl"
export INSTANCE_DIR="data/claire"
export MAX_TRAIN_STEPS=5000
export CHECKPOINTING_STEPS=500

export OUTPUT_DIR="outputs/$(basename ${MODEL_NAME})_$(basename ${INSTANCE_DIR})_tmp"
export CUDA_LAUNCH_BLOCKING=1
export TORCH_USE_CUDA_DSA=1

printf "\n\nTraining Claire model with $MODEL_NAME on $INSTANCE_DIR, saving to $OUTPUT_DIR\n\n"

accelerate launch diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
    --instance_prompt="photo of wff woman, isolated on white background" \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --instance_data_dir=$INSTANCE_DIR \
    --output_dir=$OUTPUT_DIR \
    --resolution=1024 \
    --train_batch_size=2 \
    --gradient_accumulation_steps=4 \
    --learning_rate=1e-4 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --max_train_steps=$MAX_TRAIN_STEPS \
    --seed="0" \
    --train_text_encoder \
    --enable_xformers_memory_efficient_attention \
    --gradient_checkpointing \
    --use_8bit_adam \
    --checkpointing_steps=$CHECKPOINTING_STEPS

Code to convert to A1111 format

python /project/diffusers/scripts/convert_diffusers_sdxl_lora_to_webui.py {input_path} {output_path}

Logs

I can't really post any errors: image generation looks normal, and there are no errors or warnings during training or conversion.

System Info

- `diffusers` version: 0.26.0.dev0
- Platform: Linux-5.15.0-92-generic-x86_64-with-glibc2.27
- Python version: 3.10.9
- PyTorch version (GPU?): 2.0.0 (True)
- Huggingface_hub version: 0.20.3
- Transformers version: 4.37.2
- Accelerate version: 0.26.1
- xFormers version: 0.0.19
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@yiyixuxu @sayakpaul @DN6 @patrickvonplaten

sayakpaul commented 7 months ago

Thanks for the detailed thread. Can you pin me a version that was working as expected for you?

I am asking because none of those scripts went through significant logical changes in the past 7 days.

patryk-bartkowiak-nitid commented 7 months ago

Yeah, that's the thing: I am unable to restore the environment exactly and I'm blocked right now. I'm not sure where the issue is :/
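For future runs, recording the installed versions next to each trained LoRA makes it possible to name a working version later. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the package list simply mirrors the System Info section above:

```python
from importlib import metadata


def snapshot_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_requirements(versions):
    """Render the snapshot as pinned `name==version` lines, skipping missing packages."""
    return "\n".join(
        f"{name}=={version}" for name, version in sorted(versions.items()) if version
    )


if __name__ == "__main__":
    tracked = ["diffusers", "torch", "transformers", "accelerate", "huggingface_hub", "xformers"]
    print(as_requirements(snapshot_versions(tracked)))
```

For an editable install from the main branch, a version string such as 0.26.0.dev0 does not identify a commit, so the output of `git rev-parse HEAD` in the diffusers checkout is worth saving as well.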

sayakpaul commented 7 months ago

Ah then it's a bit of a pity. In any case, please do ping me here if you're able to give me a pinpointed version. I am happy to look further from there :-)

patryk-bartkowiak-nitid commented 7 months ago

Anyway, following the README guide does not work properly. I am happy to meet or whatever to solve this issue :)

sayakpaul commented 7 months ago

README guide? Do you mean the commands from https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sdxl.md don't work? Can you provide a fully reproducible snippet for me?

> I am happy to meet or whatever to solve this issue :)

Sorry, we cannot do that. As maintainers, we need to be cognizant of our time and keep the discussions as open as possible.

patryk-bartkowiak-nitid commented 7 months ago

I mean command from https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sdxl.md combined with https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_sdxl_lora_to_webui.py

I'm not sure which part of the pipeline has the issue. Like I said, I am able to use the LoRA with the inference code you provide in the README, but I can't convert it correctly. The problem might be either the conversion itself, or the LoRA has some different properties that the conversion script can't handle.

Let me send you the full pipeline so you can reproduce the issue. I will try to include as many details as possible:

  1. Create VM with this docker image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel
  2. Install dependencies:

    apt update
    apt install vim git tmux ffmpeg libsm6 libxext6 wget python3 python3-venv libgl1 libglib2.0-0 google-perftools -y

    git clone https://github.com/huggingface/diffusers.git
    cd diffusers
    pip install -e .
    cd examples/dreambooth
    pip install -r requirements.txt
    accelerate config default
    pip install bitsandbytes xformers==0.0.19

  3. Download baseline SDXL model:

    wget https://civitai.com/api/download/models/333449 -O DreamShaperXL.safetensors

  4. Convert `.safetensors` to suitable format using python:

    import diffusers
    pipe = diffusers.StableDiffusionXLPipeline.from_single_file("DreamShaperXL.safetensors")
    pipe.save_pretrained("DreamShaperXL")

  5. Train LoRA (6 images with the same woman on white background):

    export MODEL_NAME="DreamShaperXL"
    export INSTANCE_DIR="data/claire"
    export MAX_TRAIN_STEPS=5000
    export CHECKPOINTING_STEPS=500

    export OUTPUT_DIR="outputs/$(basename ${MODEL_NAME})_$(basename ${INSTANCE_DIR})"
    export CUDA_LAUNCH_BLOCKING=1
    export TORCH_USE_CUDA_DSA=1

    printf "\n\nTraining Claire model with $MODEL_NAME on $INSTANCE_DIR, saving to $OUTPUT_DIR\n\n"

    accelerate launch diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
        --instance_prompt="photo of wff woman, isolated on white background" \
        --pretrained_model_name_or_path=$MODEL_NAME \
        --instance_data_dir=$INSTANCE_DIR \
        --output_dir=$OUTPUT_DIR \
        --resolution=1024 \
        --train_batch_size=2 \
        --gradient_accumulation_steps=4 \
        --learning_rate=1e-4 \
        --lr_scheduler="constant" \
        --lr_warmup_steps=0 \
        --max_train_steps=$MAX_TRAIN_STEPS \
        --seed="0" \
        --train_text_encoder \
        --enable_xformers_memory_efficient_attention \
        --gradient_checkpointing \
        --use_8bit_adam \
        --checkpointing_steps=$CHECKPOINTING_STEPS

  6. Convert to Kohya format:

    python /diffusers/scripts/convert_diffusers_sdxl_lora_to_webui.py outputs/DreamShaperXL_claire/pytorch_lora_weights.safetensors test.safetensors

  7. Move to A1111:

    mv test.safetensors stable-diffusion-webui/models/Lora/

sayakpaul commented 7 months ago

As mentioned I need to know a version that was working as expected for you.

CC: @linoytsaban @apolinario here.

patryk-bartkowiak-nitid commented 7 months ago

Well, since I can't really provide it, can we just focus on the current version, which is probably not working properly?

I also considered that A1111 itself might be broken, but my previous LoRAs still work there, so I think the problem has to be somewhere in this pipeline.

sayakpaul commented 7 months ago

That makes it a thousand times more difficult for us to make progress here, hence I am a bit adamant about it. To be able to pinpoint the issue: can we say the trained LoRA provides expected results when the inference is done from diffusers?

Your initial issue description suggests so. So, I quite suspect that it's the conversion script that's the culprit here.

patryk-bartkowiak-nitid commented 7 months ago

Yes, LoRA provides expected results when the inference is done from diffusers.

When it's done in A1111, the LoRA does change the output image (same seed), but not in the way it should. It looks like it's just adding some noise at the beginning of the generation process. I will send an example in 3 minutes.

sayakpaul commented 7 months ago

Then it's quite likely that the conversion script is the problem as mentioned. So, I will let @apolinario and @linoytsaban comment further (as they are the developers of that script).

patryk-bartkowiak-nitid commented 7 months ago

A1111 Config:

photo of wff woman, rides gondola in Venice,
Negative prompt: text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated, BadDream, UnrealisticDream
Steps: 7, Sampler: DPM++ SDE Karras, CFG scale: 2, Seed: 420, Size: 1024x1024, Model hash: 676f0d60c8, Model: DreamShaperXL, Version: v1.7.0
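When comparing results across tools, it helps to have the A1111 settings in a structured form. The diffusers snippets in this thread use 50 steps, while this A1111 run uses 7 steps and a CFG scale of 2. Below is a minimal sketch of a parser for a settings line like the one above. The function name is hypothetical, and the naive split assumes no value contains ", ":

```python
def parse_settings_line(line):
    """Split an A1111-style 'Key: value, Key: value' settings line into a dict."""
    settings = {}
    for field in line.split(", "):
        # Split each field on the first ": " only, so values keep any later colons.
        key, _, value = field.partition(": ")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    line = (
        "Steps: 7, Sampler: DPM++ SDE Karras, CFG scale: 2, Seed: 420, "
        "Size: 1024x1024, Model hash: 676f0d60c8, Model: DreamShaperXL, Version: v1.7.0"
    )
    for key, value in parse_settings_line(line).items():
        print(f"{key} = {value}")
```

All values stay strings here; converting "Steps" or "Seed" to integers is left to the caller.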

Image without any LoRA: [image]

Image with previously trained LoRA that works - trained for 8000 iterations with batch_size=1: [image]

Image with new LoRA - trained for 4000 iterations with batch_size=2: [image]

patryk-bartkowiak-nitid commented 7 months ago

Also adding an image generated locally with new LoRA that doesn't work in A1111 - trained for 4000 iterations with batch_size=2

Code to generate:

import torch
from diffusers import DiffusionPipeline

pretrained_model = "DreamShaperXL"
lora_weights = "./outputs/DreamShaperXL_claire/checkpoint-4000/"

prompt = "photo of wff woman, rides gondola in Venice,"
negative_prompt = "text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated"

pipe = DiffusionPipeline.from_pretrained(pretrained_model, torch_dtype=torch.float32)
pipe = pipe.to("cuda")
pipe.load_lora_weights(lora_weights)

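# `seed` is not a pipeline argument; a seeded generator makes the run reproducible.
generator = torch.Generator(device="cuda").manual_seed(420)

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    generator=generator,
).images[0]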

image.save("lora_inference.png")

Image:

image

Note

As you can see, it's much closer to the expected result. The quality is lower because AUTOMATIC1111 applies some additional things that make images look better, such as negative embeddings.

patryk-bartkowiak-nitid commented 7 months ago

Update

I tried loading the exact same model after the conversion in ComfyUI and it works properly. I also found this issue from a week ago: https://github.com/huggingface/diffusers/issues/6777

Do you think it's related? Did any of the LoRA keys change? It looks like A1111 does not support it yet.

sayakpaul commented 7 months ago

Could be related but the LoRA keys didn’t change. We have got multiple tests ensuring that.

linoytsaban commented 7 months ago

Hey @patryk-bartkowiak-nitid, thanks for creating this issue! Just to make sure I understand, right now the ComfyUI conversion works fine but A1111 doesn't?

patryk-bartkowiak-nitid commented 7 months ago

> Hey @patryk-bartkowiak-nitid, thanks for creating this issue! Just to make sure I understand, right now the ComfyUI conversion works fine but A1111 doesn't?

Exactly

linoytsaban commented 7 months ago

Hmm, I'm not sure what has caused this, since we haven't made any changes to the conversion script, and the changes made to the training script should not affect that. @sayakpaul was there maybe any change in the peft keys that would make the conversion script incompatible?

sayakpaul commented 7 months ago

No, I don’t think so. There were no changes to the training script or the underlying utils that would lead to key incompatibilities.

patryk-bartkowiak-nitid commented 7 months ago

Could this have had an impact? https://github.com/huggingface/diffusers/pull/6895

sayakpaul commented 7 months ago

Pretty sure not as it only touches the model card which has nothing to do with the state dict.

patryk-bartkowiak-nitid commented 7 months ago

Any ideas @sayakpaul @linoytsaban ? Still trying to figure this out

sayakpaul commented 7 months ago

Sorry but I don't work with A1111 or ComfyUI either. And I cannot offer any help related to conversion to non-diffusers formats right now.

linoytsaban commented 7 months ago

@patryk-bartkowiak-nitid can you check the state_dict of the previous LoRAs that worked fine on A1111 and of the new ones, and see whether there are differences (presumably there are, if it's incompatible) and what they are?
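One way to run such a check is to group the differing keys by their naming pattern rather than reading thousands of names. This is a minimal sketch that works on plain `{key: shape}` mappings. The function names are hypothetical, and it assumes each key has the form `<module name>.<suffix>` with no dot inside the module name:

```python
from collections import Counter


def suffix_of(key):
    """Return everything after the first dot, e.g. 'lora_down.weight' or 'alpha'."""
    return key.partition(".")[2]


def summarize_differences(old_shapes, new_shapes):
    """Compare two {key: shape} mappings taken from two state dicts."""
    old_keys, new_keys = set(old_shapes), set(new_shapes)
    shared = old_keys & new_keys
    return {
        # Suffix patterns of keys that exist only in the old file.
        "missing_suffixes": Counter(suffix_of(k) for k in old_keys - new_keys),
        # Suffix patterns of keys that exist only in the new file.
        "added_suffixes": Counter(suffix_of(k) for k in new_keys - old_keys),
        # Keys present in both files whose tensor shapes disagree.
        "shape_mismatches": sorted(
            k for k in shared if tuple(old_shapes[k]) != tuple(new_shapes[k])
        ),
    }
```

The input mappings could be built from loaded state dicts with something like `{k: tuple(v.shape) for k, v in state_dict.items()}`.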

patryk-bartkowiak-nitid commented 7 months ago

I compared the converted .safetensors files and have already worked on restoring the exact same structure. This is how I restored it, so you can see the difference between them:

import torch
from safetensors.torch import load_file

before = load_file("claire.safetensors")
after = load_file("test.safetensors")

# Iterate over a copy of the keys, since the dict is modified inside the loop.
for k in list(after.keys()):
    v = after.pop(k)

    k = k.replace("lora.down", "lora_down")
    k = k.replace("lora.up", "lora_up")
    k = k.replace("to_k_lora", "to_k.lora")
    k = k.replace("_lora_down", ".lora_down")
    k = k.replace("_lora_up", ".lora_up")

    after[k] = v

# Add one alpha entry per LoRA layer, named after its lora_up weight.
for layer_name in [x for x in after.keys() if x.endswith("lora_up.weight")]:
    layer_name = layer_name.replace("lora_up.weight", "alpha")
    layer_name = layer_name.replace("_alpha", ".alpha")
    after[layer_name] = torch.tensor(4)

Now I have two .safetensors files with the exact same keys and shapes, but of course with different weight values.
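The renaming above can also be written as a pure function, which makes it easy to try on a few keys before touching a real file. This is a minimal sketch that applies the same replacements in the same order. The function names and the example keys are hypothetical, and plain values stand in for tensors:

```python
def rename_key(key):
    """Apply the same string replacements as the loop above to one key."""
    for old, new in (
        ("lora.down", "lora_down"),
        ("lora.up", "lora_up"),
        ("to_k_lora", "to_k.lora"),
        ("_lora_down", ".lora_down"),
        ("_lora_up", ".lora_up"),
    ):
        key = key.replace(old, new)
    return key


def restore_layout(state_dict, alpha=4):
    """Return a renamed copy of `state_dict` with one alpha entry per LoRA layer."""
    restored = {rename_key(k): v for k, v in state_dict.items()}
    for name in [k for k in restored if k.endswith("lora_up.weight")]:
        alpha_key = name.replace("lora_up.weight", "alpha").replace("_alpha", ".alpha")
        restored[alpha_key] = alpha
    return restored


if __name__ == "__main__":
    # Hypothetical key names, only to illustrate the renaming.
    example = {"m_to_k_lora.down.weight": 1, "m_to_q_lora.up.weight": 2}
    for key in restore_layout(example):
        print(key)
```

The default `alpha=4` mirrors the `torch.tensor(4)` used above. With real tensors, a tensor value would be passed instead.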

patryk-bartkowiak-nitid commented 7 months ago

Before:

intersection = set(before.keys()) & set(after.keys())

len(before), len(after), len(intersection)
(2208, 1648, 528)

After

intersection = set(before.keys()) & set(after.keys())

len(before), len(after), len(intersection)
(2208, 2208, 2208)

qwerdf4 commented 6 months ago

I also encountered the same problem

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

yiyixuxu commented 6 months ago

@sayakpaul is this the fix? https://github.com/huggingface/diffusers/pull/7435

sayakpaul commented 6 months ago

Yeah could be.

yiyixuxu commented 5 months ago

Assuming this was fixed in https://github.com/huggingface/diffusers/pull/7435. Let us know if it is still an issue, and we will reopen this!