yardenfren1996 / B-LoRA

Implicit Style-Content Separation using B-LoRA
MIT License

cannot reproduce the results of the paper #3

Open Alan-Han opened 3 months ago

Alan-Han commented 3 months ago

Hi, I used the images and code from the paper, but I cannot reproduce the results. Here are the training and inference details:

accelerate launch train_dreambooth_b-lora_sdxl.py \
    --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --instance_data_dir="/xxx/data_c" \
    --instance_prompt="A ctc" \
    --output_dir=./output/blora_c \
    --resolution=1024 \
    --rank=64 \
    --train_batch_size=1 \
    --learning_rate=5e-5 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --max_train_steps=1000 \
    --checkpointing_steps=500 \
    --seed="0" \
    --gradient_checkpointing \
    --use_8bit_adam \
    --mixed_precision="fp16"
accelerate launch train_dreambooth_b-lora_sdxl.py \
    --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --instance_data_dir="/xxx/data_s" \
    --instance_prompt="A sks" \
    --output_dir=./output/blora_s \
    --resolution=1024 \
    --rank=64 \
    --train_batch_size=1 \
    --learning_rate=5e-5 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --max_train_steps=1000 \
    --checkpointing_steps=500 \
    --seed="0" \
    --gradient_checkpointing \
    --use_8bit_adam \
    --mixed_precision="fp16"

data_c and data_s contain the bull and wolf_plushie images, respectively. The inference script is:

python inference.py \
    --prompt "A ctc in sks style" \
    --output_path ./output \
    --content_B_LoRA /xxx/B-LoRA/output/blora_c/checkpoint-1000 \
    --style_B_LoRA /xxx/B-LoRA/output/blora_s/checkpoint-1000 \
    --num_images_per_prompt 1

But the result is all black. I then changed the prompt to "A ctc made of gold", and the result is still very strange. Is there any problem in the whole process?
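For context on what `--content_B_LoRA` and `--style_B_LoRA` do: per the paper, the content B-LoRA is attached to SDXL's `unet.up_blocks.0.attentions.0` and the style B-LoRA to `unet.up_blocks.0.attentions.1`, so combining two adapters amounts to keeping only the keys of each state dict that belong to its block. A minimal sketch of that key filtering (not the repo's actual `inference.py`; the helper names are illustrative):

```python
# Block prefixes used by B-LoRA for content vs. style (per the paper).
CONTENT_BLOCK = "unet.up_blocks.0.attentions.0"
STYLE_BLOCK = "unet.up_blocks.0.attentions.1"

def filter_b_lora_keys(state_dict, block_prefix):
    """Keep only the LoRA weights belonging to the given UNet block."""
    return {k: v for k, v in state_dict.items() if block_prefix in k}

def combine_b_loras(content_sd, style_sd):
    """Take the content block from one adapter and the style block from another."""
    merged = filter_b_lora_keys(content_sd, CONTENT_BLOCK)
    merged.update(filter_b_lora_keys(style_sd, STYLE_BLOCK))
    return merged
```

The merged dict is then loaded into a single SDXL pipeline; any keys outside those two blocks are dropped.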

yardenfren1996 commented 3 months ago

Hi, thank you for reaching out. I'll do my best since it seems like you've done everything correctly.

I recently made a small correction regarding training with 'fp16' precision; it involves using a different VAE for SDXL, as described here: link. I believe this will fix the blacked-out output.
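Some background on why fp16 causes all-black images: float16's largest finite value is 65504, and SDXL's original VAE produces intermediate activations beyond that during decoding. Those overflow to inf, propagate as NaN, and the decoded image renders as black. The fixed VAE (madebyollin/sdxl-vae-fp16-fix) rescales the model so activations stay in range. A tiny NumPy illustration of the failure mode (not the VAE itself):

```python
import numpy as np

# float16 tops out at 65504; anything larger overflows to inf.
x = np.float16(70000.0)   # becomes inf
y = x - x                 # inf - inf is NaN, which then poisons the output
print(x, y)
```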

However, the results for "A ctc made of gold" still appear strange. First, I recommend trying our notebook for inference to see if it works.

If that doesn't resolve the issue, please try uploading your B-LoRA weights here: https://huggingface.co/lora-library, and I'll be able to see what's happening.

By the way, during inference, try using /xxx/B-LoRA/output/blora_c instead of /xxx/B-LoRA/output/blora_c/checkpoint-1000. Although I don't believe there's much difference, this is the way I typically use it.

Please let me know if it works out. Thank you.

Alan-Han commented 3 months ago

Thank you for your response! I made two changes according to your advice. First, I added the new VAE to the training script:

accelerate launch train_dreambooth_b-lora_sdxl.py \
    --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
    --pretrained_vae_model_name_or_path madebyollin/sdxl-vae-fp16-fix \
    --instance_data_dir="/xxx/data_c" \
    --instance_prompt="A ctc" \
    --output_dir=./output/blora_c \
    --resolution=1024 \
    --rank=64 \
    --train_batch_size=1 \
    --learning_rate=5e-5 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --max_train_steps=1000 \
    --checkpointing_steps=500 \
    --seed="0" \
    --gradient_checkpointing \
    --use_8bit_adam \
    --mixed_precision="fp16"

Second, I used the notebook for inference. The result of the content B-LoRA alone (prompt "A ctc made of gold") looks right. However, the result of the two merged LoRAs (prompt "A ctc in sks style") is still strange, although no longer all black. I uploaded both B-LoRA weights here: https://huggingface.co/lora-library/B-lora-alanyhan-content https://huggingface.co/lora-library/B-lora-alanyhan-style
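Before comparing images, it may be worth sanity-checking the trained weights themselves: if fp16 training overflowed, the saved LoRA tensors will already contain NaN or inf. A small illustrative helper (the function name is hypothetical; `state_dict` is a name-to-array mapping such as the result of `safetensors.numpy.load_file` on the saved weights):

```python
import numpy as np

def find_bad_tensors(state_dict):
    """Return names of tensors containing NaN or inf values.

    `state_dict` maps parameter names to numpy arrays, e.g. the dict
    returned by safetensors.numpy.load_file("pytorch_lora_weights.safetensors").
    """
    return sorted(
        name for name, tensor in state_dict.items()
        if not np.all(np.isfinite(tensor))
    )
```

If this reports any tensors, the problem is in training, not in the merge at inference time.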

Nieschlafen commented 2 months ago

I have the same issue: the result of the two merged LoRAs is strange. Have you solved this problem?

yardenfren1996 commented 2 months ago

Hi, sorry for the delay. I checked it as well, and I'm encountering the same blurry results as you are. The only difference is in your training, where you specify the new VAE (madebyollin/sdxl-vae-fp16-fix). Please try running the original training without changing the VAE; then, for inference (on Colab), use the new VAE. Let me know if the issue persists.

yatoubusha commented 1 week ago

Is there any difference between inference.py and B_LORA_inference.ipynb? Why does the author get a different result?

FerryHuang commented 4 days ago

It seems the training script is still wrong: a checkpoint trained with it cannot be inferenced correctly under the fixed VAE. The official LoRAs perform well, though. :)