leejet / stable-diffusion.cpp

Stable Diffusion in pure C/C++
MIT License

Are LoRAs for the SDXL text encoder supported? #213

Closed bssrdf closed 3 months ago

bssrdf commented 3 months ago

Hi, I trained an SDXL LoRA with diffusers' --train_text_encoder option. When loaded, only the UNet LoRA tensors were applied and all the text encoder ones were skipped.

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:171  - loading model from '../models/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:729  - load ../models/sd_xl_base_1.0.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:182  - loading vae from '../models/sdxl_vae.safetensors'
[INFO ] model.cpp:729  - load ../models/sdxl_vae.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:194  - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:200  - Stable Diffusion weight type: f16
[INFO ] stable-diffusion.cpp:406  - total params memory size = 6558.89MB (VRAM 6558.89MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:425  - loading model from '../models/sd_xl_base_1.0.safetensors' completed, taking 1.52s
[INFO ] stable-diffusion.cpp:442  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:553  - Attempting to apply 1 LoRAs
[INFO ] model.cpp:729  - load ../models/pytorch_lora.safetensors using safetensors format
[INFO ] lora.hpp:39   - loading LoRA from '../models/pytorch_lora.safetensors'
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.0.self_attn.k_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.0.self_attn.k_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.0.self_attn.out_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.0.self_attn.out_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.0.self_attn.q_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.0.self_attn.q_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.0.self_attn.v_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.0.self_attn.v_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.1.self_attn.k_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.1.self_attn.k_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.1.self_attn.out_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.1.self_attn.out_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.1.self_attn.q_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.1.self_attn.q_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.1.self_attn.v_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.1.self_attn.v_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.10.self_attn.k_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.10.self_attn.k_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.10.self_attn.out_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.10.self_attn.out_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.10.self_attn.q_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.10.self_attn.q_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.10.self_attn.v_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.10.self_attn.v_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.11.self_attn.k_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder.text_model.encoder.layers.11.self_attn.k_proj.lora_linear_layer.up.weight
.....
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.7.self_attn.q_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.7.self_attn.v_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.7.self_attn.v_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.8.self_attn.k_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.8.self_attn.k_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.8.self_attn.out_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.8.self_attn.out_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.8.self_attn.q_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.8.self_attn.q_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.8.self_attn.v_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.8.self_attn.v_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.9.self_attn.k_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.9.self_attn.k_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.9.self_attn.out_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.9.self_attn.out_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.9.self_attn.q_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.9.self_attn.q_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.9.self_attn.v_proj.lora_linear_layer.down.weight
[WARN ] lora.hpp:165  - unused lora tensor text_encoder_2.text_model.encoder.layers.9.self_attn.v_proj.lora_linear_layer.up.weight
[WARN ] lora.hpp:174  - Only (1120 / 1472) LoRA tensors have been applied
[INFO ] stable-diffusion.cpp:530  - lora 'pytorch_lora' applied, taking 0.38s
[INFO ] stable-diffusion.cpp:1608 - apply_loras completed, taking 0.38s
[INFO ] stable-diffusion.cpp:1719 - get_learned_condition completed, taking 106 ms
[INFO ] stable-diffusion.cpp:1735 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1739 - generating image: 1/1 - seed 1140298780
  |==================================================| 20/20 - 2.02it/s
[INFO ] stable-diffusion.cpp:1776 - sampling completed, taking 9.98s
[INFO ] stable-diffusion.cpp:1784 - generating 1 latent images completed, taking 10.01s
[INFO ] stable-diffusion.cpp:1786 - decoding 1 latents
[INFO ] stable-diffusion.cpp:1796 - latent 1 decoded, taking 0.99s
[INFO ] stable-diffusion.cpp:1800 - decode_first_stage completed, taking 0.99s
[INFO ] stable-diffusion.cpp:1817 - txt2img completed in 11.10s
save result image to 'dreambooth07.png'

Has anyone experienced this problem?

grauho commented 3 months ago

My immediate thought looking at your output is that this is a tensor name conversion problem. SDXL LoRAs should be able to modify the text encoders, but the prefix usually appears as "te1" or "te2". I would try adding "text_encoder" and "text_encoder_2" to the convert_sdxl_lora_name function and see if that fixes your problem.
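For illustration, here is a minimal sketch of the kind of prefix remapping being suggested, written as a standalone helper. The real convert_sdxl_lora_name in model.cpp is shaped differently, and the "te1."/"te2." target strings below are stand-ins for whatever internal prefixes the converter actually expects, not confirmed values:

```cpp
// Hypothetical sketch: remap diffusers-style SDXL LoRA prefixes
// ("text_encoder", "text_encoder_2") onto the "te1"/"te2" convention
// before the rest of the name conversion runs. Not the real sd.cpp code.
#include <string>
#include <utility>
#include <vector>

std::string remap_sdxl_lora_prefix(const std::string& name) {
    // Check "text_encoder_2" first: "text_encoder" is a prefix of it,
    // so testing it second would shadow the second encoder's tensors.
    static const std::vector<std::pair<std::string, std::string>> lookup = {
        {"text_encoder_2.", "te2."},  // proposed addition (diffusers naming)
        {"text_encoder.",   "te1."},  // proposed addition (diffusers naming)
    };
    for (const auto& [from, to] : lookup) {
        if (name.rfind(from, 0) == 0) {  // does `name` start with `from`?
            return to + name.substr(from.size());
        }
    }
    return name;  // UNet tensors etc. pass through unchanged
}
```

With a remap like this in place, a tensor such as text_encoder.text_model.encoder.layers.0.self_attn.k_proj.lora_linear_layer.down.weight would reach the existing conversion logic under a te1-style prefix it already knows how to handle.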

bssrdf commented 3 months ago

> My immediate thought looking at your output is that this is a tensor name conversion problem. SDXL LoRAs should be able to modify the text encoders, but the prefix usually appears as "te1" or "te2". I would try adding "text_encoder" and "text_encoder_2" to the convert_sdxl_lora_name function and see if that fixes your problem.

@grauho, yes, it is an SDXL LoRA name conversion problem. With the current version of diffusers, text encoder LoRA tensor names start with "text_encoder" and "text_encoder_2". Thanks for helping with this issue.

grauho commented 3 months ago

@bssrdf, Sure thing. So it's working for you now?

bssrdf commented 3 months ago

> @bssrdf, Sure thing. So it's working for you now?

No, I'll wait for your fix.

grauho commented 3 months ago

> No, I'll wait for your fix.

You're welcome to try the test I described; otherwise you'll be waiting a while, as I do not have a LoRA with this problem to test against.

bssrdf commented 3 months ago

> You're welcome to try the test I described; otherwise you'll be waiting a while, as I do not have a LoRA with this problem to test against.

@grauho, I followed your suggestion and used convert_sdxl_lora_name to do the proper conversion. Now all LoRA tensors (text encoder and UNet) are applied. I'll submit a PR later. Thanks for the help.

bssrdf commented 3 months ago

Fixed by https://github.com/leejet/stable-diffusion.cpp/pull/216