Open zhongwei opened 1 year ago
Besides LCM being available for XL models, stability.ai released SDXL-turbo, a distilled (fine-tuned?) model that can generate good images in a single step.
Is it compatible with this repo?
@leejet this can be closed
@zhongwei Support for SDXL has been added. You can try pulling the latest code from the master branch.
@leejet this can be closed
Generally, I don't proactively close issues unless they've been resolved for an extended period without any response from the person who opened the issue. I prefer the individuals who opened the issue to confirm its resolution and close it themselves.
Did anyone try running sd_xl? For some reason it's generating an empty (pitch-black) image. Here is the command I used and its output:
$ ./bin/sd -m ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors -p "a lovely cat"
[INFO] stable-diffusion.cpp:5386 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors'
[INFO] model.cpp:638 - load ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors using safetensors format
[INFO] stable-diffusion.cpp:5412 - Stable Diffusion XL
[INFO] stable-diffusion.cpp:5418 - Stable Diffusion weight type: f16
[INFO] stable-diffusion.cpp:5573 - total memory buffer size = 6570.56MB (clip 1565.66MB, unet 4909.43MB, vae 95.47MB)
[INFO] stable-diffusion.cpp:5579 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors' completed, taking 1.78s
[INFO] stable-diffusion.cpp:5593 - running in eps-prediction mode
[INFO] stable-diffusion.cpp:6486 - apply_loras completed, taking 0.00s
[INFO] stable-diffusion.cpp:6525 - get_learned_condition completed, taking 1547 ms
[INFO] stable-diffusion.cpp:6535 - sampling using Euler A method
[INFO] stable-diffusion.cpp:6539 - generating image: 1/1 - seed 42
|==================================================| 20/20 - 18.15s/it
[INFO] stable-diffusion.cpp:6551 - sampling completed, taking 353.73s
[INFO] stable-diffusion.cpp:6559 - generating 1 latent images completed, taking 353.89s
[INFO] stable-diffusion.cpp:6561 - decoding 1 latents
[INFO] stable-diffusion.cpp:6571 - latent 1 decoded, taking 17.36s
[INFO] stable-diffusion.cpp:6575 - decode_first_stage completed, taking 17.36s
[INFO] stable-diffusion.cpp:6590 - txt2img completed in 372.80s
[INFO] main.cpp:538 - save result image to 'output.png'
I also tried downloading the UNet, VAE, etc. and passing them as arguments (along with some minor code changes to load the f16 .safetensors files instead of the plain .safetensors ones, e.g. in std::string unet_path = path_join(file_path, "unet/diffusion_pytorch_model.safetensors");):
$ ./bin/sd -m ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors --vae ~/downloaded_models/sdxl-turbo/ -p "a lovely cat"
[INFO] stable-diffusion.cpp:5386 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors'
[INFO] model.cpp:638 - load ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors using safetensors format
[INFO] stable-diffusion.cpp:5395 - loading vae from '~/downloaded_models/sdxl-turbo/'
[INFO] model.cpp:632 - load ~/downloaded_models/sdxl-turbo/ using diffusers format
[INFO] stable-diffusion.cpp:5412 - Stable Diffusion XL
[INFO] stable-diffusion.cpp:5418 - Stable Diffusion weight type: f16
[WARN] stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_1.bias' in model file
[WARN] stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_1.weight' in model file
[WARN] stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_2.bias' in model file
[WARN] stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_2.weight' in model file
[WARN] stable-diffusion.cpp:5503 - unknown tensor 'model.diffusion_model.output_blocks.2.1.conv.bias' in model file
[WARN] stable-diffusion.cpp:5503 - unknown tensor 'model.diffusion_model.output_blocks.2.1.conv.weight' in model file
[INFO] stable-diffusion.cpp:5573 - total memory buffer size = 6570.56MB (clip 1565.66MB, unet 4909.43MB, vae 95.47MB)
[INFO] stable-diffusion.cpp:5579 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors' completed, taking 2.61s
[INFO] stable-diffusion.cpp:5593 - running in eps-prediction mode
[INFO] stable-diffusion.cpp:6486 - apply_loras completed, taking 0.00s
[INFO] stable-diffusion.cpp:6525 - get_learned_condition completed, taking 1592 ms
[INFO] stable-diffusion.cpp:6535 - sampling using Euler A method
[INFO] stable-diffusion.cpp:6539 - generating image: 1/1 - seed 42
|==================================================| 20/20 - 18.09s/it
[INFO] stable-diffusion.cpp:6551 - sampling completed, taking 353.85s
[INFO] stable-diffusion.cpp:6559 - generating 1 latent images completed, taking 353.85s
[INFO] stable-diffusion.cpp:6561 - decoding 1 latents
[INFO] stable-diffusion.cpp:6571 - latent 1 decoded, taking 17.08s
[INFO] stable-diffusion.cpp:6575 - decode_first_stage completed, taking 17.08s
[INFO] stable-diffusion.cpp:6590 - txt2img completed in 372.51s
[INFO] main.cpp:538 - save result image to 'output.png'
But the result is the same. I have tried an older Stable Diffusion model, stable-diffusion-2-1/v2-1_768-nonema-pruned.safetensors, and it works.
I'm running on Ubuntu 22.03.
@ranjithum The VAE in SDXL runs into NaN issues under FP16, and unfortunately ggml_conv_2d only operates in FP16. You therefore need to pass a parameter that specifies a VAE in which the FP16 NaN issue has been fixed. You can find it here: SDXL VAE FP16 Fix.
./bin/sd -m ../models/sd_xl_base_1.0.safetensors --vae ../models/sdxl_vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
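For anyone wondering how FP16 turns into NaN and then into a black image, here is a minimal sketch of the general mechanism in plain Python. It is not the ggml code path, and the activation value is a made-up number chosen only to trigger the overflow: half precision tops out at 65504, anything larger becomes infinity, and arithmetic on infinities can produce NaN.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.

    Python's struct module refuses to pack values that are too large for
    half precision, so that case is mapped to +/-inf here, which is what
    an FP16 computation produces on overflow.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


# Hypothetical activation value, chosen only to trigger the overflow.
activation = to_fp16(300.0)
squared = to_fp16(activation * activation)  # 90000 exceeds the FP16 maximum of 65504
difference = squared - squared              # inf - inf is NaN

print(squared, difference)
```

Once a single NaN appears it spreads through every later operation, which would be consistent with the decoded image coming out entirely black.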
@leejet Perfect, thanks. It worked.
@leejet we should probably put up a warning in the program when an f32 VAE is used (until it's fixed).
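Until such a warning exists, a standalone check is possible outside the program. The sketch below is not part of stable-diffusion.cpp; it only relies on the safetensors layout (an 8-byte little-endian header length followed by a JSON header) and counts the tensor dtypes in a file. The function names are my own. Note that the dtype alone cannot tell you whether a VAE is the FP16-fixed one, only how its weights are stored.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict of tensor entries."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header size as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def dtype_counts(path):
    """Count how many tensors use each dtype string (for example "F16" or "F32")."""
    return Counter(info["dtype"] for info in read_safetensors_header(path).values())
```

Calling dtype_counts on the file passed to --vae shows at a glance whether the weights are stored as F16 or F32.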
Works for me, but colors are weirdly off with SD XL plus fp16 fix:
Try changing the image size to 1024x1024. SDXL is not suitable for generating images of size 512x512.
Nope, still just as broken for me.
stable-diffusion.cpp/build/bin/sd -m stable-diffusion.cpp/models/sd_xl_turbo_1.0_fp16.safetensors --vae stable-diffusion.cpp/models/sdxl_vae.safetensors --steps 1 --cfg-scale 1 -s -1 -p "a lovely cat"
Works perfectly for me.
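If you want to script runs like this one, here is a small sketch of a helper that assembles the argument list. It only uses flags that appear in the commands posted in this thread; the helper name and its parameters are hypothetical, not part of the project.

```python
def build_sd_command(binary, model, prompt, vae=None, steps=None,
                     cfg_scale=None, width=None, height=None, seed=None):
    """Assemble an argument list for the sd binary from flags seen in this thread."""
    cmd = [binary, "-m", model]
    # Optional flags are added only when a value is given, so the
    # program's own defaults apply otherwise.
    optional = [
        ("--vae", vae),
        ("--steps", steps),
        ("--cfg-scale", cfg_scale),
        ("-W", width),
        ("-H", height),
        ("-s", seed),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    cmd += ["-p", prompt]
    return cmd


# The SDXL-turbo settings from the command above: one step, cfg scale 1, random seed.
turbo_cmd = build_sd_command(
    "./bin/sd",
    "models/sd_xl_turbo_1.0_fp16.safetensors",
    "a lovely cat",
    vae="models/sdxl_vae.safetensors",
    steps=1,
    cfg_scale=1,
    seed=-1,
)
print(" ".join(turbo_cmd))
```

The list can be passed to subprocess.run(turbo_cmd, check=True); passing a list rather than a single string avoids shell quoting problems with prompts that contain spaces.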
LoRAs don't work for me for some reason. Maybe I'm doing something incorrectly.
I'm using the following command:
for m in models/SDXL/*.safetensors; do ./stable-diffusion.cpp/dist/bin/sd -m "${m}" -p "a cute cat <lora:SCRATCHBOARD ILLUSTRATION:0.8>" -W 1024 -H 1024 --steps 30 --sampling-method dpm++2m --schedule karras --embd-dir models/embeddings/ --vae models/SDXL/vae/sdxl_vae.safetensors -s $RANDOM -b 2 --lora-model-dir models/SDXL/lora/ -v -o images/$(basename -- "$m" ".${m##*.}"| tr " " "-").png -v ; done;
and this lora https://civitai.com/models/279729/wizards-scratchboard-illustration
The relevant (abbreviated) portion of the output:
[INFO ] model.cpp:645 - load models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors using safetensors format
[DEBUG] model.cpp:711 - init from 'models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors'
[DEBUG] ggml_extend.hpp:555 - lora params backend buffer size = 874.24 MB (10240 tensors)
[INFO ] lora.hpp:35 - loading LoRA from 'models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors'
[DEBUG] model.cpp:1262 - loading tensors from models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors
[DEBUG] lora.hpp:58 - finished loaded lora
[WARN ] lora.hpp:154 - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.alpha
[WARN ] lora.hpp:154 - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight
[WARN ] lora.hpp:154 - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.lora_up.weight
[WARN ] lora.hpp:154 - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc2.alpha
[WARN ] lora.hpp:154 - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc2.lora_down.weight
[WARN ] lora.hpp:154 - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc2.lora_up.weight
[WARN ] lora.hpp:154 - unused lora tensor lora.te1_text_model_encoder_layers_0_self_attn_k_proj.alpha
...
UPD: it now works as of at least commit 48bcce493f45a11d9d5a4c69943d03ff919d748f.
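With thousands of these warnings it can help to see which part of the model the skipped tensors belong to. This is a rough sketch that groups the "unused lora tensor" lines by the first component of the tensor name (te1, unet, and so on). It assumes the log format shown in this thread; the function name is made up.

```python
import re
from collections import Counter

# Matches the tensor name at the end of an "unused lora tensor" warning line.
UNUSED_RE = re.compile(r"unused lora tensor (\S+)")


def summarize_unused(log_text: str) -> Counter:
    """Count unused LoRA tensors per name prefix, e.g. 'te1' or 'unet'."""
    counts = Counter()
    for match in UNUSED_RE.finditer(log_text):
        name = match.group(1)
        # Drop the leading "lora." so the first underscore-separated
        # component identifies the sub-model.
        if name.startswith("lora."):
            name = name[len("lora."):]
        counts[name.split("_", 1)[0]] += 1
    return counts


sample = """\
[WARN ] lora.hpp:154 - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.alpha
[WARN ] lora.hpp:154 - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.alpha
[DEBUG] lora.hpp:58 - finished loaded lora
"""
print(summarize_unused(sample))
```

If every warning falls under one prefix, that points at a naming mismatch for that sub-model rather than a broken LoRA file.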
The official example LoRA is failing for me too (from https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main).
(base) tnunamak@pc:~/stable-diffusion.cpp$ ./build/bin/sd -m models/sd_xl_base_1.0.safetensors --vae models/sdxl_vae.safetensors -H 1024 -W 768 --cfg-scale 1 --steps 35 -p "A lovely cat <lora:sd_xl_offset_example-lora_1.0:0.8>" --lora-model-dir models
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:165 - loading model from 'models/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:705 - load models/sd_xl_base_1.0.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:176 - loading vae from 'models/sdxl_vae.safetensors'
[INFO ] model.cpp:705 - load models/sdxl_vae.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:188 - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:194 - Stable Diffusion weight type: f16
[INFO ] stable-diffusion.cpp:400 - total params memory size = 6558.89MB (VRAM 6558.89MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:419 - loading model from 'models/sd_xl_base_1.0.safetensors' completed, taking 3.72s
[INFO ] stable-diffusion.cpp:436 - running in eps-prediction mode
[INFO ] model.cpp:705 - load models/sd_xl_offset_example-lora_1.0.safetensors using safetensors format
[INFO ] lora.hpp:38 - loading LoRA from 'models/sd_xl_offset_example-lora_1.0.safetensors'
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.alpha
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.lora_down.weight
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.lora_up.weight
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.alpha
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.lora_down.weight
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.lora_up.weight
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_out_layers_3.alpha
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_out_layers_3.lora_down.weight
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_1_0_out_layers_3.lora_up.weight
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_input_blocks_2_0_emb_layers_1.alpha
...
[WARN ] lora.hpp:160 - unused lora tensor lora.unet_output_blocks_8_0_skip_connection.lora_up.weight
[INFO ] stable-diffusion.cpp:524 - lora 'sd_xl_offset_example-lora_1.0' applied, taking 1.01s
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 1.01s
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 93 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/1 - seed 42
|==================================================| 35/35 - 2.90it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 12.61s
[INFO ] stable-diffusion.cpp:1777 - generating 1 latent images completed, taking 12.61s
[INFO ] stable-diffusion.cpp:1779 - decoding 1 latents
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 0.99s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 0.99s
[INFO ] stable-diffusion.cpp:1810 - txt2img completed in 13.70s
save result image to 'output.png'
double free or corruption (fasttop)
Aborted (core dumped)
I'm willing to implement SDXL once I've improved the support for SD 1.x and added support for SD 2.x.