leejet / stable-diffusion.cpp

Stable Diffusion and Flux in pure C/C++
MIT License

Unable to apply Flux LoRA #418

Open geocine opened 20 hours ago

geocine commented 20 hours ago

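The LoRA is loaded but not applied. This happens with both a lora_down|lora_up LoRA and a lora_A|lora_B LoRA; in each run 0 tensors are applied. Full logs for both are attached below.

Related issue: #370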

lora_down|lora_up flux_lora.log

Option:
    n_threads:         8
    mode:              txt2img
    model_path:
    wtype:             unspecified
    clip_l_path:       E:\models\clip\clip_l.safetensors
    t5xxl_path:        E:\models\clip\t5xxl_fp16.safetensors
    diffusion_model_path:   E:\models\unet\flux1-dev-Q8_0.gguf
    vae_path:          E:\models\vae\ae.safetensors
    taesd_path:
    esrgan_path:
    controlnet_path:
    embeddings_path:
    stacked_id_embeddings_path:
    input_id_images_path:
    style ratio:       20.00
    normalize input image :  false
    output_path:       .\outputs\20242509_05905.png
    init_img:
    control_image:
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    strength(control): 0.90
    prompt:            a photo of a man<lora:1521964-person:1> holding a sign says 'flux.cpp'
    negative_prompt:
    min_cfg:           1.00
    cfg_scale:         1.00
    guidance:          3.50
    clip_skip:         -1
    width:             512
    height:            512
    sample_method:     euler
    schedule:          default
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info:
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:157  - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:202  - loading clip_l from 'E:\models\clip\clip_l.safetensors'
[INFO ] model.cpp:793  - load E:\models\clip\clip_l.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from 'E:\models\clip\clip_l.safetensors'
[INFO ] stable-diffusion.cpp:209  - loading t5xxl from 'E:\models\clip\t5xxl_fp16.safetensors'
[INFO ] model.cpp:793  - load E:\models\clip\t5xxl_fp16.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from 'E:\models\clip\t5xxl_fp16.safetensors'
[INFO ] stable-diffusion.cpp:216  - loading diffusion model from 'E:\models\unet\flux1-dev-Q8_0.gguf'
[INFO ] model.cpp:790  - load E:\models\unet\flux1-dev-Q8_0.gguf using gguf format
[DEBUG] model.cpp:807  - init from 'E:\models\unet\flux1-dev-Q8_0.gguf'
[INFO ] stable-diffusion.cpp:223  - loading vae from 'E:\models\vae\ae.safetensors'
[INFO ] model.cpp:793  - load E:\models\vae\ae.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from 'E:\models\vae\ae.safetensors'
[INFO ] stable-diffusion.cpp:235  - Version: Flux Dev
[INFO ] stable-diffusion.cpp:266  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:267  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:268  - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:269  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:271  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:310  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:313  - CLIP: Using CPU backend
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1050 - clip params backend buffer size =  235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1050 - t5 params backend buffer size =  9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1050 - flux params backend buffer size =  12068.09 MB(VRAM) (780 tensors)
[DEBUG] ggml_extend.hpp:1050 - vae params backend buffer size =  94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:398  - loading weights
[DEBUG] model.cpp:1530 - loading tensors from E:\models\clip\clip_l.safetensors
[DEBUG] model.cpp:1530 - loading tensors from E:\models\clip\t5xxl_fp16.safetensors
[INFO ] model.cpp:1685 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1530 - loading tensors from E:\models\unet\flux1-dev-Q8_0.gguf
[DEBUG] model.cpp:1530 - loading tensors from E:\models\vae\ae.safetensors
[INFO ] stable-diffusion.cpp:497  - total params memory size = 21481.50MB (VRAM 12162.66MB, RAM 9318.83MB): clip 9318.83MB(RAM), unet 12068.09MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:501  - loading model from '' completed, taking 59.37s
[INFO ] stable-diffusion.cpp:518  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:572  - finished loaded file
[DEBUG] stable-diffusion.cpp:1378 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1123 - lora 1521964-person:1.00
[DEBUG] stable-diffusion.cpp:1127 - prompt after extract and remove lora: "a photo of a man holding a sign says 'flux.cpp'"
[INFO ] stable-diffusion.cpp:655  - Attempting to apply 1 LoRAs
[INFO ] model.cpp:793  - load E:\models\loras/1521964-person.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from 'E:\models\loras/1521964-person.safetensors'
[INFO ] lora.hpp:33   - loading LoRA from 'E:\models\loras/1521964-person.safetensors'
[DEBUG] model.cpp:1530 - loading tensors from E:\models\loras/1521964-person.safetensors
[DEBUG] ggml_extend.hpp:1050 - lora params backend buffer size =  285.00 MB(VRAM) (380 tensors)
[DEBUG] model.cpp:1530 - loading tensors from E:\models\loras/1521964-person.safetensors
[DEBUG] lora.hpp:69   - finished loaded lora
[WARN ] lora.hpp:176  - unused lora tensor lora.transformer_single_transformer_blocks_0_attn_to_k.lora_down.weight
.........
[WARN ] lora.hpp:186  - Only (0 / 380) LoRA tensors have been applied
[INFO ] stable-diffusion.cpp:632  - lora '1521964-person' applied, taking 0.53s
[INFO ] stable-diffusion.cpp:1132 - apply_loras completed, taking 0.53s
[DEBUG] conditioner.hpp:1036 - parse 'a photo of a man holding a sign says 'flux.cpp'' to [['a photo of a man holding a sign says 'flux.cpp'', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] t5.hpp:397  - token length: 256
[DEBUG] ggml_extend.hpp:1001 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1155 - computing condition graph completed, taking 8655 ms
[INFO ] stable-diffusion.cpp:1256 - get_learned_condition completed, taking 8659 ms
[INFO ] stable-diffusion.cpp:1279 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1283 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1001 - flux compute buffer size: 398.50 MB(VRAM)
  |==================================================| 20/20 - 1.65it/s
[INFO ] stable-diffusion.cpp:1315 - sampling completed, taking 12.84s
[INFO ] stable-diffusion.cpp:1323 - generating 1 latent images completed, taking 13.51s
[INFO ] stable-diffusion.cpp:1326 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1001 - vae compute buffer size: 1664.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:987  - computing vae [mode: DECODE] graph completed, taking 0.33s
[INFO ] stable-diffusion.cpp:1336 - latent 1 decoded, taking 0.34s
[INFO ] stable-diffusion.cpp:1340 - decode_first_stage completed, taking 0.34s
[INFO ] stable-diffusion.cpp:1449 - txt2img completed in 23.03s
save result image to '.\outputs\20242509_05905.png'

lora_A|lora_B flux_lora.log

Option:
    n_threads:         8
    mode:              txt2img
    model_path:
    wtype:             unspecified
    clip_l_path:       E:\models\clip\clip_l.safetensors
    t5xxl_path:        E:\models\clip\t5xxl_fp16.safetensors
    diffusion_model_path:   E:\models\unet\flux1-dev-Q8_0.gguf
    vae_path:          E:\models\vae\ae.safetensors
    taesd_path:
    esrgan_path:
    controlnet_path:
    embeddings_path:
    stacked_id_embeddings_path:
    input_id_images_path:
    style ratio:       20.00
    normalize input image :  false
    output_path:       .\outputs\20242509_11252.png
    init_img:
    control_image:
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    strength(control): 0.90
    prompt:            a photo of a man<lora:flux-person-trained:1> holding a sign says 'flux.cpp'
    negative_prompt:
    min_cfg:           1.00
    cfg_scale:         1.00
    guidance:          3.50
    clip_skip:         -1
    width:             512
    height:            512
    sample_method:     euler
    schedule:          default
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              11547
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info:
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:157  - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:202  - loading clip_l from 'E:\models\clip\clip_l.safetensors'
[INFO ] model.cpp:793  - load E:\models\clip\clip_l.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from 'E:\models\clip\clip_l.safetensors'
[INFO ] stable-diffusion.cpp:209  - loading t5xxl from 'E:\models\clip\t5xxl_fp16.safetensors'
[INFO ] model.cpp:793  - load E:\models\clip\t5xxl_fp16.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from 'E:\models\clip\t5xxl_fp16.safetensors'
[INFO ] stable-diffusion.cpp:216  - loading diffusion model from 'E:\models\unet\flux1-dev-Q8_0.gguf'
[INFO ] model.cpp:790  - load E:\models\unet\flux1-dev-Q8_0.gguf using gguf format
[DEBUG] model.cpp:807  - init from 'E:\models\unet\flux1-dev-Q8_0.gguf'
[INFO ] stable-diffusion.cpp:223  - loading vae from 'E:\models\vae\ae.safetensors'
[INFO ] model.cpp:793  - load E:\models\vae\ae.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from 'E:\models\vae\ae.safetensors'
[INFO ] stable-diffusion.cpp:235  - Version: Flux Dev
[INFO ] stable-diffusion.cpp:266  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:267  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:268  - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:269  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:271  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:310  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:313  - CLIP: Using CPU backend
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1050 - clip params backend buffer size =  235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1050 - t5 params backend buffer size =  9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1050 - flux params backend buffer size =  12068.09 MB(VRAM) (780 tensors)
[DEBUG] ggml_extend.hpp:1050 - vae params backend buffer size =  94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:398  - loading weights
[DEBUG] model.cpp:1530 - loading tensors from E:\models\clip\clip_l.safetensors
[DEBUG] model.cpp:1530 - loading tensors from E:\models\clip\t5xxl_fp16.safetensors
[INFO ] model.cpp:1685 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1530 - loading tensors from E:\models\unet\flux1-dev-Q8_0.gguf
[DEBUG] model.cpp:1530 - loading tensors from E:\models\vae\ae.safetensors
[INFO ] stable-diffusion.cpp:497  - total params memory size = 21481.50MB (VRAM 12162.66MB, RAM 9318.83MB): clip 9318.83MB(RAM), unet 12068.09MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:501  - loading model from '' completed, taking 58.51s
[INFO ] stable-diffusion.cpp:518  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:572  - finished loaded file
[DEBUG] stable-diffusion.cpp:1378 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1123 - lora flux-person-trained:1.00
[DEBUG] stable-diffusion.cpp:1127 - prompt after extract and remove lora: "a photo of a man holding a sign says 'flux.cpp'"
[INFO ] stable-diffusion.cpp:655  - Attempting to apply 1 LoRAs
[INFO ] model.cpp:793  - load E:\models\loras/flux-person-trained.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from 'E:\models\loras/flux-person-trained.safetensors'
[INFO ] lora.hpp:33   - loading LoRA from 'E:\models\loras/flux-person-trained.safetensors'
[DEBUG] model.cpp:1530 - loading tensors from E:\models\loras/flux-person-trained.safetensors
[DEBUG] ggml_extend.hpp:1050 - lora params backend buffer size =  163.88 MB(VRAM) (988 tensors)
[DEBUG] model.cpp:1530 - loading tensors from E:\models\loras/flux-person-trained.safetensors
[DEBUG] lora.hpp:69   - finished loaded lora
[WARN ] lora.hpp:176  - unused lora tensor transformer.single_transformer_blocks.0.attn.to_k.lora_A.weight
[WARN ] lora.hpp:176  - unused lora tensor transformer.single_transformer_blocks.0.attn.to_k.lora_B.weight
---
[WARN ] lora.hpp:186  - Only (0 / 988) LoRA tensors have been applied
[INFO ] stable-diffusion.cpp:632  - lora 'flux-person-trained' applied, taking 0.48s
[INFO ] stable-diffusion.cpp:1132 - apply_loras completed, taking 0.48s
[DEBUG] conditioner.hpp:1036 - parse 'a photo of a man holding a sign says 'flux.cpp'' to [['a photo of a man holding a sign says 'flux.cpp'', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] t5.hpp:397  - token length: 256
[DEBUG] ggml_extend.hpp:1001 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1155 - computing condition graph completed, taking 8553 ms
[INFO ] stable-diffusion.cpp:1256 - get_learned_condition completed, taking 8556 ms
[INFO ] stable-diffusion.cpp:1279 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1283 - generating image: 1/1 - seed 11547
[DEBUG] ggml_extend.hpp:1001 - flux compute buffer size: 398.50 MB(VRAM)
  |==================================================| 20/20 - 1.67it/s
[INFO ] stable-diffusion.cpp:1315 - sampling completed, taking 12.60s
[INFO ] stable-diffusion.cpp:1323 - generating 1 latent images completed, taking 13.24s
[INFO ] stable-diffusion.cpp:1326 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1001 - vae compute buffer size: 1664.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:987  - computing vae [mode: DECODE] graph completed, taking 0.33s
[INFO ] stable-diffusion.cpp:1336 - latent 1 decoded, taking 0.33s
[INFO ] stable-diffusion.cpp:1340 - decode_first_stage completed, taking 0.33s
[INFO ] stable-diffusion.cpp:1449 - txt2img completed in 22.61s
save result image to '.\outputs\20242509_11252.png'
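
Note that the INFO line reports the LoRA as "applied" even though the WARN line just before it reports 0 applied tensors, so the failure is easy to miss. Below is a minimal Python sketch for pulling those numbers out of a saved log. It assumes the message formats stay exactly as they appear in the two logs above; the name `summarize_lora_log` is made up for illustration and is not part of stable-diffusion.cpp.

```python
import re

# Patterns copied from the WARN lines in the logs above (assumed stable).
APPLIED_RE = re.compile(r"Only \((\d+) / (\d+)\) LoRA tensors have been applied")
UNUSED_RE = re.compile(r"unused lora tensor (\S+)")


def summarize_lora_log(log_text):
    """Return (applied, total, unused_names) parsed from a log.

    applied and total are None when no "Only (N / M)" line is found.
    """
    applied = total = None
    unused = []
    for line in log_text.splitlines():
        match = UNUSED_RE.search(line)
        if match:
            unused.append(match.group(1))
            continue
        match = APPLIED_RE.search(line)
        if match:
            applied, total = int(match.group(1)), int(match.group(2))
    return applied, total, unused


if __name__ == "__main__":
    import sys

    # Usage: python check_lora_log.py flux_lora.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        applied, total, unused = summarize_lora_log(f.read())
    print(f"applied {applied} of {total}; {len(unused)} unused tensor names listed")
```

For both logs above the applied count would come back as 0. The unused list only contains the names that were actually written to the log, so it is shorter than the total when the log has been truncated.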

geocine commented 19 hours ago

There is a conversion script here for reference.
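
Before converting anything, it can help to check which naming convention a LoRA file uses. The sketch below is not the conversion script mentioned above. It is a small stdlib-only Python example that reads tensor names from a `.safetensors` header (an 8-byte little-endian length followed by a JSON header) and counts the `lora_down`/`lora_up` and `lora_A`/`lora_B` suffixes seen in the logs. The function names and the example path are hypothetical.

```python
import json
import struct


def tensor_names_from_header(raw):
    """Return tensor names from the leading bytes of a .safetensors file."""
    # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8:8 + header_len])
    # "__metadata__" is an optional entry that does not describe a tensor.
    return [name for name in header if name != "__metadata__"]


def read_tensor_names(path):
    """Read only the header of a .safetensors file; tensor data is not loaded."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return tensor_names_from_header(prefix + f.read(header_len))


def suffix_counts(names):
    """Count tensor names by the LoRA suffix style they contain."""
    counts = {"lora_down/lora_up": 0, "lora_A/lora_B": 0, "other": 0}
    for name in names:
        if ".lora_down." in name or ".lora_up." in name:
            counts["lora_down/lora_up"] += 1
        elif ".lora_A." in name or ".lora_B." in name:
            counts["lora_A/lora_B"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path; replace with the LoRA file to inspect.
    names = read_tensor_names("flux-person-trained.safetensors")
    print(suffix_counts(names))
    for name in names[:5]:
        print(name)
```

This only reports what is in the file. It does not say which names the loader expects, and the first log shows that `lora_down`/`lora_up` names also went unused, so the suffix alone is not what decides whether a tensor is applied.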