photoMaker issue with two or more generated images / SDXL sample steps

Jonathhhan commented 3 months ago

Very nice feature, thanks. It seems that only the first generated image works after loading the sd_ctx (multiple images work with batch size > 1).

I used the Newton images from the example and the prompt: "man img, man with futuristic clothes".

This is the first image: ofxStableDiffusion-2024-03-19-03-33-18

And this is the second image: ofxStableDiffusion-2024-03-19-03-34-24

And with SDXL-Turbo photoMaker seems to need less than the fixed sample 50 steps...

Green-Sky commented 3 months ago

It looks like the cfg scale was too high for the first image.

Jonathhhan commented 3 months ago

@Green-Sky yes, I used 7 and not the recommended 5.

bssrdf commented 3 months ago

@Jonathhhan, could you provide the full command line with SDXL and Photomaker model files? In particular, did you use the file from https://huggingface.co/bssrdf/PhotoMaker?

Here are what I can generate using Newton example images and your prompt with batch size 2.

bin/sd -m ../models/RealVisXL_V3.0.safetensors  --stacked-id-embd-dir ../models/photomaker-v1.safetensors --input-id-images-dir examples/newton_man -p "man img, man with futuristic clothes"  --cfg-scale 7 --sampling-method euler -H 1024 -W 1024  -b 2 -o newton_issu01.png
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:165  - loading model from '../models/RealVisXL_V3.0.safetensors'
[INFO ] model.cpp:705  - load ../models/RealVisXL_V3.0.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:188  - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:194  - Stable Diffusion weight type: f16
[WARN ] stable-diffusion.cpp:200  - !!!It looks like you are using SDXL model. If you find that the generated images are completely black, try specifying SDXL VAE FP16 Fix with the --vae parameter. You can find it here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors
[INFO ] model.cpp:705  - load ../models/photomaker-v1.safetensors using safetensors format
[INFO ] lora.hpp:38   - loading LoRA from '../models/photomaker-v1.safetensors'
[INFO ] stable-diffusion.cpp:275  - loading stacked ID embedding (PHOTOMAKER) model file from '../models/photomaker-v1.safetensors'
[INFO ] model.cpp:705  - load ../models/photomaker-v1.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:400  - total params memory size = 7182.38MB (VRAM 7182.38MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(VRAM)
[INFO ] stable-diffusion.cpp:419  - loading model from '../models/RealVisXL_V3.0.safetensors' completed, taking 88.15s
[INFO ] stable-diffusion.cpp:436  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'examples/newton_man/newton_0.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'examples/newton_man/newton_1.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'examples/newton_man/newton_2.png'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'examples/newton_man/newton_3.jpg'
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1608 - pmid_lora apply completed, taking 0.09s
[INFO ] stable-diffusion.cpp:1672 - Photomaker ID Stacking, taking 548 ms
[INFO ] stable-diffusion.cpp:1681 - sampling steps increases from 20 to 50 for PHOTOMAKER
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 157 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/2 - seed 42
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
  |==================================================| 50/50 - 1.84it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 27.58s
[INFO ] stable-diffusion.cpp:1732 - generating image: 2/2 - seed 43
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
  |==================================================| 50/50 - 1.79it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 27.52s
[INFO ] stable-diffusion.cpp:1777 - generating 2 latent images completed, taking 55.12s
[INFO ] stable-diffusion.cpp:1779 - decoding 2 latents
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 1.15s
[INFO ] stable-diffusion.cpp:1789 - latent 2 decoded, taking 1.17s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 2.31s
[INFO ] stable-diffusion.cpp:1810 - txt2img completed in 57.60s
save result image to 'newton_issu01.png'
save result image to 'newton_issu01_2.png'
double free or corruption (fasttop)
Aborted

newton_issu01 newton_issu01_2

They look fine.

Jonathhhan commented 3 months ago

@bssrdf batch processing works fine. The issue appears, if I run txt2img for a second time without reloading the sd_ctx. The console output looks exactly the same for both runs:

System Info:
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
New BaseEngine 00000202288E6220
New GLFWEngine 00000202288E6220
[DEBUG] stable-diffusion.cpp:145  - Using CUDA backend
[notice ] EngineGLFW::setup(): Replaced the openFrameworks' GLFW event listeners by the imgui_impl_glfw ones. You will not have multi-window nor multi-context support. This can be enabled by defining OFXIMGUI_GLFW_FIX_MULTICONTEXT_PRIMARY_VP=1.
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:165  - loading model from 'data/models/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:705  - load data/models/sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:176  - loading vae from 'data/models/vae/vae.safetensors'
[INFO ] model.cpp:705  - load data/models/vae/vae.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/vae/vae.safetensors'
[INFO ] stable-diffusion.cpp:188  - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:194  - Stable Diffusion weight type: f16
[DEBUG] stable-diffusion.cpp:195  - ggml tensor size = 432 bytes
[DEBUG] ggml_extend.hpp:884  - clip params backend buffer size =  1564.36 MB(VRAM) (713 tensors)
[DEBUG] ggml_extend.hpp:884  - unet params backend buffer size =  4900.07 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:884  - vae params backend buffer size =  159.68 MB(VRAM) (248 tensors)
[INFO ] model.cpp:705  - load data/models/photomaker/photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/photomaker/photomaker-v1.safetensors'
[INFO ] lora.hpp:38   - loading LoRA from 'data/models/photomaker/photomaker-v1.safetensors'
[DEBUG] model.cpp:1343 - loading tensors from data/models/photomaker/photomaker-v1.safetensors
[DEBUG] ggml_extend.hpp:884  - lora params backend buffer size =  354.38 MB(VRAM) (10240 tensors)
[DEBUG] model.cpp:1343 - loading tensors from data/models/photomaker/photomaker-v1.safetensors
[DEBUG] lora.hpp:74   - finished loaded lora
[INFO ] stable-diffusion.cpp:275  - loading stacked ID embedding (PHOTOMAKER) model file from 'data/models/photomaker/photomaker-v1.safetensors'
[INFO ] model.cpp:705  - load data/models/photomaker/photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/photomaker/photomaker-v1.safetensors'
[DEBUG] ggml_extend.hpp:884  - pmid params backend buffer size =  623.48 MB(VRAM) (407 tensors)
[DEBUG] stable-diffusion.cpp:296  - loading vocab
[DEBUG] clip.hpp:164  - vocab size: 49408
[DEBUG] clip.hpp:175  -  trigger word img already in vocab
[DEBUG] stable-diffusion.cpp:316  - loading weights
[DEBUG] model.cpp:1343 - loading tensors from data/models/sd_xl_base_1.0.safetensors
[DEBUG] model.cpp:1343 - loading tensors from data/models/vae/vae.safetensors
[DEBUG] model.cpp:1343 - loading tensors from data/models/photomaker/photomaker-v1.safetensors
[INFO ] stable-diffusion.cpp:415  - total params memory size = 7247.59MB (VRAM 7247.59MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 159.68MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(VRAM)
[INFO ] stable-diffusion.cpp:419  - loading model from 'data/models/sd_xl_base_1.0.safetensors' completed, taking 4.77s
[INFO ] stable-diffusion.cpp:436  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:464  - finished loaded file
[DEBUG] upscaler.cpp:19   - Using CUDA backend
[INFO ] upscaler.cpp:32   - Upscaler weight type: f16
[INFO ] esrgan.hpp:164  - loading esrgan from 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
[DEBUG] ggml_extend.hpp:884  - esrgan params backend buffer size =   8.53 MB(VRAM) (192 tensors)
[INFO ] model.cpp:708  - load data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth using checkpoint format
[DEBUG] model.cpp:1221 - init from 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
[DEBUG] model.cpp:1343 - loading tensors from data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth
[INFO ] esrgan.hpp:183  - esrgan model loaded

[DEBUG] stable-diffusion.cpp:1551 - txt2img 1024x1024
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_0.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_1.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_2.png'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_3.jpg'
[DEBUG] stable-diffusion.cpp:1597 - prompt after extract and remove lora: "man img, man with futuristic clothes"
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 0.00s
[DEBUG] ggml_extend.hpp:835  - lora compute buffer size: 20.50 MB(VRAM)
[INFO ] stable-diffusion.cpp:1608 - pmid_lora apply completed, taking 0.28s
[DEBUG] clip.hpp:1222 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 86 ms
[DEBUG] ggml_extend.hpp:835  - pmid compute buffer size: 40.31 MB(VRAM)
[INFO ] stable-diffusion.cpp:1672 - Photomaker ID Stacking, taking 161 ms
[DEBUG] clip.hpp:1328 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[INFO ] stable-diffusion.cpp:1681 - sampling steps increases from 15 to 50 for PHOTOMAKER
[DEBUG] clip.hpp:1328 - parse 'man , man with futuristic clothes' to [['man , man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 61 ms
[DEBUG] clip.hpp:1328 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 54 ms
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 117 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/1 - seed 2058
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
[DEBUG] ggml_extend.hpp:835  - unet compute buffer size: 830.86 MB(VRAM)
  |==================================================| 50/50 - 1.28it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 41.23s
[INFO ] stable-diffusion.cpp:1777 - generating 1 latent images completed, taking 41.23s
[INFO ] stable-diffusion.cpp:1779 - decoding 1 latents
[DEBUG] ggml_extend.hpp:835  - vae compute buffer size: 6656.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1447 - computing vae [mode: DECODE] graph completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 1.22s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1812 - txt2img completed in 42.56s

[DEBUG] stable-diffusion.cpp:1551 - txt2img 1024x1024
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_0.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_1.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_2.png'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_3.jpg'
[DEBUG] stable-diffusion.cpp:1597 - prompt after extract and remove lora: "man img, man with futuristic clothes"
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 0.00s
[DEBUG] ggml_extend.hpp:835  - lora compute buffer size: 20.50 MB(VRAM)
[INFO ] stable-diffusion.cpp:1608 - pmid_lora apply completed, taking 0.26s
[DEBUG] clip.hpp:1222 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 53 ms
[DEBUG] ggml_extend.hpp:835  - pmid compute buffer size: 40.31 MB(VRAM)
[INFO ] stable-diffusion.cpp:1672 - Photomaker ID Stacking, taking 127 ms
[DEBUG] clip.hpp:1328 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[INFO ] stable-diffusion.cpp:1681 - sampling steps increases from 15 to 50 for PHOTOMAKER
[DEBUG] clip.hpp:1328 - parse 'man , man with futuristic clothes' to [['man , man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 55 ms
[DEBUG] clip.hpp:1328 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 53 ms
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 111 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/1 - seed 2215
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
[DEBUG] ggml_extend.hpp:835  - unet compute buffer size: 830.86 MB(VRAM)
  |==================================================| 50/50 - 1.28it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 40.68s
[INFO ] stable-diffusion.cpp:1777 - generating 1 latent images completed, taking 40.68s
[INFO ] stable-diffusion.cpp:1779 - decoding 1 latents
[DEBUG] ggml_extend.hpp:835  - vae compute buffer size: 6656.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1447 - computing vae [mode: DECODE] graph completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 1.22s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1812 - txt2img completed in 42.02s

bssrdf commented 3 months ago

@bssrdf batch processing works fine. The issue appears, if I run txt2img for a second time without reloading the sd_ctx. The console output looks exactly the same for both runs:

System Info:
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
New BaseEngine 00000202288E6220
New GLFWEngine 00000202288E6220
[DEBUG] stable-diffusion.cpp:145  - Using CUDA backend
[notice ] EngineGLFW::setup(): Replaced the openFrameworks' GLFW event listeners by the imgui_impl_glfw ones. You will not have multi-window nor multi-context support. This can be enabled by defining OFXIMGUI_GLFW_FIX_MULTICONTEXT_PRIMARY_VP=1.
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:165  - loading model from 'data/models/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:705  - load data/models/sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:176  - loading vae from 'data/models/vae/vae.safetensors'
[INFO ] model.cpp:705  - load data/models/vae/vae.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/vae/vae.safetensors'
[INFO ] stable-diffusion.cpp:188  - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:194  - Stable Diffusion weight type: f16
[DEBUG] stable-diffusion.cpp:195  - ggml tensor size = 432 bytes
[DEBUG] ggml_extend.hpp:884  - clip params backend buffer size =  1564.36 MB(VRAM) (713 tensors)
[DEBUG] ggml_extend.hpp:884  - unet params backend buffer size =  4900.07 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:884  - vae params backend buffer size =  159.68 MB(VRAM) (248 tensors)
[INFO ] model.cpp:705  - load data/models/photomaker/photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/photomaker/photomaker-v1.safetensors'
[INFO ] lora.hpp:38   - loading LoRA from 'data/models/photomaker/photomaker-v1.safetensors'
[DEBUG] model.cpp:1343 - loading tensors from data/models/photomaker/photomaker-v1.safetensors
[DEBUG] ggml_extend.hpp:884  - lora params backend buffer size =  354.38 MB(VRAM) (10240 tensors)
[DEBUG] model.cpp:1343 - loading tensors from data/models/photomaker/photomaker-v1.safetensors
[DEBUG] lora.hpp:74   - finished loaded lora
[INFO ] stable-diffusion.cpp:275  - loading stacked ID embedding (PHOTOMAKER) model file from 'data/models/photomaker/photomaker-v1.safetensors'
[INFO ] model.cpp:705  - load data/models/photomaker/photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from 'data/models/photomaker/photomaker-v1.safetensors'
[DEBUG] ggml_extend.hpp:884  - pmid params backend buffer size =  623.48 MB(VRAM) (407 tensors)
[DEBUG] stable-diffusion.cpp:296  - loading vocab
[DEBUG] clip.hpp:164  - vocab size: 49408
[DEBUG] clip.hpp:175  -  trigger word img already in vocab
[DEBUG] stable-diffusion.cpp:316  - loading weights
[DEBUG] model.cpp:1343 - loading tensors from data/models/sd_xl_base_1.0.safetensors
[DEBUG] model.cpp:1343 - loading tensors from data/models/vae/vae.safetensors
[DEBUG] model.cpp:1343 - loading tensors from data/models/photomaker/photomaker-v1.safetensors
[INFO ] stable-diffusion.cpp:415  - total params memory size = 7247.59MB (VRAM 7247.59MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 159.68MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(VRAM)
[INFO ] stable-diffusion.cpp:419  - loading model from 'data/models/sd_xl_base_1.0.safetensors' completed, taking 4.77s
[INFO ] stable-diffusion.cpp:436  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:464  - finished loaded file
[DEBUG] upscaler.cpp:19   - Using CUDA backend
[INFO ] upscaler.cpp:32   - Upscaler weight type: f16
[INFO ] esrgan.hpp:164  - loading esrgan from 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
[DEBUG] ggml_extend.hpp:884  - esrgan params backend buffer size =   8.53 MB(VRAM) (192 tensors)
[INFO ] model.cpp:708  - load data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth using checkpoint format
[DEBUG] model.cpp:1221 - init from 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
[DEBUG] model.cpp:1343 - loading tensors from data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth
[INFO ] esrgan.hpp:183  - esrgan model loaded
[DEBUG] stable-diffusion.cpp:1551 - txt2img 1024x1024
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_0.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_1.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_2.png'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_3.jpg'
[DEBUG] stable-diffusion.cpp:1597 - prompt after extract and remove lora: "man img, man with futuristic clothes"
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 0.00s
[DEBUG] ggml_extend.hpp:835  - lora compute buffer size: 20.50 MB(VRAM)
[INFO ] stable-diffusion.cpp:1608 - pmid_lora apply completed, taking 0.28s
[DEBUG] clip.hpp:1222 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 86 ms
[DEBUG] ggml_extend.hpp:835  - pmid compute buffer size: 40.31 MB(VRAM)
[INFO ] stable-diffusion.cpp:1672 - Photomaker ID Stacking, taking 161 ms
[DEBUG] clip.hpp:1328 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[INFO ] stable-diffusion.cpp:1681 - sampling steps increases from 15 to 50 for PHOTOMAKER
[DEBUG] clip.hpp:1328 - parse 'man , man with futuristic clothes' to [['man , man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 61 ms
[DEBUG] clip.hpp:1328 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 54 ms
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 117 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/1 - seed 2058
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
[DEBUG] ggml_extend.hpp:835  - unet compute buffer size: 830.86 MB(VRAM)
  |==================================================| 50/50 - 1.28it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 41.23s
[INFO ] stable-diffusion.cpp:1777 - generating 1 latent images completed, taking 41.23s
[INFO ] stable-diffusion.cpp:1779 - decoding 1 latents
[DEBUG] ggml_extend.hpp:835  - vae compute buffer size: 6656.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1447 - computing vae [mode: DECODE] graph completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 1.22s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1812 - txt2img completed in 42.56s
[DEBUG] stable-diffusion.cpp:1551 - txt2img 1024x1024
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_0.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_1.jpg'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_2.png'
[INFO ] stable-diffusion.cpp:1572 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_3.jpg'
[DEBUG] stable-diffusion.cpp:1597 - prompt after extract and remove lora: "man img, man with futuristic clothes"
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 0.00s
[DEBUG] ggml_extend.hpp:835  - lora compute buffer size: 20.50 MB(VRAM)
[INFO ] stable-diffusion.cpp:1608 - pmid_lora apply completed, taking 0.26s
[DEBUG] clip.hpp:1222 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 53 ms
[DEBUG] ggml_extend.hpp:835  - pmid compute buffer size: 40.31 MB(VRAM)
[INFO ] stable-diffusion.cpp:1672 - Photomaker ID Stacking, taking 127 ms
[DEBUG] clip.hpp:1328 - parse 'man img, man with futuristic clothes' to [['man img, man with futuristic clothes', 1], ]
[INFO ] stable-diffusion.cpp:1681 - sampling steps increases from 15 to 50 for PHOTOMAKER
[DEBUG] clip.hpp:1328 - parse 'man , man with futuristic clothes' to [['man , man with futuristic clothes', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 55 ms
[DEBUG] clip.hpp:1328 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:1168 - token length: 77
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 2.56 MB(VRAM)
[DEBUG] ggml_extend.hpp:835  - clip compute buffer size: 8.58 MB(VRAM)
[DEBUG] stable-diffusion.cpp:673  - computing condition graph completed, taking 53 ms
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 111 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/1 - seed 2215
[INFO ] stable-diffusion.cpp:1745 - PHOTOMAKER: start_merge_step: 10
[DEBUG] ggml_extend.hpp:835  - unet compute buffer size: 830.86 MB(VRAM)
  |==================================================| 50/50 - 1.28it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 40.68s
[INFO ] stable-diffusion.cpp:1777 - generating 1 latent images completed, taking 40.68s
[INFO ] stable-diffusion.cpp:1779 - decoding 1 latents
[DEBUG] ggml_extend.hpp:835  - vae compute buffer size: 6656.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1447 - computing vae [mode: DECODE] graph completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 1.22s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 1.22s
[INFO ] stable-diffusion.cpp:1812 - txt2img completed in 42.02s

Sorry, I mis-read your first message :blush: Can you try running more than one txt2img call but without photomaker? Just to isolate whether this is a photomaker specific issue.

Jonathhhan commented 3 months ago

Can you try running more than one txt2img call but without photomaker? Just to isolate whether this is a photomaker specific issue.

@bssrdf good point. Yes, it works without photomaker (if the path to the photomaker model is empty). It crashes, if the model is loaded and I leave "man (something) img, " away (which is a non related issue, but could be a nice way to trigger photomaker).

bssrdf commented 3 months ago

Can you try running more than one txt2img call but without photomaker? Just to isolate whether this is a photomaker specific issue.

@bssrdf good point. Yes, it works without photomaker (if the path to the photomaker model is empty). It crashes, if the model is loaded and I leave "man (something) img, " away (which is a non related issue, but could be a nice way to trigger photomaker).

@Jonathhhan, can you provide details about how to run 2 txt2img without reloading sd_ctx? Did you change the code in main.cpp?

Jonathhhan commented 3 months ago

@bssrdf of course. I made an addon for Open Frameworks and do not use main.cpp at all (which complicates it a little): https://github.com/Jonathhhan/ofxStableDiffusion In this file happens most of the relevant stuff: https://github.com/Jonathhhan/ofxStableDiffusion/blob/main/ofxStableDiffusionExample/src/stableDiffusionThread.cpp

fszontagh commented 3 months ago

@Jonathhhan did you set the "isFreeParamsImmediatly" to false?

Jonathhhan commented 3 months ago

did you set the "isFreeParamsImmediatly" to false?

@fszontagh Yes.

bssrdf commented 3 months ago

@bssrdf of course. I made an addon for Open Frameworks and do not use main.cpp at all (which complicates it a little): https://github.com/Jonathhhan/ofxStableDiffusion In this file happens most of the relevant stuff: https://github.com/Jonathhhan/ofxStableDiffusion/blob/main/ofxStableDiffusionExample/src/stableDiffusionThread.cpp

@Jonathhhan, I have reproduced the issue and implemented a fix. Please wait for the merged PR or you can try the branch. Thanks for reporting the bug.

Jonathhhan commented 3 months ago

@bssrdf thanks (I can confirm that it works now).

leejet / stable-diffusion.cpp

photoMaker issue with two or more generated images / SDXL sample steps #207