leejet / stable-diffusion.cpp

Stable Diffusion in pure C/C++
MIT License

taesd model does not work after reloading the sd_ctx #223

Open Jonathhhan opened 3 months ago

Jonathhhan commented 3 months ago

If I load a second model (that is, reload the sd_ctx), taesd no longer works: the output image is black.

System Info:
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
New BaseEngine 0000011E67DD67B0
stable-diffusion.cpp:151  - Using CUDA backend
New GLFWEngine 0000011E67DD67B0
[notice ] EngineGLFW::setup(): Replaced the openFrameworks' GLFW event listeners by the imgui_impl_glfw ones. You will not have multi-window nor multi-context support. This can be enabled by defining OFXIMGUI_GLFW_FIX_MULTICONTEXT_PRIMARY_VP=1.
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
stable-diffusion.cpp:171  - loading model from 'data/models/v1-5-pruned-emaonly.safetensors'
model.cpp:735  - load data/models/v1-5-pruned-emaonly.safetensors using safetensors format
model.cpp:801  - init from 'data/models/v1-5-pruned-emaonly.safetensors'
stable-diffusion.cpp:182  - loading vae from 'data/models/vae/vae.safetensors'
model.cpp:735  - load data/models/vae/vae.safetensors using safetensors format
model.cpp:801  - init from 'data/models/vae/vae.safetensors'
stable-diffusion.cpp:194  - Stable Diffusion 1.x
stable-diffusion.cpp:200  - Stable Diffusion weight type: f32
stable-diffusion.cpp:201  - ggml tensor size = 432 bytes
ggml_extend.hpp:890  - clip params backend buffer size =  469.44 MB(VRAM) (196 tensors)
ggml_extend.hpp:890  - unet params backend buffer size =  2155.33 MB(VRAM) (686 tensors)
ggml_extend.hpp:890  - vae params backend buffer size =  159.68 MB(VRAM) (248 tensors)
stable-diffusion.cpp:302  - loading vocab
clip.hpp:164  - vocab size: 49408
clip.hpp:175  -  trigger word img already in vocab
stable-diffusion.cpp:322  - loading weights
model.cpp:1373 - loading tensors from data/models/v1-5-pruned-emaonly.safetensors
model.cpp:1373 - loading tensors from data/models/vae/vae.safetensors
stable-diffusion.cpp:421  - total params memory size = 2784.45MB (VRAM 2784.45MB, RAM 0.00MB): clip 469.44MB(VRAM), unet 2155.33MB(VRAM), vae 159.68MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
stable-diffusion.cpp:425  - loading model from 'data/models/v1-5-pruned-emaonly.safetensors' completed, taking 5.35s
stable-diffusion.cpp:442  - running in eps-prediction mode
stable-diffusion.cpp:470  - finished loaded file
upscaler.cpp:19   - Using CUDA backend
upscaler.cpp:32   - Upscaler weight type: f16
esrgan.hpp:164  - loading esrgan from 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
ggml_extend.hpp:890  - esrgan params backend buffer size =   8.53 MB(VRAM) (192 tensors)
model.cpp:738  - load data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth using checkpoint format
model.cpp:1251 - init from 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
model.cpp:1373 - loading tensors from data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth
esrgan.hpp:183  - esrgan model loaded
stable-diffusion.cpp:1557 - txt2img 768x768
stable-diffusion.cpp:1578 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_0.jpg'
stable-diffusion.cpp:1578 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_1.jpg'
stable-diffusion.cpp:1578 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_2.png'
stable-diffusion.cpp:1578 - PhotoMaker loaded image from 'C:\Users\Jonat\Desktop\of_v20240306_vs_release\addons\ofxStableDiffusion\ofxStableDiffusionExample\bin\data/photomaker_images/newton_man\newton_3.jpg'
stable-diffusion.cpp:1603 - prompt after extract and remove lora: "animal with futuristic clothes"
stable-diffusion.cpp:553  - Attempting to apply 0 LoRAs
stable-diffusion.cpp:1608 - apply_loras completed, taking 0.00s
clip.hpp:1328 - parse 'animal with futuristic clothes' to [['animal with futuristic clothes', 1], ]
clip.hpp:1168 - token length: 77
ggml_extend.hpp:841  - clip compute buffer size: 1.40 MB(VRAM)
stable-diffusion.cpp:679  - computing condition graph completed, taking 124 ms
stable-diffusion.cpp:1719 - get_learned_condition completed, taking 126 ms
stable-diffusion.cpp:1735 - sampling using Euler A method
stable-diffusion.cpp:1739 - generating image: 1/4 - seed 31798
ggml_extend.hpp:841  - unet compute buffer size: 2690.33 MB(VRAM)
  |==================================================| 5/5 - 4.43it/s
stable-diffusion.cpp:1776 - sampling completed, taking 1.47s
stable-diffusion.cpp:1739 - generating image: 2/4 - seed 31799
ggml_extend.hpp:841  - unet compute buffer size: 2690.33 MB(VRAM)
  |==================================================| 5/5 - 4.45it/s
stable-diffusion.cpp:1776 - sampling completed, taking 1.30s
stable-diffusion.cpp:1739 - generating image: 3/4 - seed 31800
ggml_extend.hpp:841  - unet compute buffer size: 2690.33 MB(VRAM)
  |==================================================| 5/5 - 4.44it/s
stable-diffusion.cpp:1776 - sampling completed, taking 1.31s
stable-diffusion.cpp:1739 - generating image: 4/4 - seed 31801
ggml_extend.hpp:841  - unet compute buffer size: 2690.33 MB(VRAM)
  |==================================================| 5/5 - 4.43it/s
stable-diffusion.cpp:1776 - sampling completed, taking 1.31s
stable-diffusion.cpp:1784 - generating 4 latent images completed, taking 5.40s
stable-diffusion.cpp:1786 - decoding 4 latents
ggml_extend.hpp:841  - vae compute buffer size: 3744.00 MB(VRAM)
stable-diffusion.cpp:1453 - computing vae [mode: DECODE] graph completed, taking 0.67s
stable-diffusion.cpp:1796 - latent 1 decoded, taking 0.67s
ggml_extend.hpp:841  - vae compute buffer size: 3744.00 MB(VRAM)
stable-diffusion.cpp:1453 - computing vae [mode: DECODE] graph completed, taking 0.67s
stable-diffusion.cpp:1796 - latent 2 decoded, taking 0.67s
ggml_extend.hpp:841  - vae compute buffer size: 3744.00 MB(VRAM)
stable-diffusion.cpp:1453 - computing vae [mode: DECODE] graph completed, taking 0.67s
stable-diffusion.cpp:1796 - latent 3 decoded, taking 0.67s
ggml_extend.hpp:841  - vae compute buffer size: 3744.00 MB(VRAM)
stable-diffusion.cpp:1453 - computing vae [mode: DECODE] graph completed, taking 0.67s
stable-diffusion.cpp:1796 - latent 4 decoded, taking 0.67s
stable-diffusion.cpp:1800 - decode_first_stage completed, taking 2.66s
stable-diffusion.cpp:1819 - txt2img completed in 8.19s
stable-diffusion.cpp:151  - Using CUDA backend
stable-diffusion.cpp:171  - loading model from 'F:\Stable Diffusion Models\sdxl\sd_xl_turbo_1.0_fp16.safetensors'
model.cpp:735  - load F:\Stable Diffusion Models\sdxl\sd_xl_turbo_1.0_fp16.safetensors using safetensors format
model.cpp:801  - init from 'F:\Stable Diffusion Models\sdxl\sd_xl_turbo_1.0_fp16.safetensors'
stable-diffusion.cpp:182  - loading vae from 'data/models/vae/vae.safetensors'
model.cpp:741  - unknown format data/models/vae/vae.safetensors
stable-diffusion.cpp:184  - loading vae from 'data/models/vae/vae.safetensors' failed
stable-diffusion.cpp:194  - Stable Diffusion XL
stable-diffusion.cpp:200  - Stable Diffusion weight type: f16
stable-diffusion.cpp:201  - ggml tensor size = 432 bytes
ggml_extend.hpp:890  - clip params backend buffer size =  1564.36 MB(VRAM) (713 tensors)
ggml_extend.hpp:890  - unet params backend buffer size =  4900.07 MB(VRAM) (1680 tensors)
ggml_extend.hpp:890  - vae params backend buffer size =  159.68 MB(VRAM) (248 tensors)
stable-diffusion.cpp:302  - loading vocab
clip.hpp:164  - vocab size: 49408
clip.hpp:175  -  trigger word img already in vocab
stable-diffusion.cpp:322  - loading weights
model.cpp:1373 - loading tensors from F:\Stable Diffusion Models\sdxl\sd_xl_turbo_1.0_fp16.safetensors
stable-diffusion.cpp:421  - total params memory size = 6624.11MB (VRAM 6624.11MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 159.68MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
stable-diffusion.cpp:425  - loading model from 'F:\Stable Diffusion Models\sdxl\sd_xl_turbo_1.0_fp16.safetensors' completed, taking 5.92s
stable-diffusion.cpp:442  - running in eps-prediction mode
stable-diffusion.cpp:470  - finished loaded file
upscaler.cpp:19   - Using CUDA backend
upscaler.cpp:32   - Upscaler weight type: f16
esrgan.hpp:164  - loading esrgan from 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
ggml_extend.hpp:890  - esrgan params backend buffer size =   8.53 MB(VRAM) (192 tensors)
model.cpp:741  - unknown format data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth
esrgan.hpp:172  - init esrgan model loader from file failed: 'data/models/esrgan/RealESRGAN_x4plus_anime_6B.pth'
stable-diffusion.cpp:1557 - txt2img 768x768
Unable to find directory.
stable-diffusion.cpp:1603 - prompt after extract and remove lora: "animal with futuristic clothes"
stable-diffusion.cpp:553  - Attempting to apply 0 LoRAs
stable-diffusion.cpp:1608 - apply_loras completed, taking 0.00s
clip.hpp:1328 - parse 'animal with futuristic clothes' to [['animal with futuristic clothes', 1], ]
clip.hpp:1168 - token length: 77
ggml_extend.hpp:841  - clip compute buffer size: 2.56 MB(VRAM)
ggml_extend.hpp:841  - clip compute buffer size: 8.58 MB(VRAM)
stable-diffusion.cpp:679  - computing condition graph completed, taking 219 ms
stable-diffusion.cpp:1719 - get_learned_condition completed, taking 221 ms
stable-diffusion.cpp:1735 - sampling using Euler A method
stable-diffusion.cpp:1739 - generating image: 1/1 - seed 31893
ggml_extend.hpp:841  - unet compute buffer size: 331.51 MB(VRAM)
  |==================================================| 5/5 - 4.47it/s
stable-diffusion.cpp:1776 - sampling completed, taking 1.24s
stable-diffusion.cpp:1784 - generating 1 latent images completed, taking 1.24s
stable-diffusion.cpp:1786 - decoding 1 latents
ggml_extend.hpp:841  - vae compute buffer size: 3744.00 MB(VRAM)
stable-diffusion.cpp:1453 - computing vae [mode: DECODE] graph completed, taking 0.66s
stable-diffusion.cpp:1796 - latent 1 decoded, taking 0.66s
stable-diffusion.cpp:1800 - decode_first_stage completed, taking 0.66s
stable-diffusion.cpp:1819 - txt2img completed in 2.12s

This happens while loading the first sd_ctx:

stable-diffusion.cpp:182  - loading vae from 'data/models/vae/vae.safetensors'
model.cpp:735  - load data/models/vae/vae.safetensors using safetensors format
model.cpp:801  - init from 'data/models/vae/vae.safetensors'

And this while loading the second:

stable-diffusion.cpp:182  - loading vae from 'data/models/vae/vae.safetensors'
model.cpp:741  - unknown format data/models/vae/vae.safetensors

FSSRepo commented 2 months ago

@Jonathhhan I don't understand this case you're raising. Are you referring to reloading the model? Or deleting an already created context and creating a new one? Or that when you close the program and then reopen it, taesd doesn't work? I'm not grasping it.

Anyway, it seems to be a bug when reading the file: sometimes, for no apparent reason, tellg returns -1, so I don't trust this way of checking whether a file is a safetensors file.
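
To illustrate the point about tellg, here is a minimal sketch of a format check that does not depend on the stream position at all. It assumes the existing check is based on seekg/tellg, as the comment above suggests; this has not been confirmed against model.cpp. The function name `looks_like_safetensors` is hypothetical and is not part of stable-diffusion.cpp. The sketch relies only on the documented safetensors layout: an 8-byte little-endian header length N, followed by N bytes of JSON.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper for illustration; not the project's actual code.
// Returns true if `path` looks like a safetensors file:
// an 8-byte little-endian header length N, followed by N bytes of JSON.
bool looks_like_safetensors(const std::string& path) {
    namespace fs = std::filesystem;

    // Ask the filesystem for the size instead of using seekg/tellg,
    // because tellg returns -1 whenever the stream is in a failed state.
    std::error_code ec;
    const std::uintmax_t file_size = fs::file_size(path, ec);
    if (ec || file_size <= 8) {
        return false;
    }

    std::ifstream file(path, std::ios::binary);
    if (!file.is_open()) {
        return false;
    }

    // Read the header length byte by byte so the result does not
    // depend on the host's endianness.
    unsigned char raw[8];
    file.read(reinterpret_cast<char*>(raw), sizeof(raw));
    if (file.gcount() != static_cast<std::streamsize>(sizeof(raw))) {
        return false;
    }
    std::uint64_t header_len = 0;
    for (int i = 7; i >= 0; --i) {
        header_len = (header_len << 8) | raw[i];
    }

    // The JSON header must fit inside the file.
    if (header_len < 2 || header_len > file_size - 8) {
        return false;
    }

    // The header is a JSON object, so its first byte should be '{'.
    char first = 0;
    file.read(&first, 1);
    return file.gcount() == 1 && first == '{';
}
```

Because every call opens a fresh stream and checks each read explicitly, a failure is reported as "not safetensors" for a concrete reason (missing file, short read, implausible header length) rather than as a silent -1 from tellg. Whether this would change the "unknown format" result seen on the second load is untested.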

Jonathhhan commented 2 months ago

@FSSRepo I mean reloading the model, which also means deleting the existing context and creating a new one. You can also see it in the result (a black image), which is the same as what you get with sdxl and no taesd model. The main logic is here: https://github.com/Jonathhhan/ofxStableDiffusion/blob/main/ofxStableDiffusionExample/src/stableDiffusionThread.cpp