leejet / stable-diffusion.cpp

Stable Diffusion and Flux in pure C/C++
MIT License

Gibberish/noisy image with model converted to Q8_0 gguf #427

Closed fractal-fumbler closed 1 month ago

fractal-fumbler commented 2 months ago

Hello :) I am using compiled sd.cpp with the SYCL backend, since I have an Intel Arc GPU.

I started by converting the model from https://civitai.com/models/141592/pixelwave (which is 22+ GB), and the conversion finished without any errors:

  sd -M convert -m /home/models_test/pixelwave_flux1Dev02.safetensors -o /home/unet/pixelwave_flux1Dev02_Q8_0.gguf -v --type q8_0
log of conversion process:

```
Option:
  n_threads: 8
  mode: convert
  model_path: /home/models_test/pixelwave_flux1Dev02.safetensors
  wtype: q8_0
  clip_l_path:
  t5xxl_path:
  diffusion_model_path:
  vae_path:
  taesd_path:
  esrgan_path:
  controlnet_path:
  embeddings_path:
  stacked_id_embeddings_path:
  input_id_images_path:
  style ratio: 20.00
  normalize input image: false
  output_path: /home/unet/pixelwave_flux1Dev02_Q8_0.gguf
  init_img:
  control_image:
  clip on cpu: false
  controlnet cpu: false
  vae decoder on cpu: false
  strength(control): 0.90
  prompt:
  negative_prompt:
  min_cfg: 1.00
  cfg_scale: 7.00
  guidance: 3.50
  clip_skip: -1
  width: 512
  height: 512
  sample_method: euler_a
  schedule: default
  sample_steps: 20
  strength(img2img): 0.75
  rng: cuda
  seed: 42
  batch_count: 1
  vae_tiling: false
  upscale_repeats: 1
System Info:
  BLAS = 1 SSE3 = 1 AVX = 1 AVX2 = 1 AVX512 = 0 AVX512_VBMI = 0 AVX512_VNNI = 0 FMA = 1 NEON = 0 ARM_FMA = 0 F16C = 1 FP16_VA = 0 WASM_SIMD = 0 VSX = 0
[INFO ] model.cpp:793  - load /home/models_test/pixelwave_flux1Dev02.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from ' /home/models_test/pixelwave_flux1Dev02.safetensors'
[INFO ] model.cpp:1776 - model tensors mem size: 12248.99MB
[DEBUG] model.cpp:1530 - loading tensors from /home/models_test/pixelwave_flux1Dev02.safetensors
[INFO ] model.cpp:1811 - load tensors done
[INFO ] model.cpp:1812 - trying to save tensors to /home/unet/pixelwave_flux1Dev02_Q8_0.gguf
convert ' /home/models_test/pixelwave_flux1Dev02.safetensors'/'' to ' /home/unet/pixelwave_flux1Dev02_Q8_0.gguf' success
```

Then I tried to generate an image with sd.cpp using the command below; again, no errors so far while generating:

  sd --diffusion-model /home/unet/pixelwave_flux1Dev02_Q8_0.gguf --vae /home/vae/flux_vae.safetensors --clip_l /home/clip/clip_l.safetensors --t5xxl /home/clip/clip_t5xxl_fp16.safetensors  -p "The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within" --cfg-scale 1.0 --sampling-method euler --schedule discrete -v -H 1024 -W 1024 --steps 16 --vae-on-cpu -o /tmp/output.png 
log of image generation process:

```
Option:
  n_threads: 8
  mode: txt2img
  model_path:
  wtype: unspecified
  clip_l_path: /home/clip/clip_l.safetensors
  t5xxl_path: /home/clip/clip_t5xxl_fp16.safetensors
  diffusion_model_path: /home/unet/pixelwave_flux1Dev02_Q8_0.gguf
  vae_path: /home/vae/flux_vae.safetensors
  taesd_path:
  esrgan_path:
  controlnet_path:
  embeddings_path:
  stacked_id_embeddings_path:
  input_id_images_path:
  style ratio: 20.00
  normalize input image: false
  output_path: /tmp/output.png
  init_img:
  control_image:
  clip on cpu: false
  controlnet cpu: false
  vae decoder on cpu: true
  strength(control): 0.90
  prompt: The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within
  negative_prompt:
  min_cfg: 1.00
  cfg_scale: 1.00
  guidance: 3.50
  clip_skip: -1
  width: 1024
  height: 1024
  sample_method: euler
  schedule: discrete
  sample_steps: 16
  strength(img2img): 0.75
  rng: cuda
  seed: 42
  batch_count: 1
  vae_tiling: false
  upscale_repeats: 1
System Info:
  BLAS = 1 SSE3 = 1 AVX = 1 AVX2 = 1 AVX512 = 0 AVX512_VBMI = 0 AVX512_VNNI = 0 FMA = 1 NEON = 0 ARM_FMA = 0 F16C = 1 FP16_VA = 0 WASM_SIMD = 0 VSX = 0
[DEBUG] stable-diffusion.cpp:175 - Using SYCL backend
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path:
ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
found 1 SYCL devices:
|ID| Device Type       | Name                    |Version|Max compute units|Max work group|Max sub group|Global mem size|Driver version|
|--|-------------------|-------------------------|-------|-----------------|--------------|-------------|---------------|--------------|
| 0| [level_zero:gpu:0]| Intel Arc A770 Graphics |    1.5|              512|          1024|           32|         16225M|     1.3.30872|
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
[WARN ] stable-diffusion.cpp:185 - Flash Attention not supported with GPU Backend
[INFO ] stable-diffusion.cpp:202 - loading clip_l from ' /home/clip/clip_l.safetensors'
[INFO ] model.cpp:793 - load /home/clip/clip_l.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from ' /home/clip/clip_l.safetensors'
[INFO ] stable-diffusion.cpp:209 - loading t5xxl from ' /home/clip/clip_t5xxl_fp16.safetensors'
[INFO ] model.cpp:793 - load /home/clip/clip_t5xxl_fp16.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from ' /home/clip/clip_t5xxl_fp16.safetensors'
[INFO ] stable-diffusion.cpp:216 - loading diffusion model from ' /home/unet/pixelwave_flux1Dev02_Q8_0.gguf'
[INFO ] model.cpp:790 - load /home/unet/pixelwave_flux1Dev02_Q8_0.gguf using gguf format
[DEBUG] model.cpp:807 - init from ' /home/unet/pixelwave_flux1Dev02_Q8_0.gguf'
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
[INFO ] stable-diffusion.cpp:223 - loading vae from ' /home/vae/flux_vae.safetensors'
[INFO ] model.cpp:793 - load /home/vae/flux_vae.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from ' /home/vae/flux_vae.safetensors'
[INFO ] stable-diffusion.cpp:235 - Version: Flux Dev
[INFO ] stable-diffusion.cpp:266 - Weight type: f16
[INFO ] stable-diffusion.cpp:267 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:268 - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:269 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:271 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:310 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:313 - CLIP: Using CPU backend
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1050 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1050 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1050 - flux params backend buffer size = 12068.09 MB(VRAM) (780 tensors)
[INFO ] stable-diffusion.cpp:334 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1050 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:398 - loading weights
[DEBUG] model.cpp:1530 - loading tensors from /home/clip/clip_l.safetensors
[DEBUG] model.cpp:1530 - loading tensors from /home/clip/clip_t5xxl_fp16.safetensors
[INFO ] model.cpp:1685 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1530 - loading tensors from /home/unet/pixelwave_flux1Dev02_Q8_0.gguf
[DEBUG] model.cpp:1530 - loading tensors from /home/vae/flux_vae.safetensors
[INFO ] stable-diffusion.cpp:497 - total params memory size = 21481.50MB (VRAM 12068.09MB, RAM 9413.41MB): clip 9318.83MB(RAM), unet 12068.09MB(VRAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:501 - loading model from '' completed, taking 6.08s
[INFO ] stable-diffusion.cpp:518 - running in Flux FLOW mode
[INFO ] stable-diffusion.cpp:534 - running with discrete schedule
[DEBUG] stable-diffusion.cpp:572 - finished loaded file
[DEBUG] stable-diffusion.cpp:1378 - txt2img 1024x1024
[DEBUG] stable-diffusion.cpp:1127 - prompt after extract and remove lora: "The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within"
[INFO ] stable-diffusion.cpp:655 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1132 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1036 - parse 'The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within' to [['The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 256
[DEBUG] ggml_extend.hpp:1001 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1155 - computing condition graph completed, taking 6029 ms
[INFO ] stable-diffusion.cpp:1256 - get_learned_condition completed, taking 6030 ms
[INFO ] stable-diffusion.cpp:1279 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1283 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1001 - flux compute buffer size: 2577.25 MB(VRAM)
|==================================================| 1/16 - 6.89s/it ... 16/16 - 6.42s/it (~6.5s/it throughout)
[INFO ] stable-diffusion.cpp:1315 - sampling completed, taking 104.18s
[INFO ] stable-diffusion.cpp:1323 - generating 1 latent images completed, taking 104.20s
[INFO ] stable-diffusion.cpp:1326 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1001 - vae compute buffer size: 6656.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:987 - computing vae [mode: DECODE] graph completed, taking 46.52s
[INFO ] stable-diffusion.cpp:1336 - latent 1 decoded, taking 46.52s
[INFO ] stable-diffusion.cpp:1340 - decode_first_stage completed, taking 46.52s
[INFO ] stable-diffusion.cpp:1449 - txt2img completed in 156.75s
save result image to '/tmp/output.png'
```

In the end I get this kind of gibberish image:

result image ![output](https://github.com/user-attachments/assets/98e21551-4cb5-426b-ac3e-8a850786a711)

How can I fix the image generation? Or do I need to change something in the conversion? If I try to generate with this converted model in ComfyUI, the result is the same.

cb88 commented 2 months ago

Not sure this is the issue you are running into, but Flux apparently needs the shift value set, otherwise output gets blurry... and sd.cpp does not currently have a parameter for it. It might make sense to have both a parameter to set a custom shift and a flag to auto-set shift for Flux models that need it; if both are set, the custom parameter could increase or decrease the auto-calculated shift value.

```javascript
function calcShift(h, w) {
  const step1 = (h * w) / 256;
  const step2 = (1.15 - 0.5) / (4096 - 256);
  const step3 = (step1 - 256) * step2 + 0.5;
  const result = Math.exp(step3);
  return Math.round(result * 100) / 100;
}
```
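In case it helps anyone experimenting with sd.cpp itself, here is a direct C++ port of that formula. The function name `calc_shift` is made up for illustration, and the constants (256/4096 anchors, 0.5/1.15 exponents) are taken verbatim from the snippet above; whether this is exactly how sd.cpp should apply shift is an open question.

```cpp
#include <cmath>

// Hypothetical C++ port of the calcShift snippet above: linearly interpolate
// an exponent (mu) between 0.5 (at size proxy 256) and 1.15 (at 4096),
// then return exp(mu) rounded to two decimals.
double calc_shift(int h, int w) {
    const double step1 = static_cast<double>(h) * w / 256.0;  // size proxy for this resolution
    const double step2 = (1.15 - 0.5) / (4096.0 - 256.0);     // slope of the interpolation
    const double step3 = (step1 - 256.0) * step2 + 0.5;       // interpolated exponent
    const double result = std::exp(step3);
    return std::round(result * 100.0) / 100.0;                // round to 2 decimal places
}
```

For 1024x1024 the size proxy is exactly 4096, so this returns exp(1.15) ≈ 3.16; smaller resolutions give smaller shift values.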

https://www.reddit.com/r/drawthingsapp/comments/1erjvur/flux1_dev_8bit_generation_is_blurry/

Example blurry output at 30 steps with q8 flux dev: (image attached)

cb88 commented 2 months ago

Note: I did some more testing and get the same noise as you on a Radeon W7800 with Vulkan (I wasn't able to load HIP; it says rocblas is missing). The CPU implementation works fine with the same models, though.
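The CPU result supports the idea that the Q8_0 data itself is fine: Q8_0 stores each block of 32 weights as int8 values plus one per-block scale, so a correct dequantization path only adds rounding error of at most half a quantization step per weight. A simplified round-trip sketch (a plain float scale is used where ggml stores fp16, and `q8_0_roundtrip` is a name made up for this illustration):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Simplified sketch of a Q8_0-style round-trip: each block of 32 weights
// shares one scale d = max|x| / 127, each weight is stored as an int8
// q = round(x / d), and dequantization returns q * d.
constexpr size_t QK8_0 = 32;

std::vector<float> q8_0_roundtrip(const std::vector<float>& x) {
    std::vector<float> out(x.size());
    for (size_t b = 0; b < x.size(); b += QK8_0) {
        const size_t end = std::min(b + QK8_0, x.size());
        float amax = 0.0f;                       // largest magnitude in the block
        for (size_t i = b; i < end; ++i)
            amax = std::max(amax, std::fabs(x[i]));
        const float d = amax / 127.0f;           // per-block scale
        for (size_t i = b; i < end; ++i) {
            const int8_t q = d > 0.0f ? static_cast<int8_t>(std::lround(x[i] / d)) : 0;
            out[i] = q * d;                      // error is at most d/2 per weight
        }
    }
    return out;
}
```

If the quantized file produced pure noise, the error would be far above that d/2 bound, which is why a backend/kernel bug (SYCL, Vulkan) looks more likely than a bad conversion.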

```
.\sd.exe -t 8 -v --cfg-scale 1 --rng std_default --vae-tiling --diffusion-model ..\models\Flux\flux1-schnell-Q8_0.gguf --clip_l ..\models\Flux\clip_l.safetensors --vae ..\models\Flux\ae.safetensors --t5xxl ..\models\Flux\t5xxl_fp16.safetensors -H 640 -W 448 --steps 1 -p "a corgi dog sitting on a mossy spot in a lush forest" -b 1 -o corgi.png
```

```
Option:
  n_threads: 8
  mode: txt2img
  model_path:
  wtype: unspecified
  clip_l_path: ..\models\Flux\clip_l.safetensors
  t5xxl_path: ..\models\Flux\t5xxl_fp16.safetensors
  diffusion_model_path: ..\models\Flux\flux1-schnell-Q8_0.gguf
  vae_path: ..\models\Flux\ae.safetensors
  taesd_path:
  esrgan_path:
  controlnet_path:
  embeddings_path:
  stacked_id_embeddings_path:
  input_id_images_path:
  style ratio: 20.00
  normalize input image: false
  output_path: corgi.png
  init_img:
  control_image:
  clip on cpu: false
  controlnet cpu: false
  vae decoder on cpu: false
  strength(control): 0.90
  prompt: a corgi dog sitting on a mossy spot in a lush forest
  negative_prompt:
  min_cfg: 1.00
  cfg_scale: 1.00
  guidance: 3.50
  clip_skip: -1
  width: 448
  height: 640
  sample_method: euler_a
  schedule: default
  sample_steps: 1
  strength(img2img): 0.75
  rng: std_default
  seed: 42
  batch_count: 1
  vae_tiling: true
  upscale_repeats: 1
System Info:
  BLAS = 1 SSE3 = 1 AVX = 1 AVX2 = 1 AVX512 = 0 AVX512_VBMI = 0 AVX512_VNNI = 0 FMA = 1 NEON = 0 ARM_FMA = 0 F16C = 1 FP16_VA = 0 WASM_SIMD = 0 VSX = 0
[DEBUG] stable-diffusion.cpp:166 - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon PRO W7800 (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64
[INFO ] stable-diffusion.cpp:202 - loading clip_l from '..\models\Flux\clip_l.safetensors'
[INFO ] model.cpp:793 - load ..\models\Flux\clip_l.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from '..\models\Flux\clip_l.safetensors'
[INFO ] stable-diffusion.cpp:209 - loading t5xxl from '..\models\Flux\t5xxl_fp16.safetensors'
[INFO ] model.cpp:793 - load ..\models\Flux\t5xxl_fp16.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from '..\models\Flux\t5xxl_fp16.safetensors'
[INFO ] stable-diffusion.cpp:216 - loading diffusion model from '..\models\Flux\flux1-schnell-Q8_0.gguf'
[INFO ] model.cpp:790 - load ..\models\Flux\flux1-schnell-Q8_0.gguf using gguf format
[DEBUG] model.cpp:807 - init from '..\models\Flux\flux1-schnell-Q8_0.gguf'
[INFO ] stable-diffusion.cpp:223 - loading vae from '..\models\Flux\ae.safetensors'
[INFO ] model.cpp:793 - load ..\models\Flux\ae.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from '..\models\Flux\ae.safetensors'
[INFO ] stable-diffusion.cpp:235 - Version: Flux Schnell
[INFO ] stable-diffusion.cpp:266 - Weight type: f16
[INFO ] stable-diffusion.cpp:267 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:268 - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:269 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:271 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:310 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:313 - CLIP: Using CPU backend
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1050 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1050 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1050 - flux params backend buffer size = 12057.71 MB(VRAM) (776 tensors)
[DEBUG] ggml_extend.hpp:1050 - vae params backend buffer size = 94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:398 - loading weights
[DEBUG] model.cpp:1530 - loading tensors from ..\models\Flux\clip_l.safetensors
[DEBUG] model.cpp:1530 - loading tensors from ..\models\Flux\t5xxl_fp16.safetensors
[INFO ] model.cpp:1685 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1530 - loading tensors from ..\models\Flux\flux1-schnell-Q8_0.gguf
[DEBUG] model.cpp:1530 - loading tensors from ..\models\Flux\ae.safetensors
[INFO ] stable-diffusion.cpp:497 - total params memory size = 21471.11MB (VRAM 12152.28MB, RAM 9318.83MB): clip 9318.83MB(RAM), unet 12057.71MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:501 - loading model from '' completed, taking 12.72s
```

cb88 commented 2 months ago

Note: my Vega FE with Vulkan works fine... so maybe a driver issue?

cb88 commented 1 month ago

I tried hard-coding corrected shift values to pass to the denoiser, but did not get the expected improvement.