Closed: sohzm closed this 1 month ago
Related issue: https://github.com/ggerganov/llama.cpp/issues/5356
(I'm new to this, so I might have made some mistakes. I would be grateful for any guidance or feedback.)
Hey, nice to see someone working on this. I'd like to get this to work. There's probably some ops that need to be supported by Vulkan upstream, right? I can help with that.
@0cc4m Thanks for offering help.
Currently the .hpp file generated by ggml_vk_generate_shaders.py does not have symbols like mul_mat_vec_id_q3_k_f32_len, div_f32_len, etc. Also, some symbols were renamed: for example, dequant_q5_k_len is referenced in ggml/src/ggml-vulkan.cpp, but the header file has dequant_q5_K_len.
I'm assuming these issues will be solved by your work in llama.cpp? Please correct me if I'm wrong.
Also let me know if I can help with anything.
It is working in Llama.cpp. I'll take a look at the status in ggml, maybe that needs an update.
I manually wired up Vulkan and compiled SD.cpp with the latest ggml modified with llama.cpp's modifications to Vulkan. It runs and loads a model, but their Vulkan shaders do not implement CONCAT and it fails.
./sd -m ~/Desktop/Misc/stable_diffusion/a1111/models/Stable-diffusion/ponyDiffusionV6XL_v6.safetensors --prompt "score_9, score_8_up, score_7_up, score_6_up, score_5_up, rainbow dash" -W 1024 -H 1024 -v
Option:
n_threads: 8
mode: txt2img
model_path: /home/david/Desktop/Misc/stable_diffusion/a1111/models/Stable-diffusion/ponyDiffusionV6XL_v6.safetensors
wtype: unspecified
vae_path:
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normzalize input image : false
output_path: output.png
init_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:false
strength(control): 0.90
prompt: score_9, score_8_up, score_7_up, score_6_up, score_5_up, rainbow dash
negative_prompt:
min_cfg: 1.00
cfg_scale: 7.00
clip_skip: -1
width: 1024
height: 1024
sample_method: euler_a
schedule: default
sample_steps: 20
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
BLAS = 1
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 1
AVX512_VNNI = 1
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:158 - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA RTX A4000 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
[INFO ] stable-diffusion.cpp:178 - loading model from '/home/david/Desktop/Misc/stable_diffusion/a1111/models/Stable-diffusion/ponyDiffusionV6XL_v6.safetensors'
[INFO ] model.cpp:737 - load /home/david/Desktop/Misc/stable_diffusion/a1111/models/Stable-diffusion/ponyDiffusionV6XL_v6.safetensors using safetensors format
[DEBUG] model.cpp:803 - init from '/home/david/Desktop/Misc/stable_diffusion/a1111/models/Stable-diffusion/ponyDiffusionV6XL_v6.safetensors'
[INFO ] stable-diffusion.cpp:201 - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:207 - Stable Diffusion weight type: f16
[DEBUG] stable-diffusion.cpp:208 - ggml tensor size = 400 bytes
[WARN ] stable-diffusion.cpp:213 - !!!It looks like you are using SDXL model. If you find that the generated images are completely black, try specifying SDXL VAE FP16 Fix with the --vae parameter. You can find it here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors
[DEBUG] ggml_extend.hpp:884 - clip params backend buffer size = 1564.36 MB(VRAM) (713 tensors)
[DEBUG] ggml_extend.hpp:884 - unet params backend buffer size = 4900.07 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:884 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:309 - loading vocab
[DEBUG] clip.hpp:164 - vocab size: 49408
[DEBUG] clip.hpp:175 - trigger word img already in vocab
[DEBUG] stable-diffusion.cpp:329 - loading weights
[DEBUG] model.cpp:1380 - loading tensors from /home/david/Desktop/Misc/stable_diffusion/a1111/models/Stable-diffusion/ponyDiffusionV6XL_v6.safetensors
[INFO ] stable-diffusion.cpp:413 - total params memory size = 6558.89MB (VRAM 6558.89MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:432 - loading model from '/home/david/Desktop/Misc/stable_diffusion/a1111/models/Stable-diffusion/ponyDiffusionV6XL_v6.safetensors' completed, taking 4.34s
[INFO ] stable-diffusion.cpp:449 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:482 - finished loaded file
[DEBUG] stable-diffusion.cpp:1452 - txt2img 1024x1024
[DEBUG] stable-diffusion.cpp:1207 - prompt after extract and remove lora: "score_9, score_8_up, score_7_up, score_6_up, score_5_up, rainbow dash"
[INFO ] stable-diffusion.cpp:565 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1212 - apply_loras completed, taking 0.00s
[DEBUG] clip.hpp:1312 - parse 'score_9, score_8_up, score_7_up, score_6_up, score_5_up, rainbow dash' to [['score_9, score_8_up, score_7_up, score_6_up, score_5_up, rainbow dash', 1], ]
[DEBUG] clip.hpp:1152 - token length: 77
[DEBUG] ggml_extend.hpp:838 - clip compute buffer size: 2.56 MB(VRAM)
ggml_vulkan: Error: Missing op: CONCAT
GGML_ASSERT: /home/david/Desktop/Dev/ggml/stable-diffusion.cpp/ggml/src/ggml-vulkan.cpp:5533: false
Aborted (core dumped)
After adding CONCAT to the relevant place (probably not the correct solution?), it gets a little further but still fails here:
ggml_backend_vk_graph_compute: error: op not supported (view) (UNARY)
GGML_ASSERT: /home/david/Desktop/Dev/ggml/stable-diffusion.cpp/ggml/src/ggml-vulkan.cpp:6227: ok
At this point it's beyond my knowledge/skill.
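For context, the "Missing op" aborts in this thread come from the backend checking each graph node against the set of ops it has compiled shaders for, and rejecting everything else. A hedged sketch of that dispatch pattern (the enum and the supported set here are illustrative only, not ggml's actual definitions):

```cpp
// Hedged sketch of the dispatch behind "ggml_vulkan: Error: Missing op":
// the backend keeps a whitelist of ops with a wired-up Vulkan shader and
// rejects anything else. Enum and list are illustrative, not ggml's code.
enum class Op { ADD, MUL, CONCAT, LEAKY_RELU, PAD };

bool op_supported(Op op) {
    switch (op) {
        case Op::ADD:
        case Op::MUL:
            return true;   // ops with a compiled Vulkan shader
        default:
            return false;  // anything else aborts with "Missing op"
    }
}
```

Supporting a new op like CONCAT then means both adding it to this kind of whitelist and providing the corresponding shader, which is why adding it "to the relevant place" alone only gets a little further.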
@Cloudwalk9 Thank you for trying it, I can add the missing ops. Can you upload your progress to a branch that I can access?
@0cc4m Done, but it's pretty crude. I updated the submodule to point to my fork of ggml with the imported Vulkan stuff, also had to fix some headers. https://github.com/Cloudwalk9/stable-diffusion.cpp
@0cc4m They just synced the newer Vulkan shader code (split into individual files) from llama.cpp to upstream ggml, so you could probably target ggml directly, instead of my forked submodule.
Yeah, my WIP branch is here: https://github.com/0cc4m/ggml/tree/vulkan-stable-diffusion-ops
I implemented all the ops, but there's still some bug that makes the image not adhere to the prompt. I'll investigate that later.
Great work, thank you!
Some ops appear to still be missing when I try to use LoRA (res-adapter):
lora.hpp:67 - finished loaded lora
lora.hpp:175 - (18 / 18) LoRA tensors applied successfully
ggml_extend.hpp:841 - lora compute buffer size: 112.85 MB(VRAM)
lora.hpp:175 - (18 / 18) LoRA tensors applied successfully
ggml_vulkan: Error: Missing op: ADD for f16 and f32 to f16
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:4149: fatal error
A different error occurs when I try to use TAESD:
stable-diffusion.cpp:1398 - generating 1 latent images completed, taking 46.07s
stable-diffusion.cpp:1401 - decoding 1 latents
ggml_extend.hpp:841 - taesd compute buffer size: 480.00 MB(VRAM)
ggml_backend_vk_graph_compute: error: op not supported (view) (UNARY)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:6432: GGML_ASSERT(ok) failed
We're finally about to see Stable Diffusion where the only major dependency is your graphics driver...
@SkutteOleg Thank you, those should be easy to add. I fixed the first bug that caused issues, but I ran into another matmul bug that I have to find in the shader code. I hope I can find it soon.
LORA and TAESD should work now. I also fixed the matmul bug. It's generating images correctly in my tests, but not that fast yet.
It is amazing, actually. It's 2.5 times faster than CUDA12 on my end 😲 (perhaps due to lower memory usage, but I'm not sure)
On which hardware?
NVIDIA GeForce GTX 1660 SUPER
EDIT: Also confirmed working reasonably fast on Steam Deck.
I had time to do some further testing. Apparently I was comparing the speed to a previous build of sd.cpp. It turns out CUDA12 image generation speed also got faster after ggml update. Even still, Vulkan is 20% faster. However, I was wrong about memory. It appears that Vulkan uses more memory as I can no longer fit both llama.cpp and stable-diffusion.cpp on the GPU at the same time.
UPD: I was testing at 512x512 before. When trying 1024x1024, Vulkan is indeed 15% slower for me. Also, at 1024x1024 it produces broken outputs on my hardware.
Excellent work, for me works fine, tested with intel ARC a580
This is a problem with a very large buffer that sd.cpp requests for VAE decoding (?). I cannot fix that on the Vulkan side, but I am throwing an exception now so that it crashes instead of just generating garbage output. Maybe @leejet can think of a solution? Vulkan has a restriction on how large VRAM buffers can be (usually 4GB), and 1024x1024 VAE decoding requests a buffer larger than that.
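The restriction described here can be sketched as a pre-allocation guard. This is a hypothetical illustration, not ggml's actual code: the real limit to query is VkPhysicalDeviceLimits::maxStorageBufferRange, and the 4 GiB constant stands in for whatever the device reports.

```cpp
#include <cstdint>
#include <stdexcept>

// Hedged sketch of the guard described above. Vulkan caps the size of a
// single storage buffer (VkPhysicalDeviceLimits::maxStorageBufferRange,
// commonly 4 GiB), so a request above that should fail loudly instead of
// silently producing garbage. Constant and function name are illustrative.
constexpr uint64_t kMaxStorageBufferRange = 4ull * 1024 * 1024 * 1024;

void check_vk_buffer_request(uint64_t bytes) {
    if (bytes > kMaxStorageBufferRange) {
        throw std::runtime_error(
            "buffer request exceeds Vulkan maxStorageBufferRange; "
            "try --vae-tiling or --vae-on-cpu");
    }
}
```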
There should be VAE tiling available, or a fallback to CPU (not exposed as a CLI option, AFAIK).
Shouldn't VAE tiling help with that? This occurs for me even with VAE tiling enabled.
Excellent work, well done. Pictures are generated at 384x384 on my Intel i5-1035G1.
Using the --vae-on-cpu option it will do 512x512 images. I don't understand why the VAE should be such a problem; the compute buffer size is 1.6 GB in RAM.
Tried the Vulkan repo from Skuttle:
vulkan sd.cpp -> 2.12 it/s
cuda sd.cpp -> 3.95 it/s
comfyui -> 1.27 it/s
Nvidia GTX 1650 Ti mobile, Fedora 40.
Nearly identical images, though why are some patches different between CUDA and Vulkan?
Shouldn't VAE tiling help with that? This occurs for me even with VAE tiling enabled.
It should, and it does in my tests. I can generate 1024x1024 images with SDXL by using --vae-tiling or --vae-on-cpu.
why are some patches different between CUDA and Vulkan?
There are slight differences in how the CUDA and Vulkan backends calculate, for example the CUDA backend uses tensor cores for matrix multiplication, while the Vulkan backend (on Nvidia GPUs) uses the regular CUDA cores. That can change the results slightly. There might also be some minor differences in other operations that contribute to that, too.
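The point about accumulated rounding differences can be demonstrated directly. A minimal sketch, with arbitrary values not taken from either backend: summing the same quantity at different precision (or in a different order) drifts by different amounts.

```cpp
// Minimal demonstration of why two backends can produce slightly different
// pixels: summing identical values at different precision (or in a different
// order) rounds differently in the last bits. The step value is arbitrary.
float sum_fp32(int n, float step) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) acc += step;  // rounds to fp32 on every add
    return acc;
}

double sum_fp64(int n, double step) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i) acc += step;  // far smaller rounding error
    return acc;
}
```

Neither result is "wrong"; they simply round differently, which is why CUDA (tensor cores) and Vulkan (regular shader ALUs) images match only approximately.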
I tried the img2img mode but it immediately raises an error ggml_vulkan: Error: Missing op: PAD
Thank you for reporting that, I forgot to check img2img. It should work now.
When trying to load any embedding I get this missing vulkan operator:
ggml_vulkan: Error: Missing op: CONCAT for f16 and f16 to f16
I can implement that, but it's odd considering that f16 CONCAT is not even implemented for CPU or CUDA. Do embeddings work with those?
At least with cpu it doesn't work, I'm unable to test cuda
You can try to build with this branch: https://github.com/0cc4m/ggml/tree/vulkan-stable-diffusion-ops-concat-f16
I've cherry-picked the changes from your branch and the embeddings now work fine; I'm not sure why the program crashed on CPU.
1.5 it/s on an SD 1.5 model at 512x512 Euler A 20 steps. The performance drops quadratically with resolution. I get 7 it/s on ComfyUI with same settings.
BUT, CUDA backend here is 3 it/s on the same settings and performance also tanks with resolution in the same manner. RTX A4000 Mobile, which is about equivalent to a desktop 3060 Ti.
There's still a LOT of room for optimization.
Also I found a bug I intend to submit here and upstream. I encountered one of my model weights interpreted as "F64"(?!) that works in standard web UIs but not ggml/sd.cpp, regardless of --type setting.
@0cc4m , sorry to bother you again but I've found another missing op, this time related to upscaling:
upscaler.cpp:49 - upscaling from (512 x 512) to (2048 x 2048)
Errors: ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | warp size: 64
ggml_vulkan: Error: Missing op: LEAKY_RELU
No problem, I added LEAKY_RELU. Please try it.
Upscaling works fine now; however, I've also just found out that quantized models don't seem to work on my PC:
Vulkan0: AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | warp size: 64
/home/daniandtheweb/Applications/stable-diffusion/test/stable-diffusion.cpp/ggml/src/ggml-vulkan.cpp:4145: GGML_ASSERT(op == GGML_OP_GET_ROWS || (!ggml_is_quantized(src0->type) && (src1 == nullptr || !ggml_is_quantized(src1->type)))) failed
ptrace: Operation not permitted.
No stack.
The program is not being run.
The quantized models don't work only when an embedding or a lora are used, without any of them the program runs just fine.
Yeah, I think embedding and lora only working on non-quantized models is expected. Does that work on CPU/CUDA?
On CPU, LoRA + quantized model works (I can't test embeddings, as for some reason I can't get them to work at all on CPU).
The required ops were merged upstream: https://github.com/ggerganov/ggml/pull/904
I got an error when using an fp8 model:
ggml/src/ggml-vulkan.cpp:2111: GGML_ASSERT(idx < vk_instance.device_indices.size()) failed
That's a generic error that means the Vulkan device you requested doesn't exist. If you didn't request one manually (with the environment variable GGML_VK_VISIBLE_DEVICES), then it means you don't have a Vulkan device. Missing driver, maybe? Try running vulkaninfo --summary to check what devices are available.
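A hedged sketch of the device-selection logic implied by that assert. The function name and parsing details are illustrative, not ggml's actual code; the assumption is only that GGML_VK_VISIBLE_DEVICES holds comma-separated device indices, and that an index past the number of real devices leaves nothing to select.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch: resolve the visible-device list from an env-var
// string. Indices past num_devices are dropped, so a bad override (or no
// Vulkan device at all) yields an empty list, which the backend asserts on.
std::vector<size_t> parse_visible_devices(const char *env, size_t num_devices) {
    std::vector<size_t> indices;
    if (env == nullptr) {  // variable unset: expose every device found
        for (size_t i = 0; i < num_devices; ++i) indices.push_back(i);
        return indices;
    }
    std::string s(env);
    size_t pos = 0;
    while (pos < s.size()) {
        size_t comma = s.find(',', pos);
        if (comma == std::string::npos) comma = s.size();
        size_t idx = std::stoul(s.substr(pos, comma - pos));
        if (idx < num_devices) indices.push_back(idx);  // drop bad indices
        pos = comma + 1;
    }
    return indices;
}
```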
Sorry for the confusion. I forgot to pick https://github.com/Cloudwalk9/stable-diffusion.cpp/commit/a4f071a3188f6fc967ab3180235f2096fe1a02d8. After applying that, I got an error from sd.cpp itself (https://github.com/leejet/stable-diffusion.cpp/issues/329#issuecomment-2271714386), which is not an error of the ggml Vulkan backend.
Attempting to compile this, getting quite a lot of errors all of a similar type:
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/source/ggml/src/ggml-vulkan.cpp:1108:109: error: ‘matmul_f32_f16_aligned_len’ was not declared in this scope; did you mean ‘matmul_f32_aligned_len’?
1108 | ggml_vk_create_pipeline(ctx, ctx->device->pipeline_matmul_f32_f16->a_l, "matmul_f32_f16_aligned_l", matmul_f32_f16_aligned_len, matmul_f32_f16_aligned_data, "main", 3, sizeof(vk_mat_mat_push_constants), l_wg_denoms, warptile_l, l_align);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
| matmul_f32_aligned_len
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/source/ggml/src/ggml-vulkan.cpp:1108:137: error: ‘matmul_f32_f16_aligned_data’ was not declared in this scope; did you mean ‘matmul_f32_aligned_data’?
1108 | ggml_vk_create_pipeline(ctx, ctx->device->pipeline_matmul_f32_f16->a_l, "matmul_f32_f16_aligned_l", matmul_f32_f16_aligned_len, matmul_f32_f16_aligned_data, "main", 3, sizeof(vk_mat_mat_push_constants), l_wg_denoms, warptile_l, l_align);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
| matmul_f32_aligned_data
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/source/ggml/src/ggml-vulkan.cpp:1196:97: error: ‘matmul_id_f32_f32_len’ was not declared in this scope; did you mean ‘matmul_f32_fp32_len’?
1196 | ggml_vk_create_pipeline(ctx, ctx->device->pipeline_matmul_id_f32->l, "matmul_id_f32_l", matmul_id_f32_f32_len, matmul_id_f32_f32_data, "main", 4, sizeof(vk_mat_mat_id_push_constants), l_wg_denoms, warptile_l, 1);
| ^~~~~~~~~~~~~~~~~~~~~
| matmul_f32_fp32_len
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/source/ggml/src/ggml-vulkan.cpp:1196:120: error: ‘matmul_id_f32_f32_data’ was not declared in this scope; did you mean ‘matmul_f32_fp32_data’?
1196 | ggml_vk_create_pipeline(ctx, ctx->device->pipeline_matmul_id_f32->l, "matmul_id_f32_l", matmul_id_f32_f32_len, matmul_id_f32_f32_data, "main", 4, sizeof(vk_mat_mat_id_push_constants), l_wg_denoms, warptile_l, 1);
| ^~~~~~~~~~~~~~~~~~~~~~
| matmul_f32_fp32_data
I did run the Python script to generate shader headers, so those are in ggml's src/ directory, as expected. Perhaps an issue with the script not generating the right shaders for my device? I'm using an Nvidia GTX 1050 Ti with the proprietary drivers (I hope to try this with Nouveau using NVK at a later date as well).
Any help to get this working would be appreciated.
@msglm That looks like the ggml-version you're using is outdated. You have to use an up-to-date one that contains the Vulkan operators I've implemented to allow running stable diffusion. One part of that is that the Python script to generate shaders has been deleted and replaced with a C++ program that is automatically called by Make/CMake as part of the build process.
@0cc4m Seems that your suggestion didn't fix the core of the issue, since it's still not able to properly generate (seemingly, for similar reasons). The problem seemingly lies in the actual generation of the shaders.
The failed build log in full: guix-build-vanilla.log
The version of glslc I have on my machine is 2024.01.3.280.01.3.280.0 (Target: SPIR-V 1.0), since that's the tool doing the generating and heavy lifting. I decided to change FILE* spv = fopen(path.c_str(), "rb"); to FILE* spv = fopen(path.c_str(), "wb+"); in vulkan-shaders-gen.cpp to see if I could force the files to be generated if they don't exist, but the result was the following warnings:
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp:660:41: warning: ISO C++ forbids zero-size array ‘get_rows_q4_1_data’ [-Wpedantic]
660 | extern unsigned char get_rows_q4_1_data[0];
| ^
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp:663:36: warning: ISO C++ forbids zero-size array ‘gelu_f32_data’ [-Wpedantic]
663 | extern unsigned char gelu_f32_data[0];
| ^
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp:666:41: warning: ISO C++ forbids zero-size array ‘get_rows_q5_0_data’ [-Wpedantic]
666 | extern unsigned char get_rows_q5_0_data[0];
| ^
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp:669:41: warning: ISO C++ forbids zero-size array ‘get_rows_q5_1_data’ [-Wpedantic]
669 | extern unsigned char get_rows_q5_1_data[0];
| ^
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp:672:39: warning: ISO C++ forbids zero-size array ‘cpy_f16_f16_data’ [-Wpedantic]
672 | extern unsigned char cpy_f16_f16_data[0];
| ^
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp:675:54: warning: ISO C++ forbids zero-size array ‘matmul_id_q4_0_f32_aligned_data’ [-Wpedantic]
675 | extern unsigned char matmul_id_q4_0_f32_aligned_data[0];
| ^
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp:678:50: warning: ISO C++ forbids zero-size array ‘mul_mat_vec_nc_f16_f32_data’ [-Wpedantic]
678 | extern unsigned char mul_mat_vec_nc_f16_f32_data[0];
| ^
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp:681:51: warning: ISO C++ forbids zero-size array ‘matmul_id_q4_1_f32_fp32_data’ [-Wpedantic]
681 | extern unsigned char matmul_id_q4_1_f32_fp32_data[0];
| ^
/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp:684:46: warning: ISO C++ forbids zero-size array ‘matmul_id_q5_0_f32_data’ [-Wpedantic]
684 | extern unsigned char matmul_id_q5_0_f32_data[0];
These warnings make me believe that vulkan-shaders-gen.cpp is just improperly generating the shaders on my machine (as they're zero-sized arrays).
Full build log for the attempt with removed error-checking can be found here: guix-build-modded.log
EDIT: Ignore everything after "I decided to change". I forgot that rb was for reading, I thought it was for writing.
These are problems arising from the package manager you are using, which I am unfamiliar with. You'd have to look into why vulkan-shaders-gen fails to compile the SPIR-V files correctly (an issue with glslc?) or into the correct location on your system:
[ 15%] Built target vulkan-shaders-gen
make -f ggml/src/CMakeFiles/ggml.dir/build.make ggml/src/CMakeFiles/ggml.dir/depend
make[2]: Entering directory '/tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build'
[ 21%] Generate vulkan shaders
cd /tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src && ../../bin/vulkan-shaders-gen --glslc /gnu/store/hkkwk7dhcsfy9r43zqi82yi3blhdb4zf-shaderc-2024.0/bin/glslc --input-dir /tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/source/ggml/src/vulkan-shaders --output-dir /tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/vulkan-shaders.spv --target-hpp /tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.hpp --target-cpp /tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/ggml-vulkan-shaders.cpp --no-clean
ggml_vulkan: Generating and compiling shaders to SPIR-V
Error opening SPIR-V file: /tmp/guix-build-stable-diffusion-cpp-vulkan-15553084e5d1a7d6abd3aac176bee95b0247c379.drv-0/build/ggml/src/vulkan-shaders.spv/matmul_f32_f16_fp32.spv (No such file or directory)
I see also that this branch is out of date. @sohzm Do you want to update this (@SkutteOleg has opened a PR in your fork)? Otherwise this should be closed and a new PR should be opened (maybe by @SkutteOleg )
Thanks @0cc4m for pointing out the PR, I had missed it.
I've merged it and also added changes to make it work with the recent ggml commit https://github.com/ggerganov/ggml/commit/fc31d407910e0ebeee5f1c0f65cada2ad211d1da in this commit: https://github.com/leejet/stable-diffusion.cpp/pull/291/commits/41ca4d509d005213b6be95ebd46f6b9bee90263b
I'm fixing the merge conflicts to get this PR ready, but can you confirm whether the changes related to epsilon as a parameter for group_norm are correct, as I don't know what it's about? (The changes were in the master branch; I didn't look it up earlier, sorry to bother you.)
I've fixed the PR, and I think it can be merged now?
I was able to fix my issue with my package manager by editing ggml's vulkan-shaders-gen.cpp to use popen(command.c_str(), "r"); rather than execl("/bin/sh", "sh", "-c", command.c_str(), (char*) nullptr); to start glslc for compiling the shaders. It now builds, recognizes my GPU, and runs properly. There may be no need to upstream this fix since Guix supports having custom patches for building projects (it's a source-based package manager), though popen may be preferred since it supports writing command output to stdout (useful for debugging errors related to glslc).
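The described swap amounts to something like the following hedged sketch. run_command and the error handling are illustrative, not the actual vulkan-shaders-gen code; the real program builds a full glslc command line instead of the placeholder string used here.

```cpp
#include <array>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hedged sketch of the change: popen() runs the compile command through the
// shell just as execl("/bin/sh", ...) did, but also hands back the child's
// stdout so glslc diagnostics can be captured and surfaced on failure.
static std::string run_command(const std::string &command) {
    std::array<char, 256> buf{};
    std::string output;
    FILE *pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("popen() failed");
    }
    while (fgets(buf.data(), buf.size(), pipe) != nullptr) {
        output += buf.data();  // accumulate the child's stdout
    }
    if (pclose(pipe) != 0) {   // nonzero exit status from the command
        throw std::runtime_error("command failed: " + command);
    }
    return output;
}
```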
Example of generation:
/gnu/store/nb7a2af09zayajxkj3mx2d4mpyp0bfhh-profile/bin/sd -m "./v1-5-pruned-emaonly.safetensors" --vae ./sdxl_vae.safetensors -p "Spanish Monarch Painting, queen, elegant" -o /tmp/queen.png
The vulkan device used was: Vulkan0: NVIDIA GeForce GTX 1050 Ti (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
The generation is at a decent speed: 3 iterations a second for 512x512 and 19 iterations a second for 1024x1024 (used PonyDiffusion for this with a type of q5_0).
Stable Diffusion v2 is very broken on my machine and produced garbled output using the following command:
/gnu/store/nb7a2af09zayajxkj3mx2d4mpyp0bfhh-profile/bin/sd -m ./v2-1_768-ema-pruned.ckpt -p "Spanish Monarch Painting, queen, elegant" -o /tmp/queen.png
Furthermore, img2img is broken due to using way too much memory on my machine.
Still, this PR is looking promising for bringing speedy image generation to most GPUs.
@leejet could you please review and merge this PR? If there's anything you'd like me to change or improve, please let me know. Thanks!
issue: https://github.com/leejet/stable-diffusion.cpp/issues/256
Looks like they're making some changes to Vulkan shader generation in the ggml repo, and it's currently broken. Will keep an eye on it and update the PR accordingly.