Open recallmenot opened 1 month ago
Could you run it inside gdb and give us the stacktrace?
gdb --args stable-diffusion.cpp/build/bin/sd --diffusion-model stable-diffusion.cpp/models/flux1-schnell-q8_0.gguf --vae stable-diffusion.cpp/models/ae.safetensors --clip_l stable-diffusion.cpp/models/clip_l.safetensors --t5xxl stable-diffusion.cpp/models/t5xxl_fp16.safetensors --steps 4 -p "a lovely cat" --cfg-scale 1.0 --sampling-method euler -v
and hit r+enter.
You might also try pulling a more up-to-date ggml. There are always advancements and fixes happening upstream.
Sadly debuginfod did not retrieve the symbols for libze_intel_gpu.so.1:
Thread 1 "sd" received signal SIGSEGV, Segmentation fault.
0x00007fffba8b7240 in ?? () from /usr/lib/libze_intel_gpu.so.1
(gdb) bt
#0 0x00007fffba8b7240 in ?? () from /usr/lib/libze_intel_gpu.so.1
#1 0x00007fffba8b7f46 in ?? () from /usr/lib/libze_intel_gpu.so.1
#2 0x00007fffba86c3a2 in ?? () from /usr/lib/libze_intel_gpu.so.1
#3 0x00007fffba508af1 in ?? () from /usr/lib/libze_intel_gpu.so.1
#4 0x00007fffba50af69 in ?? () from /usr/lib/libze_intel_gpu.so.1
#5 0x00007fffba4e8962 in ?? () from /usr/lib/libze_intel_gpu.so.1
#6 0x00007ffff75930b6 in urQueueFinish () from /opt/intel/oneapi/compiler/2024.1/lib/libpi_level_zero.so
#7 0x00007ffff759c74b in piQueueFinish () from /opt/intel/oneapi/compiler/2024.1/lib/libpi_level_zero.so
#8 0x00007fffe47185b7 in _pi_result sycl::_V1::detail::plugin::call_nocheck<(sycl::_V1::detail::PiApiKind)23, _pi_queue*>(_pi_queue*) const ()
from /opt/intel/oneapi/compiler/2024.1/lib/libsycl.so.7
#9 0x00007fffe48d043f in sycl::_V1::detail::queue_impl::wait(sycl::_V1::detail::code_location const&) () from /opt/intel/oneapi/compiler/2024.1/lib/libsycl.so.7
#10 0x000000000062747a in ggml_backend_sycl_synchronize(ggml_backend*) ()
#11 0x00000000005ab047 in ggml_backend_graph_compute ()
#12 0x00000000004b5295 in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) ()
#13 0x00000000004f4cf5 in FluxModel::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, int, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, float, ggml_tensor**, ggml_context*) ()
#14 0x000000000051a171 in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition)::{lambda(ggml_tensor*, float, int)#1}::operator()(ggml_tensor*, float, int) const ()
#15 0x000000000049d657 in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition) ()
#16 0x000000000047c9f9 in generate_image(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, float, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t const*, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#17 0x000000000047e89f in txt2img ()
#18 0x00000000004212f6 in main ()
stable-diffusion.cpp was cloned with --recursive; the ggml submodule's HEAD was detached at 21d3a30 (11 days old) despite a fresh git pull on stable-diffusion.cpp. After
cd ggml
git pull origin master
it was fast-forwarded to the current fbac47b.
This resulted in build errors because GGML_MAX_N_THREADS (introduced in ggml c584042 in include/ggml.h) never gets defined.
Moving the #define GGML_MAX_N_THREADS 512 outside the #ifndef GGML_MAX_NAME block let me build again.
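For reference, the change is roughly this (just a sketch, the exact surrounding lines in include/ggml.h may differ; I'm assuming the root cause is that stable-diffusion.cpp defines its own GGML_MAX_NAME, which makes the preprocessor skip the whole #ifndef block):

/* before: GGML_MAX_N_THREADS is only defined when GGML_MAX_NAME has not
 * already been defined by the including project, so a project that sets
 * its own GGML_MAX_NAME never gets GGML_MAX_N_THREADS at all */
#ifndef GGML_MAX_NAME
#define GGML_MAX_NAME          64
#define GGML_MAX_N_THREADS     512
#endif

/* after: keep the GGML_MAX_NAME override mechanism, but define the
 * thread limit unconditionally */
#ifndef GGML_MAX_NAME
#define GGML_MAX_NAME          64
#endif
#define GGML_MAX_N_THREADS     512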
Now gdb returns:
[INFO ] stable-diffusion.cpp:235 - Version: Flux Schnell
[INFO ] stable-diffusion.cpp:266 - Weight type: tq1_0
[INFO ] stable-diffusion.cpp:267 - Conditioner weight type: tq1_0
[INFO ] stable-diffusion.cpp:268 - Diffusion model weight type: tq1_0
[INFO ] stable-diffusion.cpp:269 - VAE weight type: tq1_0
[DEBUG] stable-diffusion.cpp:271 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:310 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:313 - CLIP: Using CPU backend
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
sd: /home/recallmenot/stable-diffusion.cpp/ggml/src/ggml.c:3375: size_t ggml_row_size(enum ggml_type, int64_t): Assertion `ne % ggml_blck_size(type) == 0' failed.
Thread 1 "sd" received signal SIGABRT, Aborted.
Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007ffff7cf6463 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2 0x00007ffff7c9d120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff7c844c3 in __GI_abort () at abort.c:79
#4 0x00007ffff7c843df in __assert_fail_base (fmt=0x7ffff7e14c20 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x6cca97 "ne % ggml_blck_size(type) == 0", file=file@entry=0x6cc702 "/home/a/stable-diffusion.cpp/ggml/src/ggml.c",
line=line@entry=3375, function=function@entry=0x6ccab6 "size_t ggml_row_size(enum ggml_type, int64_t)") at assert.c:94
#5 0x00007ffff7c95177 in __assert_fail (assertion=0x6cca97 "ne % ggml_blck_size(type) == 0", file=0x6cc702 "/home/a/stable-diffusion.cpp/ggml/src/ggml.c", line=3375,
function=0x6ccab6 "size_t ggml_row_size(enum ggml_type, int64_t)") at assert.c:103
#6 0x000000000054fddd in ggml_new_tensor_impl ()
#7 0x00000000005502f0 in ggml_new_tensor_2d ()
#8 0x00000000004dbe04 in Embedding::init_params(ggml_context*, ggml_type) ()
#9 0x00000000004a375f in GGMLBlock::init(ggml_context*, ggml_type) ()
#10 0x00000000004a375f in GGMLBlock::init(ggml_context*, ggml_type) ()
#11 0x00000000004a375f in GGMLBlock::init(ggml_context*, ggml_type) ()
#12 0x00000000004a375f in GGMLBlock::init(ggml_context*, ggml_type) ()
#13 0x00000000004a375f in GGMLBlock::init(ggml_context*, ggml_type) ()
#14 0x00000000004f2e79 in FluxCLIPEmbedder::FluxCLIPEmbedder(ggml_backend*, ggml_type, int) ()
#15 0x000000000049836c in StableDiffusionGGML::load_from_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, ggml_type, schedule_t, bool, bool, bool) ()
#16 0x000000000047b212 in new_sd_ctx ()
#17 0x0000000000420fe8 in main ()
So it crashes way earlier.
[INFO ] stable-diffusion.cpp:266 - Weight type: tq1_0
[INFO ] stable-diffusion.cpp:267 - Conditioner weight type: tq1_0
[INFO ] stable-diffusion.cpp:268 - Diffusion model weight type: tq1_0
[INFO ] stable-diffusion.cpp:269 - VAE weight type: tq1_0
I guess sd.cpp's type list is out of date, because those reported weight types are clearly wrong. That would also explain the ggml_row_size block-size assert.
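For context, my reading of that assert (illustration only; the block sizes and byte counts below are my own picks, not copied from ggml, apart from f16 being 1 element per block and q8_0 being 32): ggml stores block-quantized rows as whole blocks, so the row-size calculation requires the row length ne to be a multiple of the type's block size. If the weight type is misdetected, shapes that are perfectly fine for f16 or q8_0 can trip it.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Toy version of the check that fires in ggml_row_size() (ggml/src/ggml.c):
 * a row of ne elements of a block-quantized type is stored as
 * ne / block_size blocks, so ne must divide evenly by the block size. */
static size_t toy_row_size(int64_t ne, int64_t blck_size, size_t block_bytes) {
    assert(ne % blck_size == 0); /* "ne % ggml_blck_size(type) == 0" in ggml.c */
    return block_bytes * (size_t)(ne / blck_size);
}

int main(void) {
    /* f16: 1 element per "block", 2 bytes each -> any row length passes */
    printf("f16,    ne=768: %zu bytes\n", toy_row_size(768, 1, 2));
    /* q8_0: 32 elements per block -> 768 passes, it is a multiple of 32 */
    printf("q8_0,   ne=768: %zu bytes\n", toy_row_size(768, 32, 34));
    /* a misdetected type with a 256-element block aborts on e.g. ne=840 */
    printf("blk256, ne=840: ");
    toy_row_size(840, 256, 128); /* assertion failure, like the backtrace above */
    return 0;
}

So a stale mapping between sd.cpp's idea of the type ids and the updated ggml enum would explain both the bogus tq1_0 log lines and this assert firing.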
For comparison, it seemed fine before updating ggml:
[INFO ] stable-diffusion.cpp:235 - Version: Flux Schnell
[INFO ] stable-diffusion.cpp:266 - Weight type: f16
[INFO ] stable-diffusion.cpp:267 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:268 - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:269 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:271 - ggml tensor size = 400 bytes
Yeah, updating ggml might be more work than I thought.
Instead, can you try applying some patches manually and see what happens? Like this one: https://github.com/ggerganov/llama.cpp/pull/9346
It would also help if you could build in Debug mode, so we have line numbers :)
So after
cd ggml
git checkout 21d3a30
I applied llama.cpp 2a358fb and rebuilt with debugging enabled.
Option:
n_threads: 8
mode: txt2img
model_path:
wtype: unspecified
clip_l_path: stable-diffusion.cpp/models/clip_l.safetensors
t5xxl_path: stable-diffusion.cpp/models/t5xxl_fp16.safetensors
diffusion_model_path: stable-diffusion.cpp/models/flux1-schnell-q8_0.gguf
vae_path: stable-diffusion.cpp/models/ae.safetensors
taesd_path:
esrgan_path:
controlnet_path:
embeddings_path:
stacked_id_embeddings_path:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: output.png
init_img:
control_image:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:false
strength(control): 0.90
prompt: a lovely cat
negative_prompt:
min_cfg: 1.00
cfg_scale: 1.00
guidance: 3.50
clip_skip: -1
width: 512
height: 512
sample_method: euler
schedule: default
sample_steps: 4
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
System Info:
BLAS = 1
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:175 - Using SYCL backend
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
Downloading separate debug info for /usr/lib/libze_loader.so.1
Downloading separate debug info for /usr/lib/libfmt.so.10
Downloading separate debug info for /opt/intel/oneapi/compiler/2024.1/lib/libonnxruntime.1.12.22.721.so
Downloading separate debug info for /usr/lib/intel-opencl/libigdrcl.so
Downloading separate debug info for /usr/lib/libigdgmm.so.12
[New Thread 0x7fffcda006c0 (LWP 207031)]
Downloading separate debug info for /usr/lib/libigdfcl.so.1
Downloading separate debug info for /usr/lib/libopencl-clang.so.14
Downloading separate debug info for /usr/lib/libigc.so.1
Downloading separate debug info for /usr/lib/libnvidia-opencl.so.1
Downloading separate debug info for /usr/lib/libze_intel_gpu.so.1
Downloading separate debug info for /usr/lib/libze_tracing_layer.so.1
[New Thread 0x7fffcd0006c0 (LWP 207033)]
[New Thread 0x7fffba2006c0 (LWP 207034)]
[Thread 0x7fffba2006c0 (LWP 207034) exited]
found 1 SYCL devices:
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel UHD Graphics 770| 1.3| 32| 512| 32| 78526M| 1.3.30049|
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
[INFO ] stable-diffusion.cpp:202 - loading clip_l from 'stable-diffusion.cpp/models/clip_l.safetensors'
[INFO ] model.cpp:793 - load stable-diffusion.cpp/models/clip_l.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from 'stable-diffusion.cpp/models/clip_l.safetensors'
[INFO ] stable-diffusion.cpp:209 - loading t5xxl from 'stable-diffusion.cpp/models/t5xxl_fp16.safetensors'
[INFO ] model.cpp:793 - load stable-diffusion.cpp/models/t5xxl_fp16.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from 'stable-diffusion.cpp/models/t5xxl_fp16.safetensors'
[INFO ] stable-diffusion.cpp:216 - loading diffusion model from 'stable-diffusion.cpp/models/flux1-schnell-q8_0.gguf'
[INFO ] model.cpp:790 - load stable-diffusion.cpp/models/flux1-schnell-q8_0.gguf using gguf format
[DEBUG] model.cpp:807 - init from 'stable-diffusion.cpp/models/flux1-schnell-q8_0.gguf'
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
[INFO ] stable-diffusion.cpp:223 - loading vae from 'stable-diffusion.cpp/models/ae.safetensors'
[INFO ] model.cpp:793 - load stable-diffusion.cpp/models/ae.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from 'stable-diffusion.cpp/models/ae.safetensors'
[INFO ] stable-diffusion.cpp:235 - Version: Flux Schnell
[INFO ] stable-diffusion.cpp:266 - Weight type: f16
[INFO ] stable-diffusion.cpp:267 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:268 - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:269 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:271 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:310 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:313 - CLIP: Using CPU backend
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1050 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1050 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1050 - flux params backend buffer size = 12057.71 MB(VRAM) (776 tensors)
[DEBUG] ggml_extend.hpp:1050 - vae params backend buffer size = 94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:398 - loading weights
[DEBUG] model.cpp:1530 - loading tensors from stable-diffusion.cpp/models/clip_l.safetensors
[DEBUG] model.cpp:1530 - loading tensors from stable-diffusion.cpp/models/t5xxl_fp16.safetensors
[INFO ] model.cpp:1685 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1530 - loading tensors from stable-diffusion.cpp/models/flux1-schnell-q8_0.gguf
[DEBUG] model.cpp:1530 - loading tensors from stable-diffusion.cpp/models/ae.safetensors
[INFO ] stable-diffusion.cpp:497 - total params memory size = 21471.11MB (VRAM 12152.28MB, RAM 9318.83MB): clip 9318.83MB(RAM), unet 12057.71MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:501 - loading model from '' completed, taking 14.95s
[INFO ] stable-diffusion.cpp:518 - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:572 - finished loaded file
[DEBUG] stable-diffusion.cpp:1378 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1127 - prompt after extract and remove lora: "a lovely cat"
[INFO ] stable-diffusion.cpp:655 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1132 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1036 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 256
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 68.25 MiB
[DEBUG] ggml_extend.hpp:1001 - t5 compute buffer size: 68.25 MB(RAM)
[New Thread 0x7fffb9800740 (LWP 207066)]
[New Thread 0x7fffb8e007c0 (LWP 207067)]
[New Thread 0x7fffb1200840 (LWP 207068)]
[New Thread 0x7ffd610008c0 (LWP 207069)]
[New Thread 0x7ffd5be00940 (LWP 207070)]
[New Thread 0x7ffd5b4009c0 (LWP 207071)]
[New Thread 0x7ffd5aa00a40 (LWP 207072)]
[DEBUG] conditioner.hpp:1155 - computing condition graph completed, taking 9135 ms
[INFO ] stable-diffusion.cpp:1256 - get_learned_condition completed, taking 9138 ms
[INFO ] stable-diffusion.cpp:1279 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1283 - generating image: 1/1 - seed 42
ggml_gallocr_reserve_n: reallocating SYCL0 buffer from size 0.00 MiB to 398.50 MiB
[DEBUG] ggml_extend.hpp:1001 - flux compute buffer size: 398.50 MB(VRAM)
AssertHandler::printMessage
Thread 1 "sd" received signal SIGSEGV, Segmentation fault.
0x00007fffba8b7240 in ?? () from /usr/lib/libze_intel_gpu.so.1
(gdb) bt
#0 0x00007fffba8b7240 in ?? () from /usr/lib/libze_intel_gpu.so.1
#1 0x00007fffba8b7f46 in ?? () from /usr/lib/libze_intel_gpu.so.1
#2 0x00007fffba86c3a2 in ?? () from /usr/lib/libze_intel_gpu.so.1
#3 0x00007fffba508af1 in ?? () from /usr/lib/libze_intel_gpu.so.1
#4 0x00007fffba50af69 in ?? () from /usr/lib/libze_intel_gpu.so.1
#5 0x00007fffba4e8962 in ?? () from /usr/lib/libze_intel_gpu.so.1
#6 0x00007ffff75930b6 in urQueueFinish () from /opt/intel/oneapi/compiler/2024.1/lib/libpi_level_zero.so
#7 0x00007ffff759c74b in piQueueFinish () from /opt/intel/oneapi/compiler/2024.1/lib/libpi_level_zero.so
#8 0x00007fffe47185b7 in _pi_result sycl::_V1::detail::plugin::call_nocheck<(sycl::_V1::detail::PiApiKind)23, _pi_queue*>(_pi_queue*) const ()
from /opt/intel/oneapi/compiler/2024.1/lib/libsycl.so.7
#9 0x00007fffe48d043f in sycl::_V1::detail::queue_impl::wait(sycl::_V1::detail::code_location const&) () from /opt/intel/oneapi/compiler/2024.1/lib/libsycl.so.7
#10 0x000000000062751a in ggml_backend_sycl_synchronize(ggml_backend*) ()
#11 0x00000000005ab047 in ggml_backend_graph_compute ()
#12 0x00000000004b5295 in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) ()
#13 0x00000000004f4cf5 in FluxModel::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, int, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, float, ggml_tensor**, ggml_context*) ()
#14 0x000000000051a171 in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition)::{lambda(ggml_tensor*, float, int)#1}::operator()(ggml_tensor*, float, int) const ()
#15 0x000000000049d657 in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition) ()
#16 0x000000000047c9f9 in generate_image(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, float, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t const*, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#17 0x000000000047e89f in txt2img ()
#18 0x00000000004212f6 in main ()
So it's the same crash as the initial behavior.
Hi, trying to use SYCL on the 12600K's iGPU (UHD 770), I've run into the following issue:
OS: Manjaro. To install oneAPI:
(despite-dependencies because of blender)
and
from https://archlinux.org/packages/extra/x86_64/intel-oneapi-basekit/ instead of a simple
pacman -S --needed intel-oneapi-basekit
as the current repo package appears broken. Then I installed oneDNN and the driver with
sudo pacman -S --needed onednn onetbb intel-compute-runtime
I've built it like this:
F16 on or off makes no difference; the result is the same:
I can see the iGPU being used at 100% for Render/3D by the sd executable, visible in intel_gpu_top (from the intel-gpu-tools package) on the line
for approx. 20 seconds, then it proceeds to
RAM is 80 GB; it didn't run out.