Closed whoreson closed 1 month ago
Q4_1 crash backtrace:
llama_new_context_with_model: graph splits = 1
[New Thread 5064.0x680]
[New Thread 5064.0x5a8]
[New Thread 5064.0x1268]
Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 5064.0x680]
quantize_row_q8_1 (x=0x367058c0, vy=0x37e0ca0, k=4096)
at ggml/src/ggml-quants.c:1397
1397	y4[i4].d[ir+4] = GGML_FP32_TO_FP16(d * hsum_i32_8(_mm256_add_epi32(_mm256_add_epi32(i0, i1), _mm256_add_epi32(i2, i3))));
(gdb) bt
#0 quantize_row_q8_1 (x=0x367058c0, vy=0x37e0ca0, k=4096)
at ggml/src/ggml-quants.c:1397
#1 0x00000000004dd7c9 in ggml_compute_forward_mul_mat (params=0x4810fda0,
dst=0x346a1250) at ggml/src/ggml.c:12945
#2 0x00000000004ef622 in ggml_compute_forward (params=0x4810fda0,
tensor=0x346a1250) at ggml/src/ggml.c:17689
#3 0x00000000004f478d in ggml_graph_compute_thread (data=0x4810fe20)
at ggml/src/ggml.c:19765
#4 0x00000000004ffddb in ggml_graph_compute._omp_fn.0 ()
at ggml/src/ggml.c:19816
#5 0x000000006360cf98 in omp_in_final ()
from C:\util\Strawberry\c\bin\libgomp-1.dll
#6 0x0000000064944ae4 in pthread_create_wrapper ()
from C:\util\Strawberry\c\bin\libwinpthread-1.dll
#7 0x000007fefd2d42bf in sqrt () from C:\Windows\system32\msvcrt.dll
#8 0x000007fefd2d7459 in msvcrt!_beginthreadex ()
from C:\Windows\system32\msvcrt.dll
#9 0x0000000076da652d in KERNEL32!BaseThreadInitThunk ()
from C:\Windows\system32\kernel32.dll
#10 0x0000000076fdc521 in ntdll!RtlUserThreadStart ()
from C:\Windows\SYSTEM32\ntdll.dll
#11 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
This one seems to be different, perhaps?.. Still, it works with stock llama.cpp.
Can you post your system_info message when these crashes happen? It should look something like this:
system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Thanks!
INFO [ main] system info | tid="1" timestamp=1725092503 n_threads=4 n_threads_batch=-1 total_threads=4 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "
I was suspecting something I might have missed between AVX2 and AVX, but no, you have AVX2.
I have no access to a Windows box, and even less to Windows 7 with GCC 8.3, so not sure how to debug.
With the second crash you posted a bt for (the one during quantization), what are the values of k, nb, i4 and ir?
Hmm no, all of these are results of llama-cli, not quantize.
1397	y4[i4].d[ir+4] = GGML_FP32_TO_FP16(d * hsum_i32_8(_mm256_add_epi32(_mm256_add_epi32(i0, i1), _mm256_add_epi32(i2, i3))));
(gdb) p k
$1 = 4096
(gdb) p nb
$2 = 128
(gdb) p i4
$3 = 0
(gdb) p ir
$4 = 0
Then y4 must be null?
(gdb) p y4
$5 = (block_q8_1_x4 * restrict) 0x3870ca0
So y4 is not null. The crash is on a write to bytes 12...16 of the memory block pointed to by y4. The memory block is 4608 bytes (the row size of a Q8_1-quantized tensor row with 4096 elements), so we are not having an out-of-bounds access. But we do get a SIGSEGV, so we are attempting to write to memory not accessible to us; y4 is somehow pointing to outside of our process address space. This is the kind of thing normally caught with the address sanitizer or valgrind. Is one of those available on this Windows box?

Ehm, looks like it's not gonna be that easy... Just tried with TDM-GCC's gcc version 10.3.0 (tdm64-1), and the results are the same.
Hmm... Could it be related that I've been disabling the -muse-unaligned-vector-move assembler flag? I don't have a recent enough binutils for it, and llama.cpp's been working so far...
Alas, no... Same crash with latest mingw's gcc 14.1 and binutils 2.42.
If you tried 3 different compiler versions and the crash persists, then it is more likely that it is a bug in the code that somehow only shows up on Windows (any Windows or just Windows 7?).
I see here that one can use the address sanitizer with clang for Windows. If you can get it going that way, this might help find the problem.
Okay "good news", I've compiled it with the same TDM-GCC on a Windows 11 box (with -mno-avx512f, because it's a much newer CPU), and it crashes there too.
It works when compiled with the default AVX512 setting.
Do you find it important to disable AVX512?
Well since the Windows 7 PC in question is only AVX2, I kinda absolutely have to, in order to maintain the comparison...
So it'd seem to me that there's some AVX2 bug going on on all Windows OSes? I'll check if I can do some address sanitizing checks, but sounds extremely painful.
I can set up an rdesktop access if that's at all helpful.
-march=native does not work? This enables the features your CPU supports. If you are setting this manually, you need FMA and F16C in addition to AVX2.
Err, I think you misunderstood. I'm using the default flags as usual. In order to test the AVX2 code on the PC which has Windows 11 (to check if it's a 7 vs 11 issue), I had to disable AVX512 on that box - naturally.
> I can set up an rdesktop access if that's at all helpful.

Sooo... no?
We can try, but I'm not very hopeful as I haven't touched a Windows computer for 10+ years. What is the Linux rdesktop client one uses these days? I'm on Ubuntu 22.04.
Well, it's called just that, "rdesktop". It works fine. I'll set it up then. Err, can github do private messages? If not, I have Telegram.
As far as I can tell the private message feature has been removed from GitHub. I don't have Telegram. I made my email address public. If you fetch the latest main branch, the last commit will have my email.
Cool, just sent you an e-mail (from s.t@gmail).
So, I used the provided rdesktop access to try to debug - without success. Supporting exotic systems (and yes, a Windows 7 box in the year 2024 is an exotic system in my book) is not one of the goals here - you are much better served with the mainline llama.cpp project.
What happened?
Trying latest HEAD with: Fimbulvetr-11B-v2-Q8_0.gguf (or L3-8B-Stheno-v3.1-Q8_0-imat.gguf, or SFR-Iterative-DPO-LLaMA-3-8B-R-Q8_0.gguf for example):
Crashes here without even trying to load and malloc the GGUF. After disabling this code block:
... seems to make it work with these files, but still crashes with Fimbulvetr Q4_1. Works with stable-code-3b-q5_k_m.gguf even without any modification, though. Also everything works on Linux. This is a Win7 PC with Strawberry Perl's gcc version 8.3.0 (x86_64-posix-seh, Built by strawberryperl.com project).
Stock llama.cpp works.
Seems really weird, any hints on debugging this?
Name and Version
c7e99c88a2de7489ba2a1539b1a9025912010b70
What operating system are you seeing the problem on?
Windows
Relevant log output
No response