Does not run with i2 model; SIGABRT and assertion `!isnan(wp[i])' failed.

Running on a vanilla Debian install:

$ python run_inference.py -p "Daniel went back to the the the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\nAnswer:" -n 6 -temp 0  -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf

It crashes with:

[...]
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 2

system_info: n_threads = 2 (n_threads_batch = 2) / 24 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

sampler seed: 4294967295
sampler params:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> greedy
generate: n_ctx = 2048, n_batch = 1, n_predict = 6, n_keep = 1

llama-cli: /opt/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:14541: void ggml_compute_forward_soft_max_f32(const struct ggml_compute_params *, struct ggml_tensor *): Assertion `!isnan(wp[i])' failed.
Error occurred while running command: Command '['build/bin/llama-cli', '-m', 'models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf', '-n', '6', '-t', '2', '-p', 'Daniel went back to the the the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\\nAnswer:', '-ngl', '0', '-c', '2048', '--temp', '0.0', '-b', '1']' died with <Signals.SIGABRT: 6>.

It runs normally with ggml-model-f32.gguf.
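
The system_info line above reports AVX = 0 and AVX2 = 0. That may or may not be related to the crash, and it could reflect either the CPU or the build configuration. A minimal sketch for checking what the CPU itself advertises, assuming a Linux system with /proc/cpuinfo (the helper name cpu_flags is made up for illustration and is not part of BitNet or llama.cpp):

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label the line "flags"; the first core's entry is enough.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = cpu_flags(f.read())
    except OSError:
        flags = set()  # not Linux, or /proc is unavailable

    # Flag names as the Linux kernel spells them on x86.
    for name in ("ssse3", "avx", "avx2", "fma", "f16c"):
        print(f"{name}: {'yes' if name in flags else 'no'}")
```

If the CPU lists avx2 here while system_info shows AVX2 = 0, the build was presumably configured without it; if it does not, the i2_s path is running on a CPU without AVX2.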

Clang version:

$ clang -v
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64

Debian version:

$ cat /etc/debian_version
12.7

Installation was done with the project's install script, inside a virtualenv.
