```
$ python run_inference.py -p "Daniel went back to the the the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\nAnswer:" -n 6 -temp 0 -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf
```
It crashes with:
```
[...]
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 2
system_info: n_threads = 2 (n_threads_batch = 2) / 24 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampler seed: 4294967295
sampler params:
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> greedy
generate: n_ctx = 2048, n_batch = 1, n_predict = 6, n_keep = 1
llama-cli: /opt/BitNet/3rdparty/llama.cpp/ggml/src/ggml.c:14541: void ggml_compute_forward_soft_max_f32(const struct ggml_compute_params *, struct ggml_tensor *): Assertion `!isnan(wp[i])' failed.
Error occurred while running command: Command '['build/bin/llama-cli', '-m', 'models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf', '-n', '6', '-t', '2', '-p', 'Daniel went back to the the the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\\nAnswer:', '-ngl', '0', '-c', '2048', '--temp', '0.0', '-b', '1']' died with <Signals.SIGABRT: 6>.
```
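The assertion says that a NaN reached the input of the attention softmax, so the failure happens well before sampling, presumably somewhere in the quantized (i2_s) compute path or in the converted weights. For illustration only, here is a Python analogue of the invariant that `ggml_compute_forward_soft_max_f32` enforces (this is not the actual ggml code; the function below is mine):

```python
import math

def softmax(logits):
    # Python analogue of `assert(!isnan(wp[i]))` in ggml.c: every value
    # entering softmax must be a real number, because a single NaN
    # poisons the whole attention row.
    for x in logits:
        assert not math.isnan(x), "NaN reached softmax input"
    m = max(logits)                       # subtract the max for stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

print(softmax([0.5, 1.0, -2.0]))        # fine
softmax([0.5, float("nan"), -2.0])      # AssertionError, analogous to the abort
```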
Running on a vanilla Debian install.
It runs normally with `ggml-model-f32.gguf`.
Clang:
Debian version:
Installation was done with the script in a virtualenv.
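If the abort is in llama-cli itself rather than in the Python wrapper, it should reproduce when the binary is invoked directly with the same argv that run_inference.py reports in the error above. A minimal sketch, with the argv copied from that message (note the prompt contains a literal backslash-n, exactly as the shell passed it):

```python
import subprocess

prompt = (
    "Daniel went back to the the the garden. Mary travelled to the kitchen. "
    "Sandra journeyed to the kitchen. Sandra went to the hallway. "
    "John went to the bedroom. Mary went back to the garden. "
    "Where is Mary?\\nAnswer:"  # literal backslash-n, as in the original command
)

cmd = [
    "build/bin/llama-cli",
    "-m", "models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf",
    "-n", "6", "-t", "2",
    "-p", prompt,
    "-ngl", "0", "-c", "2048",
    "--temp", "0.0", "-b", "1",
]

ret = subprocess.run(cmd).returncode
print("exit code:", ret)  # -6 corresponds to SIGABRT
```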