microsoft / BitNet

Official inference framework for 1-bit LLMs

GGML_ASSERT Failed During Benchmarking Dummy Model on Apple Silicon Mac. #116

Open MattyAB opened 1 week ago

MattyAB commented 1 week ago

Description

Error when running the llama-bench tool on a dummy model, generated as described in the README. The run aborts on a failed assertion in ggml.c relating to the tile count used for parallel processing.

Steps to Reproduce

git clone --recursive https://github.com/microsoft/bitnet BitNetRecreate
cd BitNetRecreate
conda activate bitnet-cpp
python setup_env.py --hf-repo 1bitLLM/bitnet_b1_58-large -q tl1
python utils/generate-dummy-bitnet-model.py models/bitnet_b1_58-large \
    --outfile models/dummy-bitnet-125m.tl1.gguf --outtype tl1 --model-size 125M
python utils/e2e_benchmark.py -m models/dummy-bitnet-125m.tl1.gguf

Erroneous output

.../BitNetRecreate/3rdparty/llama.cpp/ggml/src/ggml.c:12696: GGML_ASSERT(ne0 % n_tile_num == 0) failed
ERROR:root:Error occurred while running command: Command '['/Users/mbeton/Documents/exo_bitnet/BitNetRecreate/build/bin/llama-bench', '-m', 'models/dummy-bitnet-125m.tl1.gguf', '-n', '128', '-ngl', '0', '-b', '1', '-t', '2', '-p', '512', '-r', '5']' died with <Signals.SIGABRT: 6>.

OS: macOS (M2 Pro MacBook Pro)
Python Version: 3.9.20
Repository Commit: bf11a49f11b9d0535285cc4cdec834a28762ed87

Further Exploration

To diagnose, I added print statements after line 12695 of ggml.c (sketched below). They show that ne0 has the expected value, but n_tile_num is 0, which triggers the assertion failure.
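For reference, the debug output came from something like the following, inserted immediately before the failing GGML_ASSERT; the exact variable types are an assumption on my part, so both values are cast to long long for printing:

    // hypothetical debug print added just before the failing assertion in ggml.c
    fprintf(stderr, "ne0 = %lld, n_tile_num = %lld\n",
            (long long) ne0, (long long) n_tile_num);
    GGML_ASSERT(ne0 % n_tile_num == 0);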

I tested with model sizes 125M and 350M. The only case in which the program does not crash is when I define a new model size, '700M', with parameters identical to the original bitnet_b1_58_large model.