microsoft / BitNet

Official inference framework for 1-bit LLMs
MIT License

./llama-cli -m models/ggml-model-i2_s.gguf >> CORE DUMPED #55

Open danilopau opened 1 week ago

danilopau commented 1 week ago

I tried to feed the llama.cpp CLI the GGUF generated via the setup script.

Here is what I got:

root@stm32mp2:~/extra/llama.cpp# ./llama-cli -m models/ggml-model-i2_s.gguf
Log start
main: build = 3247 (911e35bb)
main: built with aarch64-ostl-linux-gcc (GCC) 12.3.0 for aarch64-ostl-linux
main: seed = 1729273000
GGML_ASSERT: ggml/src/ggml.c:20602: 0 <= info->type && info->type < GGML_TYPE_COUNT
Aborted (core dumped)

This is because I tried to run your model with llama.cpp built with aarch64-ostl-linux-gcc (GCC) 12.3.0 for aarch64-ostl-linux.

Any workaround, please?

x22x22 commented 22 hours ago

@danilopau

Do not use a binary downloaded from the official llama.cpp site. Instead, run the command

python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s

which will automatically compile a compatible binary. Use that binary for inference.
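
Once setup_env.py finishes, you can run inference through the repo's wrapper script. A minimal sketch, assuming the default model directory that setup_env.py creates for this repo (adjust the path and prompt as needed):

python run_inference.py -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf -p "Hello" -n 32

This uses the llama-cli built inside the BitNet tree, which registers the i2_s quantization type. Upstream llama.cpp does not know that type id, which is exactly why its GGUF loader trips the GGML_ASSERT on info->type.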


Note: to run this command for compilation, the following requirements must be met, especially the clang version:

python>=3.9
cmake>=3.22
clang>=18
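
You can verify the toolchain before running setup_env.py with the standard version flags (nothing BitNet-specific here):

python3 --version   # need >= 3.9
cmake --version     # need >= 3.22
clang --version     # need >= 18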

For Windows users: Install Visual Studio 2022. In the installer, make sure to select at least the following options (this will automatically install required additional tools like CMake):

- Desktop development with C++
- C++ CMake Tools for Windows
- Git for Windows
- C++ Clang Compiler for Windows
- MS-Build Support for LLVM Toolset (clang)

For Debian/Ubuntu users: You can install clang via LLVM's automatic installation script:

bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
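
Run as-is, the script installs the current stable LLVM release. llvm.sh also accepts a version argument if you want to pin clang 18 explicitly; downloading the script first is equivalent to the one-liner above:

wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 18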

Additional requirement: conda (highly recommended)
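
If you use conda, a minimal environment matching the requirements above could look like this (the environment name is just an example):

conda create -n bitnet-cpp python=3.9
conda activate bitnet-cpp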