Open danilopau opened 1 week ago
Do not use the bin file downloaded from the llama.cpp official website. Instead, run the command
python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s
which will automatically compile an appropriate bin file. Use this compiled file for execution.
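For context, a minimal sketch of the surrounding steps, assuming the standard microsoft/BitNet repository layout and a conda environment (the environment name here is illustrative):

# Sketch, assuming the public microsoft/BitNet repo and its requirements.txt
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
conda create -n bitnet-cpp python=3.9
conda activate bitnet-cpp
pip install -r requirements.txt
python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s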
Note: to build successfully with this command, the following requirements must be met; the clang version in particular matters:
python>=3.9
cmake>=3.22
clang>=18
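A quick way to check the installed versions against these requirements (a sketch; it assumes the tools are already on PATH):

python3 --version   # needs >= 3.9
cmake --version     # needs >= 3.22
clang --version     # needs >= 18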
For Windows users: Install Visual Studio 2022. In the installer, make sure to select at least the following options (this will automatically install required additional tools like CMake):
For Debian/Ubuntu users: You can install using the automatic installation script:
bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
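If the script's default release does not satisfy clang>=18, the apt.llvm.org script also accepts an explicit version number, for example (a sketch of the pinned-version variant):

wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 18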
Additional requirement:
I tried to feed the llama.cpp CLI the gguf generated via setup.
Here is what I got:
root@stm32mp2:~/extra/llama.cpp# ./llama-cli -m models/ggml-model-i2_s.gguf
Log start
main: build = 3247 (911e35bb)
main: built with aarch64-ostl-linux-gcc (GCC) 12.3.0 for aarch64-ostl-linux
main: seed = 1729273000
GGML_ASSERT: ggml/src/ggml.c:20602: 0 <= info->type && info->type < GGML_TYPE_COUNT
Aborted (core dumped)
This happened because I tried to run your model with a llama.cpp built with aarch64-ostl-linux-gcc (GCC) 12.3.0 for aarch64-ostl-linux.
Is there any workaround, please?
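Following the advice quoted above (use the bin file compiled by setup_env.py rather than a stock llama.cpp build), a sketch of what that would look like; the paths and flags below are assumptions and have not been verified on aarch64-ostl-linux:

# Sketch: run the i2_s GGUF through the BitNet-built runner instead of an upstream llama-cli
cd BitNet
python run_inference.py -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf -p "Hello" -n 32
# or invoke the CLI binary that setup_env.py builds inside the repo (path is an assumption)
./build/bin/llama-cli -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf -p "Hello" -n 32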