Closed: kunger97 closed this issue 9 months ago.
@kunger97 Hi, we have fixed this issue in https://github.com/intel/neural-speed/pull/84. Please install Neural Speed from the source code and try again~~
@Zhenzhong1 Hello, I tested the following with the latest build of Neural Speed:
python convert-hf-to-gguf.py ~/Models/Qwen-14B-Chat/ --outtype f16
./quantize ~/Models/Qwen-14B-Chat/ggml-model-f16.gguf 15 #Q4_K
./run_qwen -m ~/Models/Qwen-14B-Chat/ggml-model-Q4_K.gguf -p "你好。"
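If the quantize step succeeds but loading still crashes, a quick sanity check on the output file can rule out a truncated or non-GGUF file. Below is a minimal header reader, assuming the GGUF v2+ layout (4-byte magic "GGUF", uint32 version, uint64 tensor count, uint64 metadata KV count, all little-endian); the synthetic demo bytes are only for illustration:

```python
import os
import struct
import tempfile

def gguf_header(path):
    """Read a GGUF header: magic, version, tensor count, metadata KV count.

    Assumes the GGUF v2+ layout: 4-byte magic "GGUF", uint32 version,
    uint64 tensor_count, uint64 metadata_kv_count (little-endian).
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
    return {"version": version, "tensors": n_tensors, "kv_pairs": n_kv}

# Demo on a synthetic header; in real use, pass your ggml-model-*.gguf path.
blob = b"GGUF" + struct.pack("<I", 2) + struct.pack("<QQ", 291, 19)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(blob)
print(gguf_header(tmp.name))  # {'version': 2, 'tensors': 291, 'kv_pairs': 19}
os.unlink(tmp.name)
```

If the magic or version check fails, the conversion or quantization step produced a bad file and the loader error is a symptom, not the cause.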
The program still reports an error and exits (Segmentation fault (core dumped)):
(neural-speed) u22f390a763ad8fc99b0d55cf8c167d0@idc-beta-batch-pvc-node-17:~$ ./run_qwen -m ~/Models/Qwen-14B-Chat/ggml-model-Q4_K.gguf -p "nihao"
Welcome to use the qwen on the ITREX!
main: seed = 1707018316
AVX:1 AVX2:1 AVX512F:1 AVX_VNNI:1 AVX512_VNNI:1 AMX_INT8:1 AMX_BF16:1 AVX512_BF16:1 AVX512_FP16:1
model.cpp: loading model from /home/u22f390a763ad8fc99b0d55cf8c167d0/Models/Qwen-14B-Chat/ggml-model-Q4_K.gguf
Loading the bin file with GGUF format...
error loading model: unrecognized tensor type 13
model_init_from_file: failed to load model
Segmentation fault (core dumped)
@kunger97 Thanks for your reply!
This error means you are not using the latest Neural Speed branch.
Do not install the latest version of llama.cpp instead; please reinstall Neural Speed from the source code~
pip list | grep neural-speed
pip uninstall neural-speed
# please make sure you have uninstalled all neural-speed libs.
Then run python setup.py install
in the Neural Speed root directory, and try other models again if you want.
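To confirm that no stale pip-installed copy shadows the source install, you can check where Python would import the package from. This is a small standard-library sketch; the import name neural_speed is an assumption, so adjust it if the package imports under a different name:

```python
import importlib.util

def installed_location(module_name: str):
    """Return the file a module would be imported from, or None if absent."""
    spec = importlib.util.find_spec(module_name)
    return getattr(spec, "origin", None) if spec else None

# "neural_speed" as the import name is an assumption; adjust if it differs.
# After a clean uninstall this should print None; after the source install,
# it should point into your checkout's build/site-packages location.
print(installed_location("neural_speed"))
```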
However, Qwen may not be supported yet among the GGUF-format models: when Neural Speed enabled the GGUF feature, no general GGUF Qwen model existed. I will add GGUF-format support for this model to Neural Speed as soon as possible.
The original Neural Speed bin model format for Qwen should be OK.
Thank you again!
Hi @Zhenzhong1 i get the same error:
error loading model: unrecognized tensor type 12
model_init_from_file: failed to load model
OS: WSL2 - Linux DESKTOP-PNBMAG8 5.15.133.1-microsoft-standard-WSL2 #1 SMP Thu Oct 5 21:02:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Python: Python 3.10.12
I tried:
pip uninstall neural-speed
git pull
pip list | grep neural-speed
neural-speed 0.2.dev9+ge2d3652
python3 setup.py install
python3 scripts/inference.py --model_name llama -m /home/dario-reply/neural-speed-tutorial/llama-2-7b.Q4_K_M.gguf -c 512 -b 1024 -n 256 -t 10 --color -p "She opened the door and see"
The same error occurs. What should I do? Thank you for your help and support.
@dellamuradario Hi~ your branch may not be the latest; your neural-speed version (0.2.dev) is old. Please git pull the latest main branch.
I saw you used llama-2-7b.Q4_K_M.gguf. This quantization type is not supported yet. Please try a q4_0.gguf file.
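The "unrecognized tensor type" codes in the errors above can be decoded against ggml's tensor-type enum, which shows why a Q4_K_M file trips the loader. The numeric values below are assumed from llama.cpp's ggml.h of that era (entries 4 and 5 were removed quant types), so verify them against your checkout:

```python
# Partial ggml_type table (values assumed from llama.cpp's ggml.h; entries
# 4 and 5 were removed quantization types, so they are omitted here).
GGML_TYPE_NAMES = {
    0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1",
    6: "Q5_0", 7: "Q5_1", 8: "Q8_0", 9: "Q8_1",
    10: "Q2_K", 11: "Q3_K", 12: "Q4_K", 13: "Q5_K",
    14: "Q6_K", 15: "Q8_K",
}

def tensor_type_name(code: int) -> str:
    """Map a tensor-type code from the loader error to a readable name."""
    return GGML_TYPE_NAMES.get(code, f"unknown ({code})")

print(tensor_type_name(12))  # Q4_K -- the K-quant that Q4_K_M files contain
print(tensor_type_name(13))  # Q5_K
```

If the decoded type is a K-quant (Q2_K through Q6_K), re-quantize the model to q4_0 as suggested above.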
And please use another script, not inference.py. Try this:
# numactl -m 0 -C 0-55 is optional
# model_path should be the local llama HF model.
numactl -m 0 -C 0-55 python scripts/python_api_example_for_gguf.py --model_name llama --model_path /home/zhenzhong/model/Llama-2-7b-chat-hf/ -m /home/zhenzhong/model/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_0.gguf
Inference screenshot:
Thank you @Zhenzhong1! It works!
By the way, QWEN has been supported. https://github.com/intel/neural-speed/pull/127
It's a Qwen base model downloaded from HF. It can run inference with llama.cpp (latest version), but inference fails on the latest version of Neural Speed; run_qwen shows this error: