intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
6.45k stars · 1.24k forks

curl error #9494

Open hk8805 opened 9 months ago

hk8805 commented 9 months ago
(screenshot attached)

"addmm_implcpu\" not implemented for 'Half'

Jasonzzt commented 9 months ago

I tried to reproduce the problem, using the cURL tool to test the FastChat API server, but I did not get the "addmm_impl_cpu_" not implemented for 'Half' error.

Here is the result I got when I tried to reproduce it:

root@spr-14:/llm# curl http://localhost:8001/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "internlm-chat-7b",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'
{"id":"chatcmpl-hkU22gKHQNbUhEWKoUxb39","object":"chat.completion","created":1700558851,"model":"internlm-chat-7b","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! My name is [Name], a helpful assistant. How can I assist you today?<eoa>"},"finish_reason":"stop"}],"usage":{"prompt_tokens":62,"total_tokens":81,"completion_tokens":19}}

I used the docker image intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT with the following environment.

Package                     Version
--------------------------- ------------------
bigdl-llm                   2.4.0
transformers                4.31.0
fschat                      0.2.28
torch                       2.1.1+cpu

So please check your environment and share its details with us; that will help us reproduce the issue.
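(Editor's note: a minimal, standard-library-only sketch for collecting the package versions listed above so they can be pasted into the issue; the helper name `report_versions` is illustrative, not part of any project tooling.)

```python
from importlib import metadata

# Packages whose versions were requested in this thread.
PACKAGES = ["bigdl-llm", "transformers", "fschat", "torch"]

def report_versions(packages):
    """Return 'name==version' lines, with a note for missing packages."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return lines

print("\n".join(report_versions(PACKAGES)))
```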

hkvision commented 9 months ago

@Ricky-Ting @Ariadne330 have encountered this issue as well. Seems fp16 can't run on CPU?

jason-dai commented 9 months ago

> @Ricky-Ting @Ariadne330 have encountered this issue as well. Seems fp16 can't run on CPU?

No fp16 support on CPU
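(Editor's note: a sketch for context, not from the thread. On PyTorch CPU builds without Half-precision kernels, `torch.addmm` on float16 tensors raises exactly the error reported above; the usual workaround is to do the computation in float32, or bfloat16, on CPU.)

```python
import torch

# float16 ("Half") tensors on CPU; on PyTorch builds without CPU fp16
# kernels, torch.addmm(b, x, w) raises:
#   RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
x = torch.randn(2, 4, dtype=torch.float16)
w = torch.randn(4, 3, dtype=torch.float16)
b = torch.zeros(3, dtype=torch.float16)

# Workaround: run the matmul in float32 on CPU.
y = torch.addmm(b.float(), x.float(), w.float())
print(y.dtype, tuple(y.shape))  # → torch.float32 (2, 3)
```

With `transformers`, the equivalent fix is to load the model with `torch_dtype=torch.float32` (or `torch.bfloat16`) rather than `torch.float16` when running on CPU.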