intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
6.45k stars · 1.24k forks

curl error #9494

Open hk8805 opened 9 months ago

hk8805 commented 9 months ago
(screenshot attached)

"addmm_implcpu\" not implemented for 'Half'

Jasonzzt commented 9 months ago

I tried to reproduce the problem, using the cURL tool to test the FastChat API server, but I did not get the "addmm_impl_cpu_" not implemented for 'Half' error.

Here is the result I got when I tried to reproduce it:

root@spr-14:/llm# curl http://localhost:8001/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "internlm-chat-7b",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'
{"id":"chatcmpl-hkU22gKHQNbUhEWKoUxb39","object":"chat.completion","created":1700558851,"model":"internlm-chat-7b","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! My name is [Name], a helpful assistant. How can I assist you today?<eoa>"},"finish_reason":"stop"}],"usage":{"prompt_tokens":62,"total_tokens":81,"completion_tokens":19}}

I used the docker image intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT with the following environment.

Package                     Version
--------------------------- ------------------
bigdl-llm                   2.4.0
transformers                4.31.0
fschat                      0.2.28
torch                       2.1.1+cpu

So please check your environment and share its details with us; that will help us reproduce the issue.
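(Editor's note: a minimal, standard-library-only sketch for collecting the package versions listed above so they can be pasted into the issue; the helper name `report_versions` is illustrative, not part of any project tooling.)

```python
from importlib import metadata

# Packages whose versions were requested in this thread.
PACKAGES = ["bigdl-llm", "transformers", "fschat", "torch"]

def report_versions(packages):
    """Return 'name==version' lines, with a note for missing packages."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return lines

print("\n".join(report_versions(PACKAGES)))
```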

hkvision commented 9 months ago

@Ricky-Ting @Ariadne330 have encountered this issue as well. Seems fp16 can't run on CPU?

jason-dai commented 9 months ago

> @Ricky-Ting @Ariadne330 have encountered this issue as well. Seems fp16 can't run on CPU?

No fp16 support on CPU
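(Editor's note: a sketch for context, not from the thread. On PyTorch CPU builds without Half-precision kernels, `torch.addmm` on float16 tensors raises exactly the error reported above; the usual workaround is to do the computation in float32, or bfloat16, on CPU.)

```python
import torch

# float16 ("Half") tensors on CPU; on PyTorch builds without CPU fp16
# kernels, torch.addmm(b, x, w) raises:
#   RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
x = torch.randn(2, 4, dtype=torch.float16)
w = torch.randn(4, 3, dtype=torch.float16)
b = torch.zeros(3, dtype=torch.float16)

# Workaround: run the matmul in float32 on CPU.
y = torch.addmm(b.float(), x.float(), w.float())
print(y.dtype, tuple(y.shape))  # → torch.float32 (2, 3)
```

With `transformers`, the equivalent fix is to load the model with `torch_dtype=torch.float32` (or `torch.bfloat16`) rather than `torch.float16` when running on CPU.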