unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
15.32k stars 1.03k forks source link

Error when deploying on HF inference endpoints #793

Open adamrobertolo78 opened 1 month ago

adamrobertolo78 commented 1 month ago

Hi there,

First thank you for unsloth, it's great!

I've finetuned a llama-3-8b-Instruct-bnb-4bit and pushed it to hf hub. When I try to deploy it using hf Inference Endpoints using a Text Generation Inference container, I get the following error when I try to use the endpoint:

Server tgi does not seem to support chat completion. Error: Template error: invalid operation: tried to use + operator on unsupported types string and undefined (in <string>:5)

with logs:

2024/07/22 12:24:32
{"timestamp":"2024-07-22T16:24:32.217557Z","level":"ERROR","message":"Template error: invalid operation: tried to use + operator on unsupported types string and undefined (in <string>:5)","target":"text_generation_router::infer","filename":"router/src/infer.rs","line_number":202,"span":{"name":"apply_chat_template"},"spans":[{"name":"chat_completions"},{"name":"apply_chat_template"}]}
2024/07/22 12:24:32
{"timestamp":"2024-07-22T16:24:32.217588Z","level":"ERROR","message":"Template error: invalid operation: tried to use + operator on unsupported types string and undefined (in <string>:5)","target":"text_generation_router::server","filename":"router/src/server.rs","line_number":1047,"span":{"name":"chat_completions"},"spans":[{"name":"chat_completions"}]}

(note that it works well with the llama-3-8b-Instruct-bnb-4bit base model)

Any idea of what's going on or what I should do?

danielhanchen commented 1 month ago

Ok weird (sorry there was a spam message as well) It seems like TGI does not support apply_chat_template?