runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License

weird output when using a custom model and ChatAPI does not work #55

Closed Mr-Nobody1 closed 5 months ago

Mr-Nobody1 commented 5 months ago

Hello, I am using a pre-trained model which is CodeLlama-based. The repo is https://huggingface.co/defog/sqlcoder-7b-2. I have further fine-tuned it and posted it on my repo. These are the logs: (screenshot)

Response when a request is made

{
  "delayTime": 1203,
  "executionTime": 1601,
  "id": "b7d69dec-5bd4-4879-95c7-837b7e62ed2c-e1",
  "output": [
    {
      "choices": [
        {
          "tokens": [
            "6\" which is available in the Play store to try if you want to copy"
          ]
        }
      ],
      "usage": {
        "input": 3,
        "output": 16
      }
    }
  ],
  "status": "COMPLETED"
}

What could be the issue? Also, the OpenAI-compatible API is not working.

alpayariyak commented 5 months ago

Your base model is a completion model, intended for text completion, not chat. Unless you specify a Jinja chat template via the CUSTOM_CHAT_TEMPLATE environment variable, you can only use the completions API.
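For illustration, a minimal sketch of what such a Jinja template could look like, matching the `<|user|>` / `<|assistant|>` style shown later in this thread. This is an assumption, not the worker's actual default; the role tags and `</s>` terminator must match whatever the base model was trained with:

```jinja
{# Sketch only: emit each message as <|role|> header, content, then </s>. #}
{% for message in messages %}<|{{ message['role'] }}|>
{{ message['content'] }}</s>
{% endfor %}{% if add_generation_prompt %}<|assistant|>
{% endif %}
```

Templates of this kind receive a `messages` list of `{"role": ..., "content": ...}` dicts and an `add_generation_prompt` flag, as in Hugging Face chat templating.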

Mr-Nobody1 commented 5 months ago

@alpayariyak Like this?


```
<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all.
```
Mr-Nobody1 commented 5 months ago

(screenshot) I tried giving it a chat template and it still does this.

alpayariyak commented 5 months ago

Does what?

Mr-Nobody1 commented 5 months ago

> Does what?

(screenshot of the model's reply)

This kind of reply. Will the chat template below work?

```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]
```
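To sanity-check what that Llama-2 format renders to before deploying, the template can be reproduced by hand. The helper below is hypothetical (not part of worker-vllm), stdlib-only, and simply follows the pattern above:

```python
# Hypothetical helper: renders the Llama-2 chat format shown above,
# so a template can be eyeballed locally before deploying.
def build_llama2_prompt(system_prompt, turns):
    """turns: list of (user_msg, model_answer) pairs; the final
    model_answer is None for the turn awaiting a completion."""
    prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    first = True
    for user_msg, answer in turns:
        if not first:
            prompt += "<s>[INST] "
        prompt += f"{user_msg} [/INST]"
        if answer is not None:
            prompt += f" {answer} </s>"
        first = False
    return prompt

# Single-turn prompt awaiting the model's answer:
print(build_llama2_prompt("You answer SQL questions.",
                          [("List all users.", None)]))
```

If the rendered string doesn't match the format the base model was trained on, the model tends to produce the kind of off-topic completions shown earlier in this thread.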
Mr-Nobody1 commented 5 months ago

Which chat template is used by the Llama-2-7b endpoint provided on RunPod?

alpayariyak commented 5 months ago

Set apply_chat_template to true in the input when you are not using the OpenAI Chat Completions route; please refer to the documentation on this. The chat template must use the same beginning-of-sequence, end-of-sequence, and end-of-turn tokens as the base model. The documentation points to resources on chat templates. This is not an issue with the vLLM worker, but you can keep iterating on your chat template with vLLM on a Pod until you get the desired results.
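As a rough sketch of what such a request body might look like when calling the endpoint directly (field names are our reading of the worker README; verify against the current docs before relying on them):

```json
{
  "input": {
    "messages": [
      {"role": "user", "content": "List all users."}
    ],
    "apply_chat_template": true,
    "sampling_params": {"max_tokens": 128}
  }
}
```

With apply_chat_template enabled, the worker renders the messages through the model's (or CUSTOM_CHAT_TEMPLATE's) Jinja template before passing the prompt to vLLM.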