runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License

Fix runtime error for dict chat templates #86

Closed: nkruglikov closed this 2 months ago

nkruglikov commented 3 months ago

In Hugging Face transformers, a chat template can be a dict rather than a single string. The popular Command-R model is one example. For such models, a RunPod serverless endpoint created from the vLLM template crashes and reboots repeatedly, because the OpenAIServingChat class expects a str for its chat_template constructor argument.
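To illustrate where the dict comes from, here is a minimal sketch. It assumes that `tokenizer_config.json` stores multiple templates as a list of `{"name": ..., "template": ...}` objects, which the loaded tokenizer exposes as a dict keyed by name. The template strings are placeholders, not Command-R's actual Jinja templates, and the conversion below only mirrors that assumed behavior; it is not transformers' own code.

```python
import json

# Assumed on-disk layout: a list of named templates (placeholder contents).
config_text = """
{
  "chat_template": [
    {"name": "default", "template": "<default template>"},
    {"name": "tool_use", "template": "<tool-use template>"}
  ]
}
"""

raw = json.loads(config_text)["chat_template"]

# Collapse the list into a name -> template dict, the shape seen after loading.
if isinstance(raw, (list, tuple)):
    chat_template = {entry["name"]: entry["template"] for entry in raw}
else:
    chat_template = raw

print(type(chat_template).__name__)    # dict
print(sorted(chat_template))           # ['default', 'tool_use']
# A constructor that only accepts a str would reject this value.
print(isinstance(chat_template, str))  # False
```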

This patch fixes the crash by replacing a dict chat_template with the string stored under its "default" key. That key should exist in every valid dict chat template; otherwise transformers raises an error in apply_chat_template when no template name is given.
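The normalization described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the code from this PR: the helper name `resolve_chat_template` is hypothetical, and the template strings are placeholders. It assumes, as the description states, that a valid dict template always carries a "default" key.

```python
from typing import Dict, Optional, Union

# A chat template is either a single template string or a mapping of
# template name -> template string (the dict form described above).
ChatTemplate = Union[str, Dict[str, str], None]


def resolve_chat_template(chat_template: ChatTemplate) -> Optional[str]:
    """Hypothetical helper: turn a possibly-dict template into a plain str."""
    if isinstance(chat_template, dict):
        if "default" not in chat_template:
            # Assumption from the PR text: valid dict templates define "default".
            raise ValueError('dict chat template has no "default" entry')
        return chat_template["default"]
    # Plain strings (and None) are returned unchanged.
    return chat_template


named = {"default": "<default template>", "tool_use": "<tool-use template>"}
print(resolve_chat_template(named))           # <default template>
print(resolve_chat_template("<single one>"))  # <single one>
```

Because strings pass through untouched, models with an ordinary single template are unaffected; only dict templates are collapsed to their "default" entry before being handed to a str-only argument such as chat_template.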

pandyamarut commented 2 months ago

Seems like this has been addressed.