replicate / cog-triton

A cog implementation of Nvidia's Triton server
Apache License 2.0

add `messages` input for chat formatting #40

Open · technillogue opened 3 months ago

technillogue commented 3 months ago

we have models with many different tokenizers and chat templates, and we already load the tokenizer that carries this information. but for multi-turn dialog we currently push chat formatting onto the client to figure out, which can cause problems in the various places that do this (openai-proxy, llama-chat, etc.). we could avoid this by adding a `messages` input. we should figure out how that would interact with the existing `prompt`, `system_prompt`, and `prompt_template` inputs. a rough sketch of the idea follows below.
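
A minimal sketch of what this could look like, assuming the predictor already holds a Hugging Face tokenizer whose chat template is populated. The `build_prompt` helper, the JSON `messages` schema, and the fallback to the existing `prompt`/`prompt_template` inputs are assumptions for illustration, not the current cog-triton API:

```python
import json
from typing import Optional

from transformers import AutoTokenizer, PreTrainedTokenizerBase


# Hypothetical helper (not in cog-triton today): shows how a `messages` input
# could reuse the tokenizer's own chat template instead of making each client
# (openai-proxy, llama-chat, ...) re-implement the formatting.
def build_prompt(
    tokenizer: PreTrainedTokenizerBase,
    messages: Optional[str] = None,  # JSON list of {"role": ..., "content": ...}
    prompt: Optional[str] = None,
    system_prompt: Optional[str] = None,
    prompt_template: Optional[str] = None,
) -> str:
    if messages:
        chat = json.loads(messages)
        # prepend system_prompt only if the client didn't already send one
        if system_prompt and not any(m.get("role") == "system" for m in chat):
            chat = [{"role": "system", "content": system_prompt}] + chat
        # apply_chat_template is the transformers API that renders the dialog
        # with whatever template ships alongside the model's tokenizer
        return tokenizer.apply_chat_template(
            chat, tokenize=False, add_generation_prompt=True
        )
    # otherwise fall back to the existing single-turn behavior
    template = prompt_template or "{prompt}"
    return template.format(prompt=prompt or "", system_prompt=system_prompt or "")


if __name__ == "__main__":
    # checkpoint name is just an example; any model with a chat template works
    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    print(build_prompt(tok, messages=json.dumps([
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello!"},
        {"role": "user", "content": "what's the weather?"},
    ])))
```

Passing `messages` as a JSON string (rather than a structured list) is one way to keep the cog input schema simple while still letting the server, which already has the tokenizer loaded, resolve the per-model template.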