vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.
https://docs.vectorch.com/
Apache License 2.0

Chat template #28

Closed: paulcx closed this issue 7 months ago

paulcx commented 11 months ago

How can we modify the chat template (including the user/assistant role prefixes and the stop token) based on different fine-tuning settings? Does the chat_template field in tokenizer_config.json work? If so, is there an example we can borrow from?

guocuimi commented 11 months ago

As of now, the chat templates for each model type are embedded directly in the code and are not yet configurable. However, this feature is certainly feasible to implement. For reference, you can view an example of the Yi model's chat template here: Yi Model Chat Template.
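
To illustrate, here is a minimal sketch (not ScaleLLM's actual code; the Message struct and function name are hypothetical) of the kind of hard-coded prompt assembly this involves, using the ChatML-style format that Yi models follow:

```cpp
#include <string>
#include <vector>

// Hypothetical message type; ScaleLLM's real types differ.
struct Message {
  std::string role;  // "system", "user", or "assistant"
  std::string content;
};

// Build a ChatML-style prompt, with "<|im_end|>" acting as the stop token.
std::string build_chatml_prompt(const std::vector<Message>& messages) {
  std::string prompt;
  for (const auto& m : messages) {
    prompt += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
  }
  // Trailing generation prompt so the model replies as the assistant.
  prompt += "<|im_start|>assistant\n";
  return prompt;
}
```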

I'd be happy to explore this further. If you could provide me with more specific details regarding your requirements, we can discuss potential ways to tailor the chat template to better suit your scenario.

paulcx commented 11 months ago

It seems like the current implementation of chat templates within the models is not directly accessible or modifiable by users, as they are embedded in the code. A potential solution to this could be to allow customization through external configuration, such as reading from a tokenizer_config.json file (https://huggingface.co/docs/transformers/chat_templating).
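
For reference, the Hugging Face convention stores the template as a single Jinja string in tokenizer_config.json. A ChatML-style entry looks roughly like this (abridged; values illustrative):

```json
{
  "chat_template": "{% for message in messages %}{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}",
  "eos_token": "<|im_end|>"
}
```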

guocuimi commented 11 months ago

Absolutely, the ability to configure the chat template is indeed a legitimate requirement. However, I'm currently evaluating the urgency of this need, considering that chat templates typically remain stable in a production environment. I'll give this some thought and consider implementing support for this feature in the near future. Thanks for your feedback!

kitckso commented 11 months ago

I think it's good to read from the tokenizer config by default, and also allow overriding it with a custom external file.

guocuimi commented 11 months ago

Yeah, agreed. I just took a quick look: the chat template is defined in Jinja format. It is not trivial to add a Jinja compiler in C++, but let me try to figure out how to support it.
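
As a first step, reading the template string out of tokenizer_config.json is the easy part; a minimal sketch with nlohmann::json (hypothetical helper, not shipped code):

```cpp
#include <fstream>
#include <optional>
#include <string>

#include <nlohmann/json.hpp>

// Hypothetical helper: read the chat_template field from a model's
// tokenizer_config.json, returning std::nullopt if it is absent.
std::optional<std::string> load_chat_template(const std::string& path) {
  std::ifstream file(path);
  if (!file) {
    return std::nullopt;
  }
  const nlohmann::json config = nlohmann::json::parse(file);
  if (config.contains("chat_template") && config["chat_template"].is_string()) {
    return config["chat_template"].get<std::string>();
  }
  return std::nullopt;
}
```

Rendering that template string is the hard part, since it requires a Jinja engine.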

kitckso commented 11 months ago

Just searched and found this project: https://jinja2cpp.github.io/. Not sure if it's helpful; I'm not good at C++.

guocuimi commented 11 months ago

Thanks, I also found it. Let me evaluate it and integrate it into ScaleLLM if it meets our needs.

paulcx commented 11 months ago

Is there any workaround for now? Does it work if we modify the C/C++ code?

guocuimi commented 11 months ago

Yes, you can directly update the C/C++ code as a workaround, or you can call the v1/completions API with a prompt generated from the correct chat template. Just FYI, I am working on support via https://github.com/pantor/inja, but I'm not sure it will be ready before my vacation.
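
As a sketch of the intended approach (illustrative only, assuming inja's default Jinja-like syntax; this is not shipped code):

```cpp
#include <iostream>
#include <string>

#include <inja/inja.hpp>
#include <nlohmann/json.hpp>

int main() {
  // A simple ChatML-style template; real HF templates are more demanding.
  const std::string tmpl =
      "{% for m in messages %}"
      "<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n"
      "{% endfor %}"
      "<|im_start|>assistant\n";

  nlohmann::json data;
  data["messages"] = {
      {{"role", "system"}, {"content", "You are a helpful assistant."}},
      {{"role", "user"}, {"content", "Hello!"}},
  };

  std::cout << inja::render(tmpl, data);
  return 0;
}
```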

guocuimi commented 11 months ago

Just tested two open-source Jinja parser projects; neither of them actually works for now. I need to dig more. Supporting this feature in C++ is not as trivial as it is in Python.
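
For context on why: real Hugging Face templates go beyond basic interpolation, using set, loop variables, string operations, and raise_exception (a Hugging Face extension, not base Jinja). A simplified excerpt in the style of the Llama-2 template (not verbatim):

```jinja
{% if messages[0]['role'] == 'system' %}
  {% set system_message = messages[0]['content'] %}
{% endif %}
{% for message in messages %}
  {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
    {{ raise_exception('roles must alternate user/assistant') }}
  {% endif %}
{% endfor %}
```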

kitckso commented 11 months ago

I found that you already use two Rust projects. Is it possible to use a Rust Jinja template engine? https://docs.rs/minijinja/latest/minijinja/

paulcx commented 11 months ago

A chat template is not the only way to implement this if the Jinja parser takes too long. Another workaround is to give users more flexible inputs instead of the OpenAI-style API. For instance, TGI offers a simple inputs API that lets users pass their entire request as a single string.

kitckso commented 11 months ago

Another option is something similar to the oobabooga instruction templates: https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates%2FOpenChat.yaml

That can be done with a simple string replace, as sketched below.
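
A minimal sketch of that approach (placeholder names are modeled on the oobabooga YAML and are assumptions here, not ScaleLLM code):

```cpp
#include <string>

// Replace every occurrence of `from` in `s` with `to`.
static void replace_all(std::string& s, const std::string& from,
                        const std::string& to) {
  for (size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos;
       pos += to.size()) {
    s.replace(pos, from.size(), to);
  }
}

// Render one turn from an OpenChat-style turn template such as
// "<|user|> <|user-message|><|end_of_turn|><|bot|> <|bot-message|>".
std::string render_turn(std::string tmpl, const std::string& user_msg,
                        const std::string& bot_msg) {
  replace_all(tmpl, "<|user|>", "GPT4 User:");
  replace_all(tmpl, "<|bot|>", "GPT4 Assistant:");
  replace_all(tmpl, "<|user-message|>", user_msg);
  replace_all(tmpl, "<|bot-message|>", bot_msg);
  return tmpl;
}
```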

guocuimi commented 10 months ago

Thank you for your valuable input! Much appreciated! We have recently evaluated several C++ implementations of Jinja2, but unfortunately none of them offers the complete functionality required for direct integration into our project. This leaves us with a challenging task.

We will need additional time to explore alternative options, though it will be a lower priority compared to the ongoing efforts in kernel optimization. We intend to resume this evaluation at a later time, and in the meantime, the integration of chat templates will be moved into the backlog for future consideration.

Options we haven't investigated yet:

- Rust Jinja engines such as minijinja (suggested above), which would need bridging into the C++ codebase.

As for workarounds, you can consider the following:

- Directly modify the C/C++ code where the chat template is embedded.
- Call the v1/completions API with a prompt you render yourself using the correct chat template.

Thanks for your understanding and patience!

paulcx commented 10 months ago

Can you confirm that using the v1/completions API means all chat templates, including the user/assistant roles, become obsolete, and users have to manually assemble the conversation into one single string?

guocuimi commented 10 months ago

Yes, confirmed.
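
For example, a POST body for the v1/completions endpoint with a hand-assembled ChatML-style prompt (the model name and stop token here are illustrative):

```json
{
  "model": "01-ai/Yi-34B-Chat",
  "prompt": "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n",
  "max_tokens": 256,
  "stop": ["<|im_end|>"]
}
```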

BTW, great news to share! I've successfully integrated jinja2cpp into our build system. This is a good Jinja2 parser candidate for chat templates. With this addition, we're on track to have full chat template support ready for use in the upcoming release next week. Exciting times ahead!