paulcx closed this issue 7 months ago.
As of now, the chat templates for each model type are embedded directly in the code and are not yet configurable. However, this feature is certainly feasible to implement. For reference, you can view an example of the Yi model's chat template here: Yi Model Chat Template.
I'd be happy to explore this further. If you could provide me with more specific details regarding your requirements, we can discuss potential ways to tailor the chat template to better suit your scenario.
It seems like the current implementation of chat templates within the models is not directly accessible or modifiable by users, as they are embedded in the code. A potential solution to this could be to allow customization through external configuration, such as reading from a tokenizer_config.json file (https://huggingface.co/docs/transformers/chat_templating).
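For illustration, the `chat_template` field in `tokenizer_config.json` is a single Jinja string. The ChatML-style template below is just a sketch of what such an entry looks like; the actual template string is model-specific:

```json
{
  "chat_template": "{% for message in messages %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
}
```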
Absolutely, the ability to configure the chat template is indeed a legitimate requirement. However, I'm currently evaluating the urgency of this need, considering that chat templates typically remain stable in a production environment. I'll give this some thought and consider implementing support for this feature in the near future. Thanks for your feedback!
I think it'd be good to read from the tokenizer config by default, and also allow overriding it with a custom external file.
Yeah, agreed. Just took a quick look: the chat template is defined in Jinja format. It's not trivial to add a Jinja compiler in C++, but let me try to figure out how to support it.
Just searched and found this project: https://jinja2cpp.github.io/. Not sure if it's helpful; I'm not good at C++.
Thanks. I also found it. Let me evaluate it and integrate it into ScaleLLM if it meets our needs.
Is there any workaround for now? Does it work if we modify the C/C++ code?
Yes, you can directly update the C/C++ code as a workaround.
Or you can just call the v1/completions API with a prompt generated from the correct chat template.
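A minimal sketch of that workaround: apply the chat template client-side and send the flattened result as one prompt string. The ChatML-style markers and the endpoint/model names here are illustrative assumptions, not ScaleLLM specifics; use whatever template your model was trained with.

```python
def build_prompt(messages):
    """Flatten a chat history into a single prompt string.

    Assumes a ChatML-style template for illustration; swap in the
    template your model actually expects.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt for the model
    return "".join(parts)

prompt = build_prompt([
    {"role": "user", "content": "Hello!"},
])

# Then POST the string to the completions endpoint, e.g. with requests
# (URL and model name are placeholders):
# requests.post("http://localhost:8080/v1/completions",
#               json={"model": "my-model", "prompt": prompt})
```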
Just FYI, I am working on the support with https://github.com/pantor/inja, but not sure if it will be ready before my vacation.
Just tested two open-source Jinja parser projects; neither of them actually works for now. Need to dig more. Supporting this feature in C++ is not as trivial as in Python.
I found that you used two Rust projects. Is it possible to use a Rust Jinja template engine? https://docs.rs/minijinja/latest/minijinja/
A chat template is not the only option if the Jinja parser takes too long. Another workaround is to give users a more flexible input instead of the OpenAI-style API. For instance, TGI uses a simple inputs API that lets users put their entire request into a single string.
Another option is similar to oobabooga template: https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates%2FOpenChat.yaml
This can be done with a simple string replace.
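A sketch of that string-replace approach, in the spirit of the oobabooga-style turn templates linked above: fixed per-role turn strings with a placeholder filled in per message, no Jinja engine needed. The OpenChat-style markers below are illustrative assumptions, not the exact template.

```python
# Per-role turn templates with a <|message|> placeholder (illustrative).
USER_TURN = "GPT4 Correct User: <|message|><|end_of_turn|>"
BOT_TURN = "GPT4 Correct Assistant: <|message|><|end_of_turn|>"

def render(messages):
    """Render a chat history using plain string replacement."""
    out = []
    for m in messages:
        turn = USER_TURN if m["role"] == "user" else BOT_TURN
        out.append(turn.replace("<|message|>", m["content"]))
    out.append("GPT4 Correct Assistant:")  # cue the model to answer
    return "".join(out)
```

The upside is that this needs no template engine at all; the downside is that it cannot express conditionals (e.g. an optional system prompt) the way Jinja templates can.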
Thank you for your valuable input! Much appreciated! We have recently evaluated various C++ implementations of Jinja2, but unfortunately none of them seems to offer the complete functionality required for direct integration into our project. This presents us with a challenging task.
We will need additional time to explore alternative options, though it will be a lower priority compared to the ongoing efforts in kernel optimization. We intend to resume this evaluation at a later time, and in the meantime, the integration of chat templates will be moved into the backlog for future consideration.
Options we haven't investigated:
As for workarounds, you can consider the following:
Thanks for your understanding and patience!
Can you confirm that using the v1/completions API means all chat templates, including the user/assistant roles, become irrelevant, and users have to manually assemble the conversation into ONE single string?
Yes, confirmed.
BTW, great news to share! I've successfully integrated jinja2cpp into our build system. This is a good Jinja2 parser candidate for chat templates. With this addition, we're on track to have full chat template support ready for use in the upcoming release next week. Exciting times ahead!
How do we modify the chat template (including the user/assistant role prefixes and the stop token) based on different fine-tuning settings? Does chat_template in tokenizer_config.json work? If so, is there any example we can borrow from?