Open · mottoslo opened this issue 2 months ago
Gentle ping @drbh: is this issue being handled internally? Any feedback would be great!
I am running into this issue as well. I am not knowledgeable enough in Rust to deal with it myself, but I would very much appreciate it if you took this on @mottoslo!
I think handling this issue may involve (breaking) changes to the feature and needs to be discussed beforehand, so I don't know where to start. However, some pull requests that I think are related have since been opened (https://github.com/huggingface/text-generation-inference/pull/2645, https://github.com/huggingface/text-generation-inference/pull/2614, ...), so I assume there is an internal consensus on how things should be done?
Either way, it would be a huge improvement. As it stands, we can't easily build agents on top of models deployed with TGI because of this, at least not through the Messages API. I tried manually applying the chat template and using the generate endpoint, and the model does appear to be able to choose not to use a tool. The downside is that handling the chat template manually makes this much harder to integrate into existing frameworks. Being able to use TGI as a drop-in replacement for OpenAI models would be fantastic.
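For reference, the workaround looks roughly like this (a sketch, not my exact code: the server URL and tool schema are placeholders, and it assumes a checkpoint whose chat template accepts a `tools` argument, e.g. a Llama 3.1 Instruct model):

```python
import requests
from transformers import AutoTokenizer

# Render the prompt ourselves, then call TGI's raw /generate endpoint
# instead of the OpenAI-compatible /v1/chat/completions route.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",  # placeholder tool
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello, how are you?"}],
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)

resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": prompt, "parameters": {"max_new_tokens": 256}},
)
print(resp.json()["generated_text"])  # model may answer in text or emit a tool call
```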
System Info
tgi version: 2.3.0
model: Meta-Llama-3-8B-Instruct
Information
Tasks
Reproduction
0. tool definition to use for reproduction
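Any simple function schema reproduces this; the following is a placeholder definition (`get_current_weather` is not my actual tool):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    }
]
```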
1. Using OpenAI with tool_choice="auto"
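Roughly (model name and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any tool-capable OpenAI model
    messages=[{"role": "user", "content": "Hello, how are you?"}],  # no tool needed
    tools=tools,          # definition from step 0
    tool_choice="auto",
)
print(response.choices[0].message.content)     # plain text answer
print(response.choices[0].message.tool_calls)  # None
```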
=> responds with a normal chat message, since the prompt does not call for a tool_call
2. Using tgi with tool_choice="auto" (model = llama)
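Same request pointed at TGI's OpenAI-compatible endpoint (the base_url is a placeholder for my deployment):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

response = client.chat.completions.create(
    model="tgi",  # TGI serves whatever model it was launched with
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    tools=tools,          # definition from step 0
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)  # a tool call, even though none is needed
```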
=> tries to call a function anyway
3. Using OpenAI with tool_choice="required"
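For comparison, forcing the call on OpenAI's side:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",    # placeholder model, as in step 1
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    tools=tools,            # definition from step 0
    tool_choice="required", # explicitly force a tool call
)
print(response.choices[0].message.tool_calls)  # always contains a function call
```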
=> tries to call a function anyway
Expected behavior
When consuming TGI, I expect the server to be able to respond both with and without a tool_call when provided with tool definitions. As it stands, the application needs to know ahead of time whether a tool call is required before calling TGI, which in my opinion is not something LLM applications should have to do.
I am curious whether the behavior above is intended. I found that someone has raised this before (https://github.com/huggingface/text-generation-inference/pull/1587#issuecomment-1979185339), but it was never addressed.
Maybe something can be done with the tool prompt, the ToolType enum, and the chat_completions logic in server.rs? If this behavior is not intended and needs fixing, I would love to give it a shot! Thank you :)