mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Improvements for Function Calling #462

Open stippi opened 3 months ago

stippi commented 3 months ago

Congratulations on merging #451! It's really great to have this available. I'm hoping for two improvements:

  1. Currently, it is not possible to supply a system prompt together with tools. From reading the diff... have you experimented with simply merging the provided system prompt with the "official" prompt from the Hermes-2-Pro team? By "merging" I mean appending the part after "You are a tool calling AI" to the system prompt the application provides (see the first sketch after this list).

  2. Streaming the tool-call JSON array as "content" while it is being formed is not compliant with the OpenAI API. GPT-4 may reply to the user and invoke tools in the same turn, so my application is surely not the only one that always streams "content" to the user and expects it to be meant for the user to see. I think the model needs to be instructed to output the tools array between special tags, which are then filtered out of the streamed content (see the second sketch below).
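
Roughly what I have in mind for point 1 (just a sketch; the helper name is made up and the tool-section wording is paraphrased from the Hermes-2-Pro prompt format, not copied verbatim):

```ts
// Hypothetical helper, not part of the WebLLM API. The tool-section wording is
// paraphrased from the Hermes-2-Pro prompt format, not copied verbatim.
interface ToolFunction {
  name: string;
  description?: string;
  parameters?: Record<string, unknown>;
}

function buildSystemPrompt(appSystemPrompt: string, tools: ToolFunction[]): string {
  const toolSection =
    "You may call one or more functions to assist with the user query. " +
    "You are provided with function signatures within <tools></tools> XML tags:\n" +
    `<tools>\n${tools.map((t) => JSON.stringify(t)).join("\n")}\n</tools>\n` +
    "For each function call, return a JSON object with the function name and arguments " +
    "within <tool_call></tool_call> XML tags.";
  // Append the tool-calling instructions to whatever the application supplied.
  return `${appSystemPrompt}\n\n${toolSection}`;
}
```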
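And for point 2, a sketch of the kind of filtering I mean, assuming the model is told to wrap tool calls in `<tool_call></tool_call>` tags (this is not something WebLLM does today):

```ts
// Client-side filter, assuming the model wraps tool calls in <tool_call></tool_call>
// tags. A real streaming implementation would also need to hold back a partially
// received opening tag until it can be classified.
function splitToolCalls(buffered: string): { visible: string; toolCalls: string[] } {
  const toolCalls: string[] = [];
  const visible = buffered.replace(
    /<tool_call>([\s\S]*?)<\/tool_call>/g,
    (_match, body: string) => {
      toolCalls.push(body.trim());
      return ""; // hide the raw tool-call JSON from the user-facing stream
    }
  );
  return { visible, toolCalls };
}
```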

CharlieFRuan commented 3 months ago

Thank you for the suggestion! We acknowledge that https://github.com/mlc-ai/web-llm/pull/451 is only preliminary support and will improve it.

In the meantime, it might be possible to use models like Hermes-2-Pro without the tools API and post-process the output on your own; this should achieve what the official repo instructs (though without guaranteed JSON output, since a response may or may not be a function call, and we haven't considered the special token in the grammar).
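
Something along these lines might work (untested sketch; the model id and the `CreateMLCEngine` / `engine.chat.completions.create` names are assumptions based on the current OpenAI-style API, and the tool definition is illustrative):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// The model id below is a guess -- pick whichever Hermes-2-Pro build appears in
// the current prebuilt model list.
const engine = await CreateMLCEngine("Hermes-2-Pro-Mistral-7B-q4f16_1-MLC");

// Describe the tool in the system prompt yourself instead of passing `tools`.
const systemPrompt = [
  "You are a helpful assistant.",
  "You may call the following function by replying with a JSON object inside",
  '<tool_call></tool_call> tags: {"name": "get_weather", "parameters": {"city": "string"}}',
].join("\n");

const reply = await engine.chat.completions.create({
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: "What is the weather in Paris?" },
  ],
});

// Post-process the raw text: anything between the tags is a tool call, the rest
// is user-visible content. Without a grammar constraint the JSON may be malformed.
const content = reply.choices[0].message.content ?? "";
const toolCalls = [...content.matchAll(/<tool_call>([\s\S]*?)<\/tool_call>/g)].map(
  (m) => m[1].trim()
);
```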

jrobinson01 commented 3 months ago

I'm not sure if this needs a separate issue or can go here. There are a couple of other issues with function calling. If used in a multi-message chat scenario, you can't pass the messages back in, since they will contain the system prompt created by the web-llm code. It throws an error because it doesn't like having a system prompt supplied.

There is another issue where, after the LLM responds with the functions to be called, you can't pass the result back in with the "tool" role, since the web-llm code currently insists that the last message be from the "user" role. Here's the PR that will hopefully fix that one: https://github.com/mlc-ai/web-llm/pull/467
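
For reference, this is the kind of OpenAI-style message sequence I mean (values are illustrative; the point is the role order that the current checks reject):

```ts
// OpenAI-style message sequence. An assistant message carrying tool_calls is
// followed by a "tool" result, which currently fails because the last message
// is not from "user".
const messages = [
  { role: "system", content: "You are a helpful assistant." }, // issue 1: re-sending this throws
  { role: "user", content: "What's the weather in Paris?" },
  {
    role: "assistant",
    content: null,
    tool_calls: [
      {
        id: "call_0",
        type: "function",
        function: { name: "get_weather", arguments: '{"city":"Paris"}' },
      },
    ],
  },
  // issue 2: the engine insists the last message come from "user", so this fails
  { role: "tool", tool_call_id: "call_0", content: '{"temperature_c": 21}' },
];
```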

stippi commented 3 months ago

> If used in a multi-message chat scenario, you can't pass the messages back in, since they will contain the system prompt created by the web-llm code. It throws an error because it doesn't like having a system prompt supplied.

Not sure I understand your concern. The idea would be to dynamically inject the section about tools into the system message that the LLM sees. Right now, the application cannot pass a system message at all together with the tools array, because WebLLM generates a system message that contains the transformed tools array. All I'm saying is that WebLLM should instead take the system message provided by the app and append the section with the transformed tools array to it. This combined message would not get passed back to the application. The application would always provide its own version of the system message and the tools in the tools array, and on each turn WebLLM would inject the tools section again, but only for the LLM.
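
A rough sketch of what I mean, purely hypothetical and not actual WebLLM internals:

```ts
// Hypothetical sketch: on every request, build the prompt the model sees from
// the app's own system message plus the transformed tools, and never echo the
// combined message back to the application.
interface Message {
  role: string;
  content: string;
}

function buildEffectiveMessages(appMessages: Message[], tools?: object[]): Message[] {
  if (!tools || tools.length === 0) return appMessages;

  const toolSection =
    "You can call the following tools, replying inside <tool_call></tool_call> tags:\n" +
    tools.map((t) => JSON.stringify(t)).join("\n");

  const [first, ...rest] = appMessages;
  if (first?.role === "system") {
    // Append the tool section to the system message the app provided.
    return [{ role: "system", content: `${first.content}\n\n${toolSection}` }, ...rest];
  }
  // No app system message: fall back to a tools-only system message.
  return [{ role: "system", content: toolSection }, ...appMessages];
}
```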

> There is another issue where, after the LLM responds with the functions to be called, you can't pass the result back in with the "tool" role, since the web-llm code currently insists that the last message be from the "user" role. Here's the PR that will hopefully fix that one: #467

Ok, I didn't even get far enough to notice this problem. :-) Good to know it is being addressed.

jrobinson01 commented 3 months ago

I may be wrong about the first issue. I'll need to try it again when I get some more free time. I agree, though, that we should have some options when it comes to the system prompt. Appending to it, overriding it, or both seem useful.
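
For example, a hypothetical option along these lines (nothing like this exists in WebLLM today, just illustrating the two behaviors):

```ts
// Hypothetical option shape, purely illustrative.
interface ToolPromptOptions {
  // "append": add the generated tool section to the app's system prompt.
  // "override": let the generated tool prompt replace the app's system prompt.
  systemPromptBehavior: "append" | "override";
}

const toolPromptOptions: ToolPromptOptions = { systemPromptBehavior: "append" };
```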