Open LizeRaes opened 1 month ago
Thanks for reporting!
cc @cescoffier
I like the idea. I think it would require dedicated interfaces as the parameters and behavior would be slightly different.
For the output one, we need to design the resilience patterns we want. You already mentioned retry, but I'm wondering if we need to call the method again or ask the LLM to recompute the tool execution request.
I would opt for the LLM to recompute (retry) and have the option to provide a message (like "tool output contained customer email address, make sure to not use this tool to divulge private information" or whatever you want to check for).
Unless I'm overlooking something, I think that the logic to call the method again (with the same parameters) could be handled within the tool itself, if the output isn't satisfactory.
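For illustration, a minimal sketch of what that in-tool retry could look like with a plain LangChain4j `@Tool` method; the backend call and the email check are made-up helpers, not anything from the library:

```java
import dev.langchain4j.agent.tool.Tool;

public class CustomerTools {

    @Tool("Returns a short, anonymized summary of the customer's last order")
    public String lastOrderSummary(String customerId) {
        // Retry with the same parameters inside the tool, without involving the LLM,
        // until the output passes our own check (here: no email address leaked).
        for (int attempt = 0; attempt < 3; attempt++) {
            String summary = fetchSummaryFromBackend(customerId);
            if (!containsEmailAddress(summary)) {
                return summary;
            }
        }
        return "Order summary unavailable (output did not pass the privacy check)";
    }

    // Hypothetical backend call; stands in for whatever the real tool does.
    private String fetchSummaryFromBackend(String customerId) {
        return "Order #42 for customer " + customerId + ", shipped last week";
    }

    // Very rough email detection, purely for illustration.
    private boolean containsEmailAddress(String text) {
        return text.matches("(?s).*[\\w.+-]+@[\\w-]+\\.[\\w.]+.*");
    }
}
```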
There is a related discussion on tool calling going on in langchain4j core repo https://github.com/langchain4j/langchain4j/discussions/1997 Maybe we should check what to implement/port from where to where?
So, right now we have the following sequence of messages:
-> User message
<- Tool execution request
-> Tool execution result
<- Assistant message
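For reference, that sequence expressed as LangChain4j message objects looks roughly like the sketch below; the tool name and message contents are invented:

```java
import dev.langchain4j.agent.tool.ToolExecutionRequest;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.ToolExecutionResultMessage;
import dev.langchain4j.data.message.UserMessage;
import java.util.List;

public class MessageSequenceExample {

    public static void main(String[] args) {
        ToolExecutionRequest request = ToolExecutionRequest.builder()
                .id("call-1")
                .name("getCustomerOrder")              // made-up tool name
                .arguments("{\"customerId\":\"42\"}")
                .build();

        List<ChatMessage> memory = List.of(
                UserMessage.from("What did customer 42 order?"),                  // -> User message
                AiMessage.from(request),                                          // <- Tool execution request
                ToolExecutionResultMessage.from(request, "Order #7, two items"),  // -> Tool execution result
                AiMessage.from("Customer 42 ordered two items (order #7).")       // <- Assistant message
        );

        memory.forEach(System.out::println);
    }
}
```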
When the tool execution failed, what do we have?
-> User message
<- Tool execution request
-> Tool execution failure
<- Assistant message with a finish reason indicating a failure, or does it retry the execution request?
The question is about where to insert the guardrails:
As far as I understand, guardrails act on inputs and outputs to/from the AI Service? Tool calling is not exposed from the AI Service and happens internally, so current guardrails cannot catch this, right? Also, a guardrail would have to know the specifics of each tool to be able to validate its inputs...
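For context, a current output guardrail in Quarkus LangChain4j looks roughly like the sketch below; it only sees the final assistant message, never the intermediate tool calls, which is exactly the limitation being discussed. The helper methods (`success()`, `reprompt(...)`) are the guardrail API as I remember it, so double-check against the current version:

```java
import dev.langchain4j.data.message.AiMessage;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrail;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class NoEmailAddressGuardrail implements OutputGuardrail {

    @Override
    public OutputGuardrailResult validate(AiMessage responseFromLLM) {
        // Only the final assistant response reaches this point; tool execution
        // requests/results exchanged along the way are not visible here.
        if (responseFromLLM.text() != null && responseFromLLM.text().contains("@")) {
            return reprompt("Output contains what looks like an email address",
                    "Rewrite the answer without divulging private information such as email addresses");
        }
        return success();
    }
}
```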
tools break when the parameter syntax isn't respected
I guess cases like this (wrong tool name, wrong tool parameter name or type, etc.) should be handled by DefaultToolExecutor automatically. Instead of throwing an exception from ToolExecutor (e.g. when parameters cannot be parsed), we should return this error in plain text so that it is sent to the LLM for recovery.
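A rough sketch of that idea as a decorator around any ToolExecutor (the exact package of ToolExecutor depends on the LangChain4j version, so treat the imports as an assumption):

```java
import dev.langchain4j.agent.tool.ToolExecutionRequest;
import dev.langchain4j.service.tool.ToolExecutor;

// Turns executor failures into plain text the LLM can read and react to,
// instead of letting the exception bubble up and abort the interaction.
public class LenientToolExecutor implements ToolExecutor {

    private final ToolExecutor delegate;

    public LenientToolExecutor(ToolExecutor delegate) {
        this.delegate = delegate;
    }

    @Override
    public String execute(ToolExecutionRequest request, Object memoryId) {
        try {
            return delegate.execute(request, memoryId);
        } catch (Exception e) {
            // Returned as the tool execution result, so the LLM can correct the call and retry.
            return "Error: could not execute tool '" + request.name() + "': " + e.getMessage();
        }
    }
}
```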
When the tool execution failed, what do we have?
-> User message
<- Tool execution request
-> Tool execution failure
<- Tool execution request (LLM tries to fix the problem)
...
In this case we should probably implement some "max retries" mechanism to make sure smaller LLMs don't go into an endless loop.
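Building on the decorator sketch above, a "max retries" cap could look like the following. It is a simplification: the counter lives in the executor instance rather than per conversation, and it is not thread-safe.

```java
import dev.langchain4j.agent.tool.ToolExecutionRequest;
import dev.langchain4j.service.tool.ToolExecutor;

// After maxRetries consecutive failures the exception is rethrown instead of being
// converted to text, so a small model cannot keep looping on a broken tool call.
public class BoundedRetryToolExecutor implements ToolExecutor {

    private final ToolExecutor delegate;
    private final int maxRetries;
    private int consecutiveFailures = 0;

    public BoundedRetryToolExecutor(ToolExecutor delegate, int maxRetries) {
        this.delegate = delegate;
        this.maxRetries = maxRetries;
    }

    @Override
    public String execute(ToolExecutionRequest request, Object memoryId) {
        try {
            String result = delegate.execute(request, memoryId);
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= maxRetries) {
                throw e; // give up: surface the failure instead of prompting another attempt
            }
            return "Error: " + e.getMessage() + ". Please fix the tool call and try again.";
        }
    }
}
```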
We also need to distinguish between different types of issues here:
Thanks @langchain4j! That is exactly what I was looking for!
Tool cannot be called (e.g. LLM hallucinated tool name or parameter name/type) -> right now we just fail in this case. But we could automatically send the error to the LLM and it will retry (can be a default behavior, with an option to configure desired strategy)
A pre-tools guardrail could handle this and decide what to do.
LLM provided "illegal" (from a business point of view) tool inputs -> a guardrail-like mechanism could probably handle this
Yes, a pre-tools guardrail can handle this case.
Tool was called, but threw an exception -> in this case we already convert the exception to text and send it to the LLM so it can recover. This seems to work pretty well. But perhaps we could make the strategy configurable here
We could imagine having a post-tools guardrail that can update the message and "guide" the LLM
Tool was called, but produced "illegal" output (e.g. sensitive info) -> a guardrail-like mechanism could probably handle this
Yes, a post-tools guardrail can handle this.
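To make the pre-/post-tools guardrail idea concrete, one hypothetical shape such an API could take is sketched below. None of these types exist in LangChain4j or Quarkus LangChain4j today; they are purely illustrative:

```java
import dev.langchain4j.agent.tool.ToolExecutionRequest;

// Hypothetical API shape only: pre-/post-tool guardrails are just a concept at this point.
public class ToolGuardrailsSketch {

    /** Outcome of a tool guardrail check: pass, or send a correction message back to the LLM. */
    public record ToolGuardrailResult(boolean success, String messageForLlm) {
        public static ToolGuardrailResult ok() {
            return new ToolGuardrailResult(true, null);
        }
        public static ToolGuardrailResult reprompt(String message) {
            return new ToolGuardrailResult(false, message);
        }
    }

    /** Runs before the tool: catch hallucinated tool/parameter names or "illegal" business inputs. */
    public interface ToolInputGuardrail {
        ToolGuardrailResult validate(ToolExecutionRequest request);
    }

    /** Runs after the tool: block or rewrite sensitive output before it is sent back to the LLM. */
    public interface ToolOutputGuardrail {
        ToolGuardrailResult validate(ToolExecutionRequest request, String toolResult);
    }
}
```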
@cescoffier are there pre- and post-tools guardrails already? Or is this just a concept?
Tool cannot be called (e.g. LLM hallucinated tool name or parameter name/type) -> right now we just fail in this case. But we could automatically send the error to the LLM and it will retry (can be a default behavior, with an option to configure desired strategy)
A pre-tools guardrail could handle this and decide what to do.
If we go this way, this should be an out-of-the-box guardrail that users could just use without the need to implement it themselves
are there pre- and post-tools guardrails already? Or is this just a concept?
It's just a concept for now. As I modify how Quarkus invokes tools, I can easily implement it - well except maybe in the virtual thread case.
If we go this way, this should be an out-of-the-box guardrail that users could just use without the need to implement it themselves
Yes, or we could have a default strategy, or disable it when guardrails are used. It's not clear what's best for now.
The GuardRails are really awesome! It would be nice if we could also have them available to perform a check before executing a tool for these reasons:
It would also be nice to have a GuardRail option for the Tool output, eg. to have a final check that no private user info is divulged, etc.
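Purely for illustration, such a final output check could be an implementation of the hypothetical ToolOutputGuardrail sketched earlier (assuming it lives in the same package as that sketch):

```java
import dev.langchain4j.agent.tool.ToolExecutionRequest;

// A final check that the tool result does not leak private user info (here: email addresses),
// implemented against the hypothetical ToolGuardrailsSketch types from above.
public class NoPrivateInfoToolGuardrail implements ToolGuardrailsSketch.ToolOutputGuardrail {

    @Override
    public ToolGuardrailsSketch.ToolGuardrailResult validate(ToolExecutionRequest request, String toolResult) {
        if (toolResult != null && toolResult.matches("(?s).*[\\w.+-]+@[\\w-]+\\.[\\w.]+.*")) {
            return ToolGuardrailsSketch.ToolGuardrailResult.reprompt(
                    "Tool output contained a customer email address; do not use this tool to divulge private information");
        }
        return ToolGuardrailsSketch.ToolGuardrailResult.ok();
    }
}
```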