mariofusco opened 1 week ago
Build (on pull request)
This is the status report for running Build (on pull request)
on commit c89bf9ba666431ee9164acda3cc6677884a96143.
✅ The latest workflow run for the pull request has completed successfully.
It should be safe to merge provided you have a look at the other checks in the summary.
> I don't know if it would make sense to also allow a rewrite on the single input parameters level (before materializing the complete user message)
If you want to solve the prompt injection issue with this, I think you will need this feature. But maybe you envision fixing it in another way?
Typically, for our Devoxx experiments, we had a `sanitize()` method which replaced any `---` in the inputs, as `---` was used as the delimiter.
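For illustration, a minimal sketch of what such a helper could look like; the `sanitize()` name comes from the comment above, while the exact replacement strategy is an assumption:

```java
public final class PromptSanitizer {

    // Hypothetical sketch: strip the "---" delimiter from user input so the
    // text cannot break out of the delimited section of the prompt.
    public static String sanitize(String input) {
        return input.replace("---", "");
    }
}
```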
@gsmet Yes I did something similar as well, but I was thinking that we could do something like:

```java
@UserMessage("blablablabla {param1} and more blablabla2 {param2}")
String chat(@V("param1") @Guard String param1, @V("param2") String param2);
```
So only `param1` is sent as an input guard variable. The idea of parsing the input works with one parameter, but if we have multiple parameters, we might end up doing a lot of regexp work. So then, in `InputGuardrails`, we can have a method saying "give me all the parameter values annotated with `@Guard`".
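A hedged sketch of what such an accessor could look like; the `GuardedParams` name and signature are hypothetical, not part of the existing API:

```java
import java.util.Map;

// Hypothetical sketch: expose only the values of the method parameters
// annotated with @Guard, keyed by their @V name, so a guardrail can inspect
// them without re-parsing the materialized user message.
public interface GuardedParams {

    Map<String, Object> guardedParamValues();
}
```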
Yeah, that makes perfect sense
> So then, in `InputGuardrails`, we can have a method saying "give me all the parameter values annotated with `@Guard`".
I'm not entirely sure how this is supposed to work. Should the guardrail take in both the materialized output and the single annotated params? If so, what do you do in this case if you change the value of a param? Do you perform the materialization again?
At this point, in my opinion, it would be much clearer if we had a third form of guardrails, let's call them `UserParamGuardrail`, working at the level of the single param and invoked before the materialization of the whole message. This would give us even more flexibility, so you could eventually annotate different params with different guardrails.

In this case the workflow that I envision is the following (see the sketch after this list):

1. Each param annotated with a `UserParamGuardrail` is validated and possibly rewritten by its own guardrail.
2. The complete user message is materialized and passed to the `InputGuardrail` (if any) for further validation and rewriting.

What do you think?
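A minimal sketch of what this third form could look like, assuming a per-parameter hook; only the `UserParamGuardrail` name comes from the discussion, the rest is hypothetical:

```java
// Hypothetical sketch of a guardrail working on a single parameter, invoked
// before the complete user message is materialized.
public interface UserParamGuardrail {

    // Validate the value bound to the given @V name and return it, possibly
    // rewritten; implementations could throw to reject the whole invocation.
    Object validate(String paramName, Object paramValue);
}
```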
Exactly, Mario, I totally agree with you; I didn't pay attention to this. We need a method that lets us change parameters and then flow as usual.
Ok, if so I suggest reviewing and eventually merging this pull request, and then we could introduce the guardrails for params in a second pull request.
Any news or comments on this? Is this a feature that we want? Or maybe we should only develop input guardrails working on user params, as suggested by @lordofthejars?
/cc @cescoffier @geoand
Hi, I'm wondering, can rewriting user messages have unintended side-effects? Should invalid input messages be rejected instead? Sanitizing with `sanitize()` by removing unnecessary characters is nice; I'm just not sure what the use case is for changing some input text. Maybe I'm overthinking it, sorry.
Good question, and indeed at the beginning I also struggled to find a valid use case for input rewriting, so I didn't implement it. Subsequently I discussed this with @lordofthejars and he pointed out that there could be situations where you may want to rewrite the user input, for instance for anonymization purposes: e.g. you rewrite the user prompt by replacing people/company/product names with placeholders in order to avoid leaking sensitive information to the LLM service provider.
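As a purely illustrative sketch of that anonymization idea (the class, the pattern, and the placeholder are all assumptions, not the actual guardrail API):

```java
import java.util.regex.Pattern;

// Hypothetical sketch: replace known sensitive names with placeholders
// before the prompt leaves the application, so the LLM provider never
// sees them.
public class AnonymizingRewriter {

    private static final Pattern COMPANY = Pattern.compile("Acme Corp|Globex");

    public String rewrite(String userMessage) {
        return COMPANY.matcher(userMessage).replaceAll("[COMPANY]");
    }
}
```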
I agree that rewriting input can have side effects, but first, as the developer of the app, you know how you are rewriting it, so maybe it is not that critical.
Of course, you can always implement this change before invoking the LLM, but the good thing about guards is that you can have all of them in one place, instead of having logic spread across different places.
Thanks, minimizing the risk of sensitive input content being leaked into the LLM is a good use case. I'd probably consider reporting a message to the user instead, e.g. "Please retry, the sensitivity score is too high", but I can imagine how anonymization can be a useful mechanism as well.
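For contrast, a sketch of the reject-instead-of-rewrite alternative; the scoring function, threshold, and exception type are placeholders made up for illustration:

```java
// Hypothetical sketch: reject the message and report back to the user
// instead of rewriting it.
public class SensitivityGuard {

    private static final double THRESHOLD = 0.8;

    public String check(String userMessage) {
        if (scoreSensitivity(userMessage) > THRESHOLD) {
            throw new IllegalArgumentException(
                    "Please retry, the sensitivity score is too high");
        }
        return userMessage; // unchanged: this guard never rewrites
    }

    private double scoreSensitivity(String text) {
        // Placeholder: a real implementation would call a classifier.
        return text.contains("password") ? 1.0 : 0.0;
    }
}
```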
Yeah, removing weird chars is also an option; as you mentioned, that can be done with a sanitize method, but if we can put everything in one place, why not. Of course, there are certainly workarounds to avoid having to rewrite the prompt.
The way I see it, rewriting the user query is fine.
This pull request introduces the possibility of rewriting the user messages from the input guardrails. At the moment it is only possible to read and rewrite the complete materialized user message immediately before submitting it to the LLM. I don't know if it would make sense to also allow a rewrite on the single input parameters level (before materializing the complete user message), but if required I'm open to iterate on this and eventually also add this possibility.
/cc @lordofthejars @cescoffier @geoand