mariofusco commented 1 week ago

This pull request introduces the possibility of rewriting user messages from the input guardrails. At the moment it is only possible to read and rewrite the complete materialized user message immediately before submitting it to the LLM. I don't know if it would make sense to also allow a rewrite on the single input parameters level (before materializing the complete user message), but if required I'm open to iterating on this and adding that possibility as well.

/cc @lordofthejars @cescoffier @geoand

quarkus-bot[bot] commented 1 week ago

Status for workflow `Build (on pull request)`

This is the status report for running Build (on pull request) on commit c89bf9ba666431ee9164acda3cc6677884a96143.

:white_check_mark: The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

gsmet commented 1 week ago

> I don't know if it would make sense to also allow a rewrite on the single input parameters level (before materializing the complete user message)

If you want to solve the prompt injection issue with this, I think you will need this feature. But maybe you envision fixing it in another way?

Typically, for our experiments for Devoxx, we had a sanitize() method which replaced any --- in the inputs as it was used as the delimiter.
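For illustration, a minimal sketch of what such a `sanitize()` method might look like. The class name and the choice of a single space as the replacement are assumptions; the actual Devoxx code is not shown in this thread.

```java
// Hypothetical helper, not part of quarkus-langchain4j.
final class PromptInputs {

    // Assumption: "---" is the delimiter used in the prompt template.
    static final String DELIMITER = "---";

    // Replaces every occurrence of the delimiter in a user-supplied value,
    // so the value cannot close a delimited section of the prompt early.
    static String sanitize(String input) {
        if (input == null) {
            return "";
        }
        // Assumption: the delimiter is replaced with a single space.
        return input.replace(DELIMITER, " ");
    }
}
```

Each input would be passed through `PromptInputs.sanitize(...)` before being placed into the template.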

lordofthejars commented 1 week ago

@gsmet Yes I did something similar as well, but I was thinking that we could do something like:

@UserMessage("blablablabla {param1} and more blablabla2 {param2}") 
String chat(@V("param1") @Guard String param1, @V("param2") String param2);

So only param1 is sent as an input guard variable. The idea of parsing the input works with one parameter, but with multiple parameters we might end up doing a lot of regexp work.

So then, in InputGuardrails, we could have a method that returns all the parameter values annotated with @Guard.
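A rough sketch of how such a lookup could work with plain Java reflection. The `@Guard` annotation is only the proposal from this comment, and `Assistant`, `GuardedParams` and `guardedValues` are hypothetical names; none of this is an existing quarkus-langchain4j API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation; it must be retained at runtime to be visible via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Guard {}

// Hypothetical AI service with one guarded and one unguarded parameter.
interface Assistant {
    String chat(@Guard String param1, String param2);
}

final class GuardedParams {

    // Returns the argument values whose parameters are annotated with @Guard, in declaration order.
    static List<Object> guardedValues(Method method, Object[] args) {
        List<Object> values = new ArrayList<>();
        Parameter[] parameters = method.getParameters();
        for (int i = 0; i < parameters.length; i++) {
            if (parameters[i].isAnnotationPresent(Guard.class)) {
                values.add(args[i]);
            }
        }
        return values;
    }

    // Small usage example: only the first argument is reported as guarded.
    static List<Object> demo() {
        try {
            Method chat = Assistant.class.getMethod("chat", String.class, String.class);
            return guardedValues(chat, new Object[] {"user text", "trusted text"});
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```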

geoand commented 1 week ago

Yeah, that makes perfect sense

mariofusco commented 1 week ago

> So then, in InputGuardrails, we could have a method that returns all the parameter values annotated with @Guard.

I'm not entirely sure how this is supposed to work. Should the guardrail take in both the materialized output and the single annotated params? If so, what do you do if you change the value of a param? Do you perform the materialization again?

At this point, in my opinion, it would be much clearer if we had a third form of guardrails, let's call them UserParamGuardrail, working at the level of the single param and invoked before the materialization of the whole message. This would give us even more flexibility, since you could annotate different params with different guardrails.

In this case the workflow that I envision is the following:

1. Every param with a UserParamGuardrail is validated and possibly rewritten by its own guardrail.
2. The params (rewritten or not) are put together to create the materialized complete user message.
3. The materialized user message is sent to the InputGuardrail (if any) for further validation and rewriting.
4. The resulting materialized message finally hits the LLM.
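The workflow above could be sketched roughly as follows. This is only an illustration under stated assumptions: guardrails are modelled as plain `UnaryOperator<String>` functions that return the (possibly rewritten) text or throw to reject it, and `GuardrailPipeline`, `materialize` and `prepare` are hypothetical names, not the extension's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GuardrailPipeline {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Fills {name} placeholders in a single pass, so placeholder-like text inside
    // a param value is never expanded again. Unknown placeholders are left as-is.
    static String materialize(String template, Map<String, String> params) {
        return PLACEHOLDER.matcher(template).replaceAll(
                mr -> Matcher.quoteReplacement(params.getOrDefault(mr.group(1), mr.group())));
    }

    static String prepare(String template,
                          Map<String, String> params,
                          Map<String, UnaryOperator<String>> paramGuardrails,
                          UnaryOperator<String> inputGuardrail) {
        // Step 1: each param is validated and possibly rewritten by its own guardrail, if it has one.
        Map<String, String> guarded = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            UnaryOperator<String> guardrail =
                    paramGuardrails.getOrDefault(entry.getKey(), UnaryOperator.identity());
            guarded.put(entry.getKey(), guardrail.apply(entry.getValue()));
        }
        // Step 2: the params (rewritten or not) are materialized into the complete user message.
        String message = materialize(template, guarded);
        // Step 3: the materialized message goes through the input guardrail.
        // Step 4 is left to the caller: the returned message is what would be sent to the LLM.
        return inputGuardrail.apply(message);
    }
}
```

Keeping the param guardrails in a map keyed by parameter name is one way to let different params use different guardrails; an annotation-driven lookup would serve the same purpose.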

What do you think?

lordofthejars commented 1 week ago

Exactly, Mario, I totally agree with you; I didn't pay attention to this. We need a method that lets us change parameters and then flow as usual.

mariofusco commented 1 week ago

> Exactly, Mario, I totally agree with you; I didn't pay attention to this. We need a method that lets us change parameters and then flow as usual.

Ok, if so I suggest reviewing and, if all is well, merging this pull request; then we could introduce the guardrails for params in a second pull request.

mariofusco commented 6 days ago

Any news or comments on this? Is this a feature that we want? Or maybe we should only develop input guardrails working on user params as suggested by @lordofthejars?

/cc @cescoffier @geoand

sberyozkin commented 5 days ago

Hi, I'm wondering: can rewriting user messages have unintended side effects? Should invalid input messages be rejected instead? Sanitizing with sanitize() to remove unnecessary characters is nice; I'm just not sure what the case is for changing some input text. Maybe I'm overthinking it, sorry.

mariofusco commented 5 days ago

> Hi, I'm wondering: can rewriting user messages have unintended side effects? Should invalid input messages be rejected instead? Sanitizing with sanitize() to remove unnecessary characters is nice; I'm just not sure what the case is for changing some input text. Maybe I'm overthinking it, sorry.

Good question. Indeed, at the beginning I was also struggling to find a valid use case for input rewriting, so I didn't implement it. Later I discussed this with @lordofthejars, and he pointed out that there could be situations where you may want to rewrite the user input, for instance for anonymization purposes: e.g. you rewrite the user prompt by replacing the names of people, companies, and products with placeholders in order to avoid leaking sensitive information to the LLM service provider.
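As a concrete illustration of the anonymization case, here is a minimal sketch. It assumes the sensitive terms and their placeholders are known up front and supplied as a map; `Anonymizer` and the placeholder format are hypothetical, and a real implementation would more likely rely on some form of entity recognition.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Anonymizer {

    // Replaces each known sensitive term (case-insensitively) with its placeholder.
    // Assumption: placeholders do not themselves contain any of the sensitive terms.
    static String anonymize(String message, Map<String, String> placeholdersByTerm) {
        String result = message;
        for (Map.Entry<String, String> entry : placeholdersByTerm.entrySet()) {
            // Pattern.quote treats the term as literal text, not as a regular expression.
            Pattern term = Pattern.compile(Pattern.quote(entry.getKey()), Pattern.CASE_INSENSITIVE);
            // quoteReplacement keeps '$' and '\' in the placeholder from being interpreted.
            result = term.matcher(result).replaceAll(Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }
}
```

An input guardrail could apply `Anonymizer.anonymize(...)` to the materialized user message and return the rewritten text, so the real names never reach the LLM service provider.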

lordofthejars commented 5 days ago

I agree that rewriting input can have side effects, but I think that first, as the developer of the app, you know how you are rewriting, so maybe it is not that critical.

Of course, you can always implement this change before invoking the LLM, but the good thing about guards is that you can have all of them in one place instead of having logic spread across different places.

sberyozkin commented 5 days ago

Thanks, minimizing the risk of sensitive input content being leaked to the LLM is a good use case. I'd probably consider reporting a message to the user instead, such as "Please retry, the sensitivity score is too high", but I can imagine how anonymization can be a useful mechanism as well.

lordofthejars commented 5 days ago

Yeah, removing weird chars is also an option. Yes, it can be done with a sanitize method as you mentioned, but if we can put everything in one place, why not? Of course, there are workarounds that avoid having to rewrite the prompt.

geoand commented 4 days ago

The way I see it, rewriting the user query is fine.

quarkiverse / quarkus-langchain4j

Allow rewriting of user messages from input guardrails #1083
