guardrails-ai / guardrails

Adding guardrails to large language models.
https://www.guardrailsai.com/docs
Apache License 2.0
4.03k stars 305 forks

[feat] Add support for impure raw structured outputs when streaming #542

Closed thekaranacharya closed 1 month ago

thekaranacharya commented 9 months ago

Description

Streaming the raw and validated outputs is super helpful and intuitive: users don't need to wait for the entire response to be generated before Guardrails' validation kicks in. Guardrails supports streaming the raw and validated outputs for OpenAI callables and any custom LLM wrapper callables. It expects the callables to return a generator (of chunks) when stream is set to True in a guard call.

For validating structured (JSON) outputs, it works on the crucial assumption that the raw model output only contains pure JSON that guardrails can validate. If the raw output contains any accompanying text, parsing and validation will fail. (This is a documented limitation in the original streaming PR).

e.g. Raw output with pure JSON:

{
"name": "John Doe",
"age": 39
}

In the above case, the streaming validation will work.

e.g. Raw output with accompanying text:

Here is a valid JSON object you requested:
{
"name": "John Doe",
"age": 39
}

In this case, it will fail.
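The failure mode is easy to reproduce with plain `json.loads`. A minimal sketch (the slice-based recovery at the end is purely illustrative, not Guardrails' implementation):

```python
import json

pure = '{"name": "John Doe", "age": 39}'
impure = "Here is a valid JSON object you requested:\n" + pure

json.loads(pure)  # parses fine

try:
    json.loads(impure)
except json.JSONDecodeError:
    pass  # fails: the leading text makes the payload invalid JSON

# One naive recovery: slice from the first "{" to the last "}".
start, end = impure.find("{"), impure.rfind("}")
recovered = json.loads(impure[start : end + 1])
# recovered == {"name": "John Doe", "age": 39}
```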

Why is this needed

Rough implementation details

Currently, the entire workflow for streaming goes through StreamRunner. Just add 2 simple checks before parsing and validation kick in:

CalebCourier commented 9 months ago

Adding convo from Slack:

@thekaranacharya:

@Caleb Courier in regards to this Discord message: https://discord.com/channels/1085077079697150023/1197022557870772284/1197188300948111391 This works due to this function, right? Or is there somewhere else we look for the ```json opening fence - and potentially there we could also look for the {?

@CalebCourier

correct, that's where it all happens

@thekaranacharya

Okay, that func calls this function, which finds code within the opening and closing tags, right - and for that we need to have the entire output? I'm wondering if we would need to change that a bit to a) also include { and b) support streaming. Currently in streaming we don't utilise this or extract_json_from_output at all.
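For context, extracting a fenced code block from a *completed* output works roughly like this (an illustrative regex sketch; `extract_code_block` is a hypothetical name, not the actual Guardrails function):

```python
import re


def extract_code_block(text: str, code_type: str = "json"):
    """Illustrative only: return the content between ``` / ```json fences.

    This needs the *entire* output to be available, which is why an
    approach like this cannot be applied chunk-by-chunk while streaming.
    """
    match = re.search(r"```(?:%s)?\s*(.*?)```" % code_type, text, re.DOTALL)
    return match.group(1).strip() if match else None


output = 'Here is the JSON:\n```json\n{"name": "John Doe", "age": 39}\n```'
extract_code_block(output)  # → '{"name": "John Doe", "age": 39}'
```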

@CalebCourier

Yeah I think there's at least one new function for extracting content using { as the border, and a new arg to determine if a partial block is useful based on whether or not we're streaming

@thekaranacharya

For b, we could create a new func that: checks each chunk and simply waits for the opening tags (optionally along with the code type) OR { and then starts processing the outputs, and also keeps checking for } or closing tags and then exits
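The proposed func could be sketched as a generator that filters the chunk stream. A hypothetical sketch handling only the bare-`{` case (note that naive brace-depth counting would miscount braces inside JSON string values, so a real version needs more care):

```python
def strip_to_json(chunks):
    """Hypothetical pre-parse filter for streaming: drop any text before
    the first '{' and stop once the matching closing '}' is seen."""
    started = False
    depth = 0
    for chunk in chunks:
        out = []
        for ch in chunk:
            if not started:
                if ch == "{":
                    started = True
                    depth = 1
                    out.append(ch)
            else:
                out.append(ch)
                # Caveat: this miscounts braces inside string values.
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth == 0:
                        yield "".join(out)
                        return
        if out:
            yield "".join(out)


chunks = [
    "Here is a valid JSON ",
    'object: {"name": "Jo',
    'hn Doe", "age": 39} Done.',
]
result = "".join(strip_to_json(chunks))
# result == '{"name": "John Doe", "age": 39}'
```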

a new arg to determine if a partial block is useful based on whether or not we're streaming

We probably won't require this: when streaming, the entire workflow goes through StreamRunner and we can just add a new func there before parsing and validation. Currently when streaming we just don't check for json extraction; it runs on the assumption that only pure JSON will be generated by the model

@CalebCourier

Oh so it just waits for everything before parsing? If that's the case then yeah we won't need it, but if we want to parse on each cumulative chunk we would

@thekaranacharya

No no, we don't wait at all. It just skips extracting json like traditionally done in Runner. We simply assume that the output is pure JSON. I know it's not ideal - just opening an issue on GitHub regarding this discussion

https://github.com/guardrails-ai/guardrails/issues/542

CalebCourier commented 9 months ago

Related to: https://github.com/guardrails-ai/guardrails/issues/543

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 14 days with no activity.