Adding convo from Slack:
@thekaranacharya:
@Caleb Courier in regards to this Discord message: https://discord.com/channels/1085077079697150023/1197022557870772284/1197188300948111391 This works due to this function right? Or is there somewhere else we look for ``json - and potentially there we could also look for the {?
@CalebCourier
correct, that's where it all happens
@thekaranacharya
Okay, that func calls this function, which finds code within the opening and closing tags, right - and for that we need to have the entire output? I'm wondering if we would need to change that a bit to a) also include { and b) support streaming. Currently in streaming we don't utilise this and extract_json_from_ouput at all.
@CalebCourier
Yeah I think there's at least one new function for extracting content using { as the border, and a new arg to determine if a partial block is useful based on whether or not we're streaming
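A rough sketch of what "extracting content using { as the border" could look like on a complete output (illustrative only - the function name is made up and not part of the Guardrails codebase, and this naive version ignores braces inside string values):

```python
def extract_json_by_braces(output: str) -> str:
    """Return the substring from the first '{' through its matching '}'."""
    start = output.find("{")
    if start == -1:
        return output  # no JSON object found; leave the output untouched
    depth = 0
    for i in range(start, len(output)):
        if output[i] == "{":
            depth += 1
        elif output[i] == "}":
            depth -= 1
            if depth == 0:
                return output[start : i + 1]
    return output[start:]  # unbalanced braces; return the partial block
```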
@thekaranacharya
For b, we could create a new func that: checks each chunk and simply waits for the opening tags (optionally along with the code type) OR { and then starts processing the outputs, and also keeps checking for } or closing tags and then exits.
> a new arg to determine if a partial block is useful based on whether or not we're streaming
We probably won't require this, as when streaming the entire workflow goes through StreamRunner and we can just add a new func there before parsing and validation. Currently, when streaming, we just don't check for JSON extraction at all; it runs on the assumption that only pure JSON will be generated by the model.
@CalebCourier
Oh, so it just waits for everything before parsing? If that's the case then yeah we won't need it, but if we want to parse on each cumulative chunk we would.
@thekaranacharya
No no, we don't wait at all. It just skips extracting JSON like we traditionally do in Runner. We simply assume that the output is pure JSON. I know it's not ideal - just opening an issue on GitHub regarding this discussion.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 14 days.
This issue was closed because it has been stalled for 14 days with no activity.
Description

Streaming the raw and validated outputs is super helpful and intuitive: users don't need to wait for the entire response to be generated before Guardrails' validation kicks in. Guardrails supports streaming the raw and validated outputs for OpenAI callables and any custom LLM wrapper callables. It expects the callables to return a generator (of chunks) when `stream` is set to `True` in a `guard` call.

For validating structured (JSON) outputs, it works on the crucial assumption that the raw model output contains only pure JSON that Guardrails can validate. If the raw output contains any accompanying text, parsing and validation will fail. (This is a documented limitation in the original streaming PR.)
e.g. Raw output with pure JSON:
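An illustrative example (the exact schema doesn't matter):

```json
{
  "name": "Alice",
  "age": 30
}
```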
In the above case, the streaming validation will work.
e.g. Raw output with accompanying text:
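Again illustrative, with the same object wrapped in prose:

```
Sure! Here is the JSON you asked for:

{
  "name": "Alice",
  "age": 30
}

Let me know if you need anything else.
```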
In this case, it will fail.
Why is this needed
`gpt-3.5-turbo`, `gpt-4-turbo`, and `text-davinci-003` (now shut off) produced pure JSON output when prompted with detailed instructions. Other OSS models with the same prompt and instructions did not.

Rough implementation details

Currently, the entire workflow for streaming goes through `StreamRunner`. Just add 2 simple checks before parsing and validation kick in:

1. Wait for `{` (or the opening tags). Ignore all previous chunks before this.
2. Wait for `}` (or the closing tags), and ignore any future chunks.
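A minimal sketch of those two checks as a chunk filter (the name `json_chunk_filter` is hypothetical and this is not the actual `StreamRunner` code; it also ignores `}` inside string values and the fenced-code-block variant):

```python
from typing import Iterator


def json_chunk_filter(chunks: Iterator[str]) -> Iterator[str]:
    """Yield only the JSON portion of a streamed response.

    Check 1: ignore everything until the first '{' appears.
    Check 2: stop once the braces balance out, ignoring any later chunks.
    """
    started = False
    depth = 0
    for chunk in chunks:
        kept = []
        for ch in chunk:
            if not started:
                if ch == "{":
                    started = True
                    depth = 1
                    kept.append(ch)
                # otherwise we are still in the leading text; drop the char
            else:
                kept.append(ch)
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth == 0:
                        yield "".join(kept)
                        return  # trailing text / future chunks are ignored
        if kept:
            yield "".join(kept)


# e.g. chunks with accompanying text around the JSON:
chunks = ['Sure! Here is the JSON:\n{"name": ', '"Alice", "age": 30} Hope that helps!']
print("".join(json_chunk_filter(chunks)))  # {"name": "Alice", "age": 30}
```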