Open nking-1 opened 3 months ago
This looks suspiciously like other failures I've seen with remote endpoints. I added more logging in #879 to try to track down exactly where the grammar failure was occurring, and the problem seems to have disappeared. This makes me suspect that we may have a race condition in GrammarlessEngine, but if so, I've not been able to track it down. The 'obvious' place, where the thread handling the actual HTTPS call is restarted, looks OK to me.
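For anyone unfamiliar with this failure mode, here is a minimal sketch of the kind of producer/consumer race that could cause it (illustrative only: the class and method names are invented, and this is not Guidance's actual GrammarlessEngine code). If the streaming thread is restarted without draining chunks queued by the previous call, later reads mix stale text into the new completion:

```python
import threading
import queue

class StreamingClient:
    """Illustrative stand-in for a worker thread that streams chunks
    from a remote endpoint into a queue (not Guidance code)."""

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = None

    def start(self, chunks):
        self._thread = threading.Thread(target=self._produce, args=(chunks,))
        self._thread.start()

    def _produce(self, chunks):
        for c in chunks:
            self._queue.put(c)

    def restart_without_drain(self, chunks):
        # BUG: chunks from the previous call are still in the queue,
        # so the next read mixes old and new text.
        if self._thread is not None:
            self._thread.join()
        self.start(chunks)

    def restart_with_drain(self, chunks):
        # FIX: discard stale chunks before starting the new worker.
        if self._thread is not None:
            self._thread.join()
        while not self._queue.empty():
            self._queue.get_nowait()
        self.start(chunks)

    def read_all(self):
        self._thread.join()
        out = []
        while not self._queue.empty():
            out.append(self._queue.get_nowait())
        return "".join(out)

# Stale data from the first call leaks into the second:
buggy = StreamingClient()
buggy.start(["old "])
buggy.restart_without_drain(["new"])
mixed = buggy.read_all()       # "old new" - corrupted stream

fixed = StreamingClient()
fixed.start(["old "])
fixed.restart_with_drain(["new"])
clean = fixed.read_all()       # "new"
```

Draining (or replacing) the queue whenever the worker thread is restarted avoids the stale-data variant of this race; if the real bug is of this shape, that would explain why it only shows up intermittently.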
After doing some more debugging, I found that I can repro this much more often when using a higher temperature:

```python
lm += gen("test1", max_tokens=100, temperature=0.8)
```

This can repro the bug with only one call to `gen()`. The bug also seems to repro more often with a longer prompt and a higher `max_tokens`.
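As background on why a higher temperature would make this easier to reproduce (a generic sampling illustration, not Guidance code): dividing the logits by a larger temperature flattens the token distribution, so non-argmax tokens get sampled more often, and any divergence from the text the engine expects is exactly what a grammar check trips over.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize (numerically stable)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)  # nearly all mass on the top token
warm = softmax_with_temperature(logits, 0.8)  # mass spread across alternatives

# The top token's probability drops as temperature rises, so the model
# picks a non-argmax token (and diverges from the expected text) more often.
```

The same effect would also explain why longer generations (higher `max_tokens`) repro more often: each extra sampled token is another chance to diverge.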
**The bug**
When prompting Gemini with a simple `gen()` call, the prompt sometimes fails with a ConstraintException. This is likely the same issue as reported in https://github.com/guidance-ai/guidance/issues/866.

**To Reproduce**
This can be reproduced using the `test_gemini_pro` function in the `test_googleai.py` test file. You might have to run the code several times to reproduce the issue. Here's my slightly modified script:

I've made some progress in debugging this, but I'm missing some knowledge about the internals of Guidance to make the fix myself. Here's what I know so far.
When using temperature 0 (as in the code above), models will sometimes not return exactly the same string. I think that's the root cause of the issue. The second `gen()` call to the assistant fails:
It fails in `get_logits()` of `_grammarless.py`, here:

That calls `_report_failed_match()`, which runs some code to determine where there was a mismatch in the generation. Here is a dump of the local variables at the end of that function:

**System info (please complete the following information):**
`guidance.__version__`: 0.1.15
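For readers who haven't looked at that code path, the mismatch-locating step described above can be sketched roughly as follows (hypothetical names; the real `_report_failed_match()` in `_grammarless.py` is more involved). The idea is to compare the bytes the remote model actually returned against the bytes the grammar expected and report the first index where they diverge:

```python
def find_mismatch(expected: bytes, received: bytes):
    """Return the index of the first diverging byte, or None if
    `received` is still a prefix of `expected` (no mismatch yet)."""
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i
    if len(received) > len(expected):
        return len(expected)  # extra bytes past the expected text
    return None

def report_failed_match(expected: bytes, received: bytes) -> str:
    """Format a short diagnostic around the divergence point."""
    i = find_mismatch(expected, received)
    if i is None:
        return "no mismatch"
    return (f"mismatch at byte {i}: expected {expected[i:i + 10]!r}, "
            f"got {received[i:i + 10]!r}")
```

If the root cause really is nondeterministic output at temperature 0, then a report like this will show the expected and received text agreeing up to some point and then silently diverging, which matches the ConstraintException behavior described above.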