Open hra42 opened 1 day ago
hey Henry @hra42,
I was talking with @sam-at-pieces about this, and we have both experienced the same issue, though it occurs rather infrequently.
I believe we should also bring @mack-at-pieces into this to take a deeper look at the PfD logs.
I was unable to reproduce this issue with the same input, but I will dig a bit deeper in the meantime.
In case it helps: I don't have many pieces saved in that instance, currently only the two generated at startup from #464.
From looking at the PfD logs I can't figure out what happened; to me it looks like Pieces did exactly what it was supposed to do within the workflow. Maybe you can see more on your end.
The funny thing is that I received the error almost immediately. My guess is that Pieces tries to add some context to the message (from onboarding, the profile, or something with the vector DB not working properly), so we receive more context than we should, which makes the request fail. Hard to troubleshoot.
Thanks for raising this @hra42. The strange thing is that there should be no additional context added to this message at all, and what is being passed to the LLM is ~1000 tokens, so it should easily fit in the context window. It's a tough one! I will take a deeper look tomorrow morning. Would you be able to share which LLM you were using when you saw these? Was it Claude by any chance?
@sam-at-pieces Yes, I used Claude 3.5 Sonnet.
Great, so after looking into this and chatting with @mack-at-pieces: we throw this error when the stream completes and the output is empty on the PfD side.
I've confirmed that the empty outputs are returned from Vertex AI for Claude + Gemini when we would expect an error code (e.g. too many tokens sent, incorrect parameters, etc.). I suspect that the error is being suppressed somewhere in the API and an empty prediction is being returned. The following issues can cause this:

1. Harmful content is detected (a notorious number of false positives here)
2. There is an internal error on the Google side
3. Services are temporarily unavailable
4. Incorrect parameters are passed
5. The message content is too long
I've confirmed that in production we will never pass a message that is too long to the LLM, so this response is likely the result of one of the first four cases.
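To make the diagnosis above concrete, here is a minimal sketch of the kind of check that could replace the current "context exceeded" message: when the stream completes but the joined output is empty, report a transient provider-side failure instead. The function name and wording are assumptions for illustration, not the actual PfD code.

```python
# Hypothetical sketch: classify a completed stream whose output is empty.
# `classify_empty_prediction` and the message text are assumptions, not the
# real Pieces OS / PfD implementation.

def classify_empty_prediction(chunks: list[str]) -> str:
    """Return 'ok' if the stream produced text, otherwise a user-facing hint."""
    output = "".join(chunks).strip()
    if output:
        return "ok"
    # Vertex AI can return an empty prediction instead of an error code when
    # content is flagged, an internal error occurs, or the service is
    # temporarily unavailable.
    return ("The model returned an empty response. This is usually a transient "
            "provider-side issue. Please retry or switch models.")
```

This keeps the empty-stream case distinct from a genuine context-length error, which the comment above rules out for production traffic.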
@mack-at-pieces I would recommend that we change the error message to reflect this to avoid confusion. We should prompt the user to switch models or retry (recommended in the vertex AI docs). I'm thinking something like this.
@mack-at-pieces are we suppressing some of these errors on the API side? We should be getting 500s back for exceeding the prompt length, but I always get a completed request, just with an empty prediction. If we surface these I can automate retrying.
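The automated retry mentioned above could look something like the sketch below: retry the call with exponential backoff while the prediction comes back empty. `call` is a hypothetical zero-argument wrapper around the model request; none of these names come from the actual codebase.

```python
import time

# Hypothetical sketch: retry a model call that may return an empty prediction.
# `call` is an assumed zero-arg function wrapping the Vertex AI request.
def retry_on_empty(call, max_attempts: int = 3, base_delay: float = 0.5) -> str:
    for attempt in range(max_attempts):
        result = call()
        if result.strip():
            return result  # got a non-empty prediction
        # back off before retrying, as the Vertex AI docs recommend for
        # transient failures
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("Model returned an empty prediction after retries")
```

If the suppressed 500s were surfaced instead, the same loop could key off the status code rather than an empty string, which would avoid retrying cases (like harmful-content blocks) that will never succeed.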
@sam-at-pieces outstanding work! Thank you so much for diving so deep into this error!
If you need a contact person on GCP, please let me know, I might be able to help with this.
Software
Desktop Application
Operating System / Platform
macOS
Your Pieces OS Version
10.1.15
Early Access Program
Kindly describe the bug and include as much detail as possible on what you were doing so we can reproduce the bug.
I wanted to demo the power of Pieces using the Day 1 challenge of Advent of Code.
I received an error that the messages exceeded the context window:
I tried in a new chat and it started working as expected:
Logs: pOS: log-12032024.txt PfD: log-12032024.txt
Here is the complete message for reproduction: