Closed chillatom closed 2 months ago
In my experience, this happens when the LLM isn't given enough room to answer fully. For example, if you constrain an LLM to only 256 output tokens, it will usually produce only a partial answer whenever code is involved. Asking Claude for a full implementation of the SURF (Speeded-Up Robust Features) algorithm takes about 1302 tokens.
Cody will, of course, truncate the answer mid-stream. If you want to stay within a token limit, prepend a hidden instruction to the user's prompt telling the model to answer briefly. Early termination can also happen if the model emits certain special characters or a stop_token.
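One way to handle this gracefully on the client side is to inspect why the model stopped. A minimal sketch, assuming the response exposes a stop-reason field as in the Anthropic Messages API (where `"max_tokens"` signals a cut-off answer); the handler names and the simulated response dicts below are hypothetical, and no real API call is made:

```python
# Hypothetical client-side handling of early termination.
# Assumes the response carries a "stop_reason" field in the style of the
# Anthropic Messages API ("max_tokens" means the output was cut off).

def is_truncated(response: dict) -> bool:
    """True when the model stopped because it ran out of output tokens."""
    return response.get("stop_reason") == "max_tokens"

def next_action(response: dict) -> str:
    """Decide how a client like Cody might react to the response."""
    if is_truncated(response):
        # Offer the user a "continue" option instead of silently
        # rendering a cut-off answer as if it were complete.
        return "offer_continue"
    return "render_complete"

# Simulated responses -- placeholders, not real API output.
complete = {"stop_reason": "end_turn", "content": "full answer..."}
cut_off = {"stop_reason": "max_tokens", "content": "partial answ"}

print(next_action(complete))  # render_complete
print(next_action(cut_off))   # offer_continue
```

With a check like this, the UI can surface a "continue" button (or re-prompt with the partial output as context) rather than presenting a truncated thought as final.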
This issue is marked as stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed automatically in 5 days.
Version
N/A
Describe the bug
From Slack; from Discord
Let's investigate and roll any remediation into the output sequence epic
Reports that Opus terminates without fully completing its "thought".
Expected behavior
Cody gracefully handles early terminations or allows user to continue chat
Additional context
No response