I get the same error a lot, and Cody is basically useless at providing any suggestions when using a remote Ollama instance.
I used `codellama:70b-code`.
@valerybugakov, any updates on this?
Hey @sergei-dyshel, thanks for the detailed issue description and debugging information!
In cases like this, we should detect cut-in-the-middle chunks and continue the processing loop. Collecting all the chunks first is inefficient because we can often cut off the LLM-generated completion early, reducing the overall latency.
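Roughly, I mean something like this (a minimal sketch, not the exact change; it assumes `iterableBody` yields `Uint8Array` chunks from `response.body`): carry the trailing partial line over to the next iteration so the loop keeps streaming and can still stop early.

```ts
// Sketch: buffer a possibly cut-in-the-middle trailing line between
// chunks instead of assuming each chunk holds only complete JSON lines.
async function* parseJsonLines(
    iterableBody: AsyncIterable<Uint8Array>
): AsyncGenerator<unknown> {
    const decoder = new TextDecoder()
    let buffer = ''
    for await (const chunk of iterableBody) {
        buffer += decoder.decode(chunk, { stream: true })
        const lines = buffer.split('\n')
        // The last element may be a partial line cut mid-JSON;
        // keep it in the buffer and complete it with the next chunk.
        buffer = lines.pop() ?? ''
        for (const line of lines) {
            if (line.trim()) {
                yield JSON.parse(line)
            }
        }
    }
    // Flush whatever remains once the stream ends.
    if (buffer.trim()) {
        yield JSON.parse(buffer)
    }
}
```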
I can look into it this week to fix it in the next release.
Hey @sergei-dyshel and @harshal-cuminai. I'm fixing the issue here. Let me know if the problem does not go away for you.
@valerybugakov Just curious: when you tested that pull request, did you specifically see a case where the JSON response was split in the middle (across multiple chunks) and was processed correctly? I'm asking because I still don't fully understand the logic even after the fix, so I've left a comment on the PR here: https://github.com/sourcegraph/cody/pull/4066#discussion_r1594533916.
@sergei-dyshel, here's the update based on your comment: https://github.com/sourcegraph/cody/pull/4103 I wasn't able to repro the cut-in-the-middle chunks locally. Could you share the minimal repro steps?
Version
v1.16.0
Describe the bug
When running against a remote Ollama instance (model `deepseek-coder:6.7b`), code completion doesn't work and I see JSON parsing errors in the Cody log.
Expected behavior
Completion should work.
Additional context
The relevant code is in https://github.com/sourcegraph/cody/blob/c9ee1119b73eb1b9724553f3a103c35332e84a54/lib/shared/src/llm-providers/ollama/completions-client.ts#L57-L72.
I've added a breakpoint at `JSON.parse(chunkString)` and can see that parsing fails on a `chunkString` containing only a fragment of a JSON object, i.e. it's partial JSON and is not parseable.
Now, I can't understand how the code is supposed to work. `iterableBody` is an async stream reader over `response.body`; by iterating over it asynchronously we get parts of the response body, but there is no guarantee that each part will be valid JSON! I see that we split each chunk into lines and parse each line as JSON, but what if an async chunk contains only part of a line? That's what happened here, IMO. What am I missing? A simplified sketch of the pattern is below.
IMO, the correct way would be to concatenate all the chunks and only parse the resulting string at the end, or at least to detect cut-in-the-middle chunks. I tried a dirty fix in https://github.com/sourcegraph/cody/commit/143766151e7e043baed639048acb23c28a271704 and it worked.
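Conceptually, the dirty fix does something like this (a sketch, not the literal commit):

```ts
// Sketch of the workaround: buffer the whole body, then parse
// line-by-line. Correct, but it gives up streaming, so completions
// arrive later.
async function consumeBuffered(iterableBody: AsyncIterable<Uint8Array>): Promise<void> {
    const decoder = new TextDecoder()
    let body = ''
    for await (const chunk of iterableBody) {
        body += decoder.decode(chunk, { stream: true })
    }
    for (const line of body.split('\n')) {
        if (line.trim()) {
            JSON.parse(line) // every line is complete at this point
        }
    }
}
```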
Tagging @valerybugakov as the author of the above code (according to git blame).