sourcegraph / cody


bug: ollama: wrong response parsing #4008

Closed: sergei-dyshel closed this issue 6 months ago

sergei-dyshel commented 6 months ago

Version

v1.16.0

Describe the bug

When running against a remote Ollama instance (model deepseek-coder:6.7b), code completion doesn't work, and I see the following errors in the Cody log:

█ CompletionLogger:onError: {"type":"code-completion","endpoint":"http://34.254.171.114:11434/api/generate","status":"error","duration":1007,"err":"Unexpected end of JSON input"} 
█ getInlineCompletions:error: Unexpected end of JSON input SyntaxError: Unexpected end of JSON input
    at JSON.parse (<anonymous>)
    at Object.complete (/Users/sergei/.vscode/extensions/lib/shared/src/llm-providers/ollama/completions-client.ts:71:39)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at generatorWithTimeout (/Users/sergei/.vscode/extensions/sourcegraph.cody-ai-1.16.0/src/completions/utils.ts:84:37)
    at fetchAndProcessDynamicMultilineCompletions (/Users/sergei/.vscode/extensions/sourcegraph.cody-ai-1.16.0/src/completions/providers/fetch-and-process-completions.ts:91:47)
    at async Promise.all (index 0)
    at zipGenerators (/Users/sergei/.vscode/extensions/sourcegraph.cody-ai-1.16.0/src/completions/utils.ts:34:21)
    at generateCompletions (/Users/sergei/.vscode/extensions/sourcegraph.cody-ai-1.16.0/src/completions/request-manager.ts:118:34)

Expected behavior

Completion should work.

Additional context

The relevant code is in https://github.com/sourcegraph/cody/blob/c9ee1119b73eb1b9724553f3a103c35332e84a54/lib/shared/src/llm-providers/ollama/completions-client.ts#L57-L72.

I've added a breakpoint at JSON.parse(chunkString) and can see that parsing fails on a chunkString that looks like this:

'{"model":"deepseek-coder:6.7b","created_at":"2024-05-01T22:05:12.849552267Z","response":"","done":true,"context":[32016,553,207,16277,25,426,1777,8188,14,7364,14,275,2294,14,275,2294,12,82,1212,12,13335,12,13490,13,14789,185,474,477,185,185,655,8318,62,4968,62,29939,7,1732,46,4208,477,185,655,967,26,185,294,92,185,185,436,23272,4580,9,3812,62,4895,474,1412,9,260,26,185,185,436,562,334,7060,62,4895,12,29,703,62,82,1212,62,13335,822,2069,28,1070,82,1212,62,13335,8,507,185,655,3812,62,4895,12,29,1113,62,82,1212,62,13335,7,14589,477,185,294,92,185,185,436,4716,11604,26846,62,7060,62,2139,8,507,185,655,1452,375,1369,6859,1369,62,1743,21826,3996,25,507,185,185,474,553,185,474,553,1271,394,417,3207,245,11322,12,14314,387,930,437,2445,344,254,3812,438,948,3735,276,254,3812,12,2922,285,185,474,553,344,254,252,1212,12,13335,2010,317,11357,441,8155,279,2606,13,185,474,553,185,185,1044,4409,7,7060,62,4895,12,29,703,62,82,1212,62,13335,822,2312,4579,477,185,185,474,553,185,474,553,428,9609,11322,12,14314,387,10877,4704,429,254,2315,10115,331,254,252,1212,13,207,1271,741,438,245,252,1212,11896,185,474,553,930,519,10115,540,3964,2082,13,207,1271,254,3812,6267,279,931,457,437,5627,409,1903,5627,331,185,474,553,437,1315,930,741,317,637,2796,276,16027,3192,409,441,254,3812,317,2082,1952,394,2561,1001,13,185,474,553,185,185,1044,562,11604,82,1212,62,13335,12,29,262,62,7060,62,246,62,2346,9770,7060,62,13764,62,9668,1435,507,185,79'

i.e. it's only a partial JSON object and is not parseable.

Now, I can't understand how this code is supposed to work. iterableBody is an async stream reader over response.body, and by iterating over it asynchronously with:

for await (const chunk of iterableBody) 

we asynchronously receive parts of the response body, but there is no guarantee that each part is valid JSON. I see that we split each chunk into lines and parse each line as JSON, but what if a chunk contains only part of a line? That's exactly what happened here, IMO. What am I missing?
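To make the failure mode concrete, here is a minimal sketch of that pattern (hypothetical names, not the actual completions-client.ts code), assuming Ollama streams newline-delimited JSON: parsing every line of every raw chunk throws exactly this SyntaxError whenever a network chunk ends in the middle of a line.

    async function* parsePerChunk(iterableBody: AsyncIterable<Uint8Array>) {
        const decoder = new TextDecoder()
        for await (const chunk of iterableBody) {
            // Each chunk is a raw slice of the HTTP body, not necessarily a complete NDJSON line.
            for (const line of decoder.decode(chunk).split('\n')) {
                if (line.trim() === '') continue
                // Throws "Unexpected end of JSON input" if the chunk ended mid-line.
                yield JSON.parse(line)
            }
        }
    }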

IMO, the correct approach would be to concatenate all the chunks and only then parse the resulting string, or at least detect chunks that are cut in the middle. I tried a dirty fix in https://github.com/sourcegraph/cody/commit/143766151e7e043baed639048acb23c28a271704 and it worked.
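For reference, a minimal sketch of the buffering idea (hypothetical names; the commit linked above and the eventual fix may differ): carry the trailing, possibly incomplete line over to the next chunk and only parse lines that were terminated by a newline.

    async function* parseBuffered(iterableBody: AsyncIterable<Uint8Array>) {
        const decoder = new TextDecoder()
        let buffer = ''
        for await (const chunk of iterableBody) {
            buffer += decoder.decode(chunk, { stream: true })
            const lines = buffer.split('\n')
            // The last element is '' if the chunk ended on a newline, otherwise a partial
            // line; keep it in the buffer until the rest of it arrives.
            buffer = lines.pop() ?? ''
            for (const line of lines) {
                if (line.trim() === '') continue
                yield JSON.parse(line)
            }
        }
        if (buffer.trim() !== '') {
            yield JSON.parse(buffer)
        }
    }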

Tagging @valerybugakov as the author of the above code (according to git blame).

harshal-cuminai commented 6 months ago

I get the same error a lot, and Cody is basically useless at providing any suggestions when using a remote Ollama instance.

I used codellama:70b-code.

@valerybugakov any updates on this?

valerybugakov commented 6 months ago

Hey @sergei-dyshel, thanks for the detailed issue description and debugging information!

In cases like this, we should detect cut-in-middle chunks and continue the processing loop. Collecting all the chunks is inefficient because we can often cut the completion generated by the LLM early, reducing the overall latency.
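As a rough illustration of the latency point (hypothetical consumer code, not Cody's): with incremental parsing the caller can abort the underlying request as soon as it has enough of the completion, which would be impossible if the client waited for the full response body.

    async function takeFirstLine(
        stream: AsyncIterable<{ response: string; done: boolean }>,
        abortController: AbortController
    ): Promise<string> {
        let completion = ''
        for await (const message of stream) {
            completion += message.response
            // Stop as soon as a single-line completion is available and cancel the request.
            if (message.done || completion.includes('\n')) {
                abortController.abort()
                break
            }
        }
        return completion
    }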

I can look into it this week to fix it in the next release.

valerybugakov commented 6 months ago

Hey @sergei-dyshel and @harshal-cuminai. I'm fixing the issue here. Let me know if the problem does not go away for you.

sergei-dyshel commented 6 months ago

@valerybugakov Just curious: when you tested that pull request, did you specifically see a case where a JSON response was split in the middle (across multiple chunks) and was processed correctly? I'm asking because I still don't fully understand the logic even after the fix, so I've left a comment on the PR here: https://github.com/sourcegraph/cody/pull/4066#discussion_r1594533916.

valerybugakov commented 6 months ago

@sergei-dyshel, here's the update based on your comment: https://github.com/sourcegraph/cody/pull/4103. I wasn't able to reproduce the cut-in-the-middle chunks locally. Could you share minimal repro steps?