Hey @vadi2! Does it work if you curl the ollama API directly? Do you mind posting a log of a curl with streaming?
It is not very quick, but it does work:
% curl http://localhost:11434/api/generate -d '{
"model": "mixtral",
"prompt":"Why is the sky blue?"
}'
{"model":"mixtral","created_at":"2024-03-13T14:45:38.576943Z","response":" The","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:40.338117Z","response":" phenomenon","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:41.347848Z","response":" that","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:42.764003Z","response":" causes","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:43.827466Z","response":" the","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:44.360891Z","response":" sky","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:45.628454Z","response":" to","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:46.702922Z","response":" appear","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:47.886659Z","response":" blue","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:49.032583Z","response":" is","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:49.723107Z","response":" called","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:50.541659Z","response":" Ray","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:52.79631Z","response":"le","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:53.718047Z","response":"igh","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:54.969805Z","response":" scattering","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:56.066794Z","response":".","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:57.470801Z","response":" As","done":false}
Can you try Cody again with enhanced context disabled (the ✨ icon next to the input)? This model does look very slow indeed, especially since your prompt is only 6 tokens and we likely generate thousands of tokens of context for the prompt.
This is running Ollama on a MacBook Pro. I'll try hosting it on an NVIDIA GPU instead.
@vadi2 Great! My point is that I do think this is not related to Cody, but is instead due to a prompt that is too large for this model. Disabling enhanced context helps since it reduces the prompt size, but ideally you want to try a model that can ingest tokens at a faster rate (maybe there are quantized versions of Mixtral you can use; I personally haven't tried Mixtral locally, though).
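On the quantized-Mixtral point: Ollama's library usually publishes several quantizations per model, so pulling a lower-bit tag may help; the exact tag below is an assumption, so check https://ollama.com/library/mixtral for what is actually available:
% ollama pull mixtral:8x7b-instruct-v0.1-q3_K_M    # tag name is an assumption; use whichever lower-bit quantization the library lists
Lower-bit quantizations trade some answer quality for less memory pressure and usually better token throughput on the same hardware.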
I missed that in the rush - disabling enhanced context did help and it started providing an answer.
This issue is marked as stale because it has been open for 60 days with no activity. Remove the stale label or leave a comment, or it will be closed automatically in 5 days.
Version
v1.9.1710263337 (pre-release)
Describe the bug
Ollama chat response never shows:
https://github.com/sourcegraph/cody/assets/110988/710ac909-7605-4e8b-b3b7-fff562597605
Expected behavior
The response can be seen.
Additional context
No response