Hey @vadi2! Does it work if you curl the ollama API directly? Do you mind posting a log of a curl with streaming?
It is not very quick, but it does work:
% curl http://localhost:11434/api/generate -d '{
"model": "mixtral",
"prompt":"Why is the sky blue?"
}'
{"model":"mixtral","created_at":"2024-03-13T14:45:38.576943Z","response":" The","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:40.338117Z","response":" phenomenon","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:41.347848Z","response":" that","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:42.764003Z","response":" causes","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:43.827466Z","response":" the","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:44.360891Z","response":" sky","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:45.628454Z","response":" to","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:46.702922Z","response":" appear","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:47.886659Z","response":" blue","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:49.032583Z","response":" is","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:49.723107Z","response":" called","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:50.541659Z","response":" Ray","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:52.79631Z","response":"le","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:53.718047Z","response":"igh","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:54.969805Z","response":" scattering","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:56.066794Z","response":".","done":false}
{"model":"mixtral","created_at":"2024-03-13T14:45:57.470801Z","response":" As","done":false}
Can you try Cody again with enhanced context disabled (the ✨ icon next to the input)? This model does look very slow indeed, especially since your prompt is only 6 tokens and we likely generate thousands of tokens of context for the prompt.
This is running Ollama on a MacBook Pro. I'll try hosting it on an NVIDIA GPU instead.
@vadi2 Great! My point is that I do think this is not related to Cody, but is instead due to a prompt that is too large for this model. Disabling enhanced context helps since it reduces the prompt size, but ideally you want to try a model that can ingest tokens at a faster rate (maybe there are quantized versions of Mixtral you can use; I personally haven't tried Mixtral locally, though).
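On the quantized-Mixtral point: Ollama's library usually publishes several quantizations per model, so pulling a lower-bit tag may help; the exact tag below is an assumption, so check https://ollama.com/library/mixtral for what is actually available:
% ollama pull mixtral:8x7b-instruct-v0.1-q3_K_M    # tag name is an assumption; use whichever lower-bit quantization the library lists
Lower-bit quantizations trade some answer quality for less memory pressure and usually better token throughput on the same hardware.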
I missed that in the rush - disabling enhanced context did help and it started providing an answer.
This issue is marked as stale because it has been open for 60 days with no activity. Remove the stale label or leave a comment, or it will be closed automatically in 5 days.
Version
v1.9.1710263337 (pre-release)
Describe the bug
Ollama chat response never shows:
https://github.com/sourcegraph/cody/assets/110988/710ac909-7605-4e8b-b3b7-fff562597605
Expected behavior
The response can be seen.
Additional context
No response