Closed · dimvlamis closed this issue 9 months ago
Hi @dimvlamis, streaming is enabled; the initial delay you see on the first question is the time Ollama needs to load your model into memory.
Hi,
This is not true. None of the answers show up with streaming, with any model. I have run this many times and I always see a long delay, after which the whole message appears at once.
Can confirm - this is not true. Streaming not supported.
Please be aware that the first time you select a model there is a delay while it loads into memory; after that everything is faster. Here is a video showing this using Ollama directly.
https://github.com/ivanfioravanti/chatbot-ollama/assets/1069210/5d2a4961-b606-4223-8a09-65aa1539d9c7
Please try directly with Ollama using something like:

```
ollama run openchat "How are you?"
```
Streaming is in the code; you can check for yourself.
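If you want to verify streaming outside this UI, here is a minimal sketch against Ollama's local HTTP API. It assumes the default endpoint `http://localhost:11434` and the `openchat` model used above; it is an illustration, not code from this repo:

```ts
// Stream tokens from Ollama's /api/generate endpoint (default port 11434).
// Ollama returns newline-delimited JSON objects; each carries a "response"
// fragment until "done" is true.
async function streamGenerate(prompt: string): Promise<void> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "openchat", prompt, stream: true }),
  });
  if (!res.ok || !res.body) throw new Error(`request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";

  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    // Each complete line is one JSON chunk; keep any partial line buffered.
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      process.stdout.write(chunk.response ?? "");
      if (chunk.done) return;
    }
  }
}

streamGenerate("How are you?").catch(console.error);
```

If tokens print incrementally here but not in the UI, the issue is in the frontend plumbing rather than in Ollama itself.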
Have you tried the same with Ollama directly?
Please be aware that first time you select a model there is a delay due to loading it in memory, after that everything is faster.
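As an aside, that first-load delay can be avoided by warming the model up before asking anything. A minimal sketch, assuming Ollama's documented behavior that a `/api/generate` request with a model name and no prompt loads the model into memory (the model name here is just an example):

```ts
// Preload a model so the first real question doesn't pay the load cost.
// Per Ollama's API docs, posting to /api/generate with a model name and
// no prompt loads the model into memory without generating anything.
async function preloadModel(model: string): Promise<void> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  if (!res.ok) throw new Error(`preload failed: ${res.status}`);
}

preloadModel("openchat").catch(console.error);
```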
I apologize for the length and quality of the video; my computer isn't the best. I'm running CPU inference (AVX2). I ran the ollama commands differently than you did, but it should still behave the same way.
If you need me to re-record this, let me know.
Hi,
I noticed that there is no option to enable streaming. Are there plans to add this feature?
It would be much appreciated.