If you submit a chat and press the stop button, Ollamac doesn't stop Ollama from streaming the response; it just stops updating the UI.
This is bad in general, but particularly bad when the model starts looping into garbage: it may keep the GPU at 100% for several minutes. By contrast, Ollama's built-in terminal chat resets the TCP/HTTP connection of the response stream when Ctrl+C is pressed, and I assume the Ollama server stops inference at that point.
Ollamac should do the same: reset the chat-streaming TCP/HTTP connection when the stop button is pressed.
This can easily be validated with Wireshark by contrasting the two behaviours: watch the traffic in and out of port 11434 on the local (loopback) adapter.
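To illustrate the expected behaviour, here is a minimal sketch of the principle: a client streams a response and then tears down the TCP connection mid-stream, and the server notices the disconnect and can stop generating. This is a hypothetical mock in Python, not Ollamac's actual Swift code or the real Ollama API; the endpoint path and JSON shape are stand-ins.

```python
# Sketch: cancelling a streaming HTTP response by closing the TCP connection.
# A mock server streams token-like chunks forever; when the client closes the
# socket (the "stop button"), the server's writes fail and it stops "inference".
import http.client
import http.server
import threading
import time

server_saw_disconnect = threading.Event()

class StreamHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        try:
            while True:
                # Emulates one streamed token per write, like Ollama's
                # newline-delimited JSON stream (shape is a stand-in).
                self.wfile.write(b'{"response": "token"}\n')
                self.wfile.flush()
                time.sleep(0.05)
        except (BrokenPipeError, ConnectionResetError):
            # The client hung up: a real server would stop inference here.
            server_saw_disconnect.set()

    def log_message(self, *args):
        pass  # keep test output quiet

server = http.server.HTTPServer(("127.0.0.1", 0), StreamHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/api/generate")   # hypothetical streaming endpoint
resp = conn.getresponse()
resp.read(16)    # consume part of the stream, like rendering tokens in the UI
conn.close()     # stop button pressed: tear down the TCP connection

stopped = server_saw_disconnect.wait(timeout=2.0)
server.shutdown()
```

The point is that simply ceasing to read (what Ollamac appears to do today) leaves the connection open and the server generating; actively closing the socket is what propagates the cancellation to the server side.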