I am converting a script from Node.js to Rust, and part of this involves creating a streaming generation request to Ollama. In Node, I was able to abort the request if the model was still generating after a timeout. I have an equivalent implementation in Rust that makes the Rust task stop listening to the stream, but since I can't abort the underlying HTTP request, I assume the Ollama server will continue generating indefinitely. This has been a problem because some of the models I've used sometimes keep generating without ever stopping.
Would it be possible to add a way to abort the stream that propagates back to the Ollama server? It appears that aborting the HTTP request causes the Ollama server to stop generation: https://github.com/ollama/ollama/issues/2876
```rust
use std::time::{Duration, Instant};

use ollama_rs::{generation::completion::request::GenerationRequest, Ollama};
use tokio::io::AsyncWriteExt;
use tokio_stream::StreamExt;

let start = Instant::now();
let ollama = Ollama::new(format!("http://{}", ollama_host), 11434);
let mut stream = ollama
    .generate_stream(GenerationRequest::new(
        task.model.clone(),
        prompt_version.prompt.clone(),
    ))
    .await
    .unwrap();

let mut response = "".to_owned();
let mut stdout = tokio::io::stdout();
while let Some(res) = stream.next().await {
    let responses = res.unwrap();
    for resp in responses {
        response.push_str(&resp.response);
        // write_all instead of write, so partial writes aren't silently dropped
        stdout.write_all(resp.response.as_bytes()).await.unwrap();
        stdout.flush().await.unwrap();
    }
    if start.elapsed() > Duration::from_millis(task.timeout) {
        log::warn!("Worker job timeout");
        // FIXME: We should kill the request, but the library doesn't provide a way to do this?
        // stream.abort(); // something like this would be nice
        response.push_str("\n\n[interrupted]");
        return Ok(OllamaResponse::Timeout { response });
    }
}
Ok(OllamaResponse::Success { response })
```
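In the meantime, here's a rough sketch of the workaround I'm considering: calling `/api/generate` with reqwest directly, so that on timeout the response body is dropped and the connection is closed, which (per the issue linked above) should make the server stop generating. This assumes Ollama's newline-delimited JSON streaming format and the abort-on-disconnect behavior from that issue; `generate_with_timeout` is just a hypothetical helper I wrote for illustration, not anything from ollama-rs (it needs reqwest's `json` and `stream` features):

```rust
use std::time::Duration;

use futures_util::StreamExt;
use serde_json::{json, Value};
use tokio::time::{timeout_at, Instant};

async fn generate_with_timeout(
    host: &str,
    model: &str,
    prompt: &str,
    timeout: Duration,
) -> reqwest::Result<String> {
    let client = reqwest::Client::new();
    let resp = client
        .post(format!("http://{host}:11434/api/generate"))
        .json(&json!({ "model": model, "prompt": prompt, "stream": true }))
        .send()
        .await?;

    let mut stream = resp.bytes_stream();
    let mut response = String::new();
    let deadline = Instant::now() + timeout;

    loop {
        // Bound every read by the overall deadline.
        match timeout_at(deadline, stream.next()).await {
            Ok(Some(chunk)) => {
                // Naive NDJSON parsing; assumes each chunk holds whole lines,
                // which has held for me but isn't guaranteed by HTTP.
                for line in chunk?.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
                    if let Ok(v) = serde_json::from_slice::<Value>(line) {
                        if let Some(s) = v["response"].as_str() {
                            response.push_str(s);
                        }
                    }
                }
            }
            Ok(None) => break, // stream ended normally
            Err(_) => {
                // Timed out: returning drops `stream` (and the reqwest
                // response), closing the connection; per ollama/ollama#2876
                // that should abort generation on the server.
                response.push_str("\n\n[interrupted]");
                break;
            }
        }
    }

    Ok(response)
}
```

This works, but it gives up everything ollama-rs provides around request building and response types, which is why a first-class abort (or just documented drop-to-cancel behavior) on the stream would be much nicer.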