pepperoni21 / ollama-rs

A Rust library for interacting with the Ollama API.

Support for aborting requests #89

Open kj800x opened 4 days ago

kj800x commented 4 days ago

I am converting a script from Node.js to Rust, and part of this involves creating a streaming generation request to Ollama. In Node, I was able to abort the request if the model was still generating after a timeout. I have an equivalent implementation in Rust which causes the Rust task to stop listening to the stream, but since I can't abort the underlying HTTP request, I assume the Ollama server will continue to generate indefinitely. This matters because some of the models I've used seem to continue generating without ever stopping.

Would it be possible to add a way to abort the stream that propagates back to the Ollama server? It appears that aborting the underlying HTTP request causes the server to stop generating: https://github.com/ollama/ollama/issues/2876. I've sketched, after my current code below, what this could look like.

    // Imports needed by this fragment (block-scoped `use` keeps the excerpt
    // self-contained):
    use std::time::{Duration, Instant};

    use ollama_rs::{generation::completion::request::GenerationRequest, Ollama};
    use tokio::io::AsyncWriteExt;
    use tokio_stream::StreamExt;

    let start = Instant::now();

    let ollama = Ollama::new(format!("http://{}", ollama_host), 11434);
    let mut stream = ollama
        .generate_stream(GenerationRequest::new(
            task.model.clone(),
            prompt_version.prompt.clone(),
        ))
        .await
        .unwrap();

    let mut response = String::new();
    let mut stdout = tokio::io::stdout();
    while let Some(res) = stream.next().await {
        let responses = res.unwrap();

        for resp in responses {
            response.push_str(&resp.response);
            stdout.write_all(resp.response.as_bytes()).await.unwrap();
            stdout.flush().await.unwrap();
        }

        if start.elapsed() > Duration::from_millis(task.timeout) {
            log::warn!("Worker job timeout");
            // FIXME: We should kill the request, but the library doesn't provide a way to do this?
            // stream.abort(); // something like this would be nice
            response.push_str("\n\n[interrupted]");

            return Ok(OllamaResponse::Timeout { response });
        }
    }

    Ok(OllamaResponse::Success { response })
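
For concreteness, here is a rough sketch of the kind of API I'm imagining. To be clear, `generate_stream_abortable` and its abort handle are hypothetical names that don't exist in ollama-rs today; the idea is just that the handle would tear down the underlying HTTP request so the server stops generating:

    // Hypothetical API sketch: generate_stream_abortable and its abort handle
    // do not exist in ollama-rs today.
    use std::time::Duration;

    use ollama_rs::{generation::completion::request::GenerationRequest, Ollama};
    use tokio_stream::StreamExt;

    async fn run(ollama: Ollama, request: GenerationRequest) {
        // Imagined return value: the usual response stream plus a handle that
        // cancels the underlying HTTP request when invoked.
        let (mut stream, abort_handle) =
            ollama.generate_stream_abortable(request).await.unwrap();

        // The handle could then be used from a watchdog task enforcing a timeout.
        tokio::spawn(async move {
            tokio::time::sleep(Duration::from_secs(30)).await;
            // Closing the connection should make the server stop generating,
            // per ollama/ollama#2876.
            abort_handle.abort();
        });

        while let Some(res) = stream.next().await {
            for resp in res.unwrap() {
                print!("{}", resp.response);
            }
        }
    }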
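
In the meantime, the best workaround I can think of is to bypass ollama-rs and stream from the HTTP API directly with reqwest, dropping the byte stream once the deadline passes. This is a minimal sketch, assuming reqwest's `json` and `stream` features, and assuming that dropping the stream closes the connection promptly enough for the server-side behavior described in the linked issue to kick in (I haven't verified that):

    // Workaround sketch: stream /api/generate directly with reqwest and drop
    // the byte stream on timeout. Requires the reqwest "json" and "stream"
    // features. Whether the server actually stops generating relies on the
    // behavior described in ollama/ollama#2876.
    use std::time::Duration;

    use futures_util::StreamExt;
    use serde_json::json;

    async fn generate_with_timeout(
        host: &str,
        model: &str,
        prompt: &str,
        timeout: Duration,
    ) -> reqwest::Result<String> {
        let resp = reqwest::Client::new()
            .post(format!("http://{host}:11434/api/generate"))
            .json(&json!({ "model": model, "prompt": prompt }))
            .send()
            .await?;

        let mut body = resp.bytes_stream();
        let deadline = tokio::time::Instant::now() + timeout;
        let mut out = String::new();

        loop {
            match tokio::time::timeout_at(deadline, body.next()).await {
                Ok(Some(chunk)) => {
                    // Each chunk is a JSON line; a real implementation would
                    // parse it and append only the "response" field.
                    out.push_str(&String::from_utf8_lossy(&chunk?));
                }
                Ok(None) => break, // stream finished normally
                Err(_) => break,   // timeout: fall through and drop `body`
            }
        }
        Ok(out)
    }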