How to implement a request timeout?

dezoito commented 7 months ago

I was thinking about adding the possibility of adding a timeout param for completion (and possibly chat) calls.

Something like:

    /// Completion generation with a single response.
    /// Returns a single `GenerationResponse` object
    pub async fn generate(
        &self,
        request: GenerationRequest,
        timeout: Option<u64>,
    ) -> Result<GenerationResponse, String> {
        let mut request = request;
        request.stream = false;

        let uri = format!("{}/api/generate", self.uri());
        let serialized = serde_json::to_string(&request).map_err(|e| e.to_string())?;
        let res = self
            .reqwest_client
            .post(uri)
            .body(serialized)
            .timeout(Duration::from_secs(timeout.unwrap_or(u64::MAX)))
            .send()
            .await
            .map_err(|e| e.to_string())?;

        if !res.status().is_success() {
            return Err(res.text().await.unwrap_or_else(|e| e.to_string()));
        }

        let res = res.bytes().await.map_err(|e| e.to_string())?;
        let res = serde_json::from_slice::<GenerationResponse>(&res).map_err(|e| e.to_string())?;

        Ok(res)
    }
}

I believe the timeout arg would have to be added to the generate() call, and not to the request struct (like with the other parameters and options) because this is not an Ollama parameter, but an option for the reqwest call.

The motivation is to allow clients to give up on requests that Ollama is taking to long to handle or has a request hung due to unexpected conditions.

I'm not sure if .timeout(Duration::from_secs(timeout.unwrap_or(u64::MAX))) is the correct way to handle the case when the timeout is NOT passed, either... this is just an idea.

Would appreciate some thoughts on this.

dezoito commented 7 months ago

A quick local implementation of the idea above works:

However, implementing it as is changes the function signature and makes the timeout parameter required (even if set to None).

That would introduce a breaking change, hence why I'm not submitting a PR at the moment... suggestions?

pepperoni21 commented 7 months ago

I'm not really sure how to integrate that, as I don't really want to change the functions signatures. It would be much easier if ollama had a parameter for that, this way we could just add it to the request. That might be something to suggest ollama add.

dezoito commented 7 months ago

Yeah, I gave it some more thought and think it would require coding some magic function overloading, and also that since this is not natively implemented by Ollama, there is a point for not having it in the client crate.

I might try to implement a wrapper in my Tauri app sometime.

Thank you again for your consideration!

dezoito commented 7 months ago

That might be something to suggest ollama add.

Done

https://github.com/ollama/ollama/issues/2764

Curiously, ollama-py implements this, but the timeout happens at their http client and is not passed to the ollama host either.

class BaseClient:
  def __init__(
    self,
    client,
    host: Optional[str] = None,
    follow_redirects: bool = True,
    timeout: Any = None,
    **kwargs,
  ) -> None:
    """
    Creates a httpx client. Default parameters are the same as those defined in httpx
    except for the following:
    - `follow_redirects`: True
    - `timeout`: None
    `kwargs` are passed to the httpx client.
    """

scd31 commented 5 months ago

Personally I think this is a lot cleaner to do on the client, like ollama-py does. It prevents a bunch of failure modes, such as a misconfigured reverse proxy, from causing ollama-rs to hang indefinitely.

dezoito commented 5 months ago

I've since implemented it on my client application's end.

pepperoni21 / ollama-rs

How to implement a request timeout? #29