huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

The "/health" is so slow when generating extra-long text。 #2348

Open coderchem opened 3 months ago

coderchem commented 3 months ago

System Info

tgi 2.0.2

Information

Tasks

Reproduction

```rust
/// GRPC health check
#[instrument(skip(self))]
pub async fn health(&mut self) -> Result<HealthResponse> {
    // Send a health request to every shard client...
    let futures: Vec<_> = self
        .clients
        .iter_mut()
        .map(|client| client.health())
        .collect();
    // ...wait for all of them, then return only the last response
    join_all(futures).await.pop().unwrap()
}
```

```rust
/// Returns a client connected to the given url
pub async fn connect(uri: Uri) -> Result<Self> {
    let channel = Channel::builder(uri).connect().await?;

    Ok(Self {
        stub: TextGenerationServiceClient::new(channel),
    })
}

/// Returns a client connected to the given unix socket
pub async fn connect_uds(path: String) -> Result<Self> {
    let channel = Channel::from_shared("http://[::]:50051".to_string())
        .unwrap()
        .connect_with_connector(tower::service_fn(move |_: Uri| {
            tokio::net::UnixStream::connect(path.clone())
        }))
        .await?;

    Ok(Self {
        stub: TextGenerationServiceClient::new(channel),
    })
}
```

This code path returns very slowly when going through gRPC, especially while generating from a very long prompt, e.g. a 125k-token context. I am using llama3-8B. A call to /health can take more than 10 s. This seriously affects normal usage.
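A minimal sketch of how this can be observed, assuming TGI is listening on `http://localhost:3000` with the standard `/generate` and `/health` routes; the prompt length, `max_new_tokens`, and dependency versions here are only placeholders, not the exact setup from the report:

```rust
// Assumed Cargo.toml deps: tokio = { version = "1", features = ["full"] },
// reqwest = { version = "0.12", features = ["json"] }, serde_json = "1"
use std::time::{Duration, Instant};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let base = "http://localhost:3000"; // assumed TGI address
    let client = reqwest::Client::new();

    // Start a generation with a very long prompt in the background
    // (a stand-in for the ~125k-token context mentioned above).
    let long_prompt = "word ".repeat(50_000);
    let gen_client = client.clone();
    let gen_base = base.to_string();
    let generation = tokio::spawn(async move {
        gen_client
            .post(format!("{gen_base}/generate"))
            .json(&serde_json::json!({
                "inputs": long_prompt,
                "parameters": { "max_new_tokens": 512 }
            }))
            .send()
            .await
    });

    // While the generation is in flight, time /health repeatedly.
    for _ in 0..10 {
        let start = Instant::now();
        let status = client.get(format!("{base}/health")).send().await?.status();
        println!("/health -> {status} in {:?}", start.elapsed());
        tokio::time::sleep(Duration::from_secs(1)).await;
    }

    let _ = generation.await?;
    Ok(())
}
```

With a long generation running, the printed /health latencies climb well above what they are on an idle server.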

Expected behavior

As described in the title: /health should stay fast even while a long generation is in progress.

ErikKaum commented 3 months ago

Hi @coderchem 👋

Thanks for opening the issue!

I'm not 100% sure I understand the exact problem, but do I understand correctly that the /health endpoint becomes slow while an inference with a long text generation is in progress?