valyala / fasthttp

Fast HTTP package for Go. Tuned for high performance. Zero memory allocations in hot paths. Up to 10x faster than net/http
MIT License

HostClient reuse got ErrTimeout #1598

Closed transfercai closed 1 year ago

transfercai commented 1 year ago

I reused HostClient like this

func (c *FastProxy) Do(req *fasthttp.Request, resp *fasthttp.Response, targetHost string) error {
    var hc *HC
    startCleaner := false
    v, ok := c.m.Load(targetHost)
    if ok {
        hc = v.(*HC) // reuse the existing client
    } else {
        hc = &HC{
            client: &fasthttp.HostClient{
                Addr:            targetHost,
                MaxConns:        c.MaxConnsPerHost,
                MaxConnDuration: c.MaxConnDuration,
            },
        }
        c.m.Store(targetHost, hc)
        if c.m.Length() == 1 {
            startCleaner = true
        }
    }

    if startCleaner {
        go c.mCleaner()
    }

    hc.lastDo = c.timeNow()
    removeRequestHopHeaders(req)
    err := hc.client.DoTimeout(req, resp, c.Timeout)
    removeResponseHopHeaders(resp)
    return err
}

c.MaxConnDuration is always 12 seconds, c.MaxConnsPerHost is always 5000, and c.Timeout is always 10 seconds. In production I occasionally get ErrTimeout even though the response time is under 1 second. My guess is that a connection that is about to be closed gets reused. How can I fix this? Thank you.
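A side note on the Do method above: Load followed by Store is not atomic, so two goroutines that hit a new targetHost at the same moment may each create their own client, and one of them is then overwritten. Whether this contributes to the timeouts is not established in this thread. A minimal sketch of an atomic get-or-create, assuming a plain sync.Map (the post's c.m is a custom map with a Length method) and a hypothetical hostClient type standing in for HC:

```go
package main

import "sync"

// hostClient is a hypothetical stand-in for the HC wrapper in the post;
// in the real proxy it would hold a *fasthttp.HostClient.
type hostClient struct {
	addr string
}

// clientPool maps a target host to the one client shared by all callers.
type clientPool struct {
	m sync.Map // effectively map[string]*hostClient
}

// get returns the shared client for host. Concurrent callers for a new host
// may each build a candidate, but only the first one stored is ever returned.
func (p *clientPool) get(host string) *hostClient {
	if v, ok := p.m.Load(host); ok {
		return v.(*hostClient) // fast path: client already exists
	}
	// LoadOrStore keeps whichever value was stored first, so a goroutine
	// that loses the race discards its candidate and uses the winner's.
	v, _ := p.m.LoadOrStore(host, &hostClient{addr: host})
	return v.(*hostClient)
}
```

With this shape, every request to the same host goes through one client, so limits such as MaxConns apply to a single connection pool per host.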

erikdubbelboer commented 1 year ago

Without a reproducible example I'm afraid I can't do much.

transfercai commented 1 year ago

I upgraded fasthttp from 1.19 to 1.42 in an attempt to fix this issue, but it seems to have caused a new problem. In our production environment we now see more frequent occurrences of the error "the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection." I tried calling req.SetConnectionClose(), but the issue persists. What steps can I take to address this new problem?

transfercai commented 1 year ago

Version 1.42 solved the original ErrTimeout issue, but a new problem has appeared.

erikdubbelboer commented 1 year ago

The error "the server closed connection before returning the first response byte" means the server closed a keep-alive connection without sending a Connection: close header. Normally the client is the one that is supposed to close the connection, to prevent a lot of TIME_WAIT sockets on the server side.

Are you also in control of the server or do you know why the server might close connections instead of using Connection: close?
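The situation described above can be reproduced without fasthttp. The following is a minimal sketch using only the Go standard library: a throwaway TCP server (an assumption for illustration, not the poster's real backend) closes the connection without replying, and the client then sends a request and tries to read the first response byte. The function name is hypothetical.

```go
package main

import (
	"bufio"
	"net"
)

// firstByteAfterServerClose dials a local server that closes the connection
// without sending anything, writes a request, and returns the error seen
// while waiting for the first response byte.
func firstByteAfterServerClose() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	closed := make(chan struct{})
	go func() {
		defer close(closed)
		if c, err := ln.Accept(); err == nil {
			c.Close() // server gives up on the connection silently
		}
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	defer conn.Close()
	<-closed // the server side is closed before the client sends anything

	// The write usually succeeds locally even though the peer is gone.
	if _, err := conn.Write([]byte("GET / HTTP/1.1\r\nHost: example\r\n\r\n")); err != nil {
		return err
	}
	// No response byte ever arrives; the read fails with EOF or a reset.
	_, err = bufio.NewReader(conn).ReadByte()
	return err
}
```

From the client's point of view the request was sent on a connection that looked healthy, which is why this error tends to show up on reused keep-alive connections that the server has already dropped.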

transfercai commented 1 year ago

Thank you for your response. I tested adding a "Connection: close" header to the client requests, but then the number of sockets in TIME_WAIT state on the server kept increasing until the server eventually ran out of available file descriptors.

transfercai commented 1 year ago

I found that one of my downstream services is a Node.js service with a default keep-alive timeout of 5 seconds, while my MaxConnDuration is set to 12 seconds. This may be causing the issue. To fix it, I set MaxIdleConnDuration to 3 seconds, but the problem still persists.

transfercai commented 1 year ago

I found that the SLB (nginx) in front of my downstream service reloads its configuration whenever a new service is released. During the reload, new TCP connections are reset, and that is when the error "the server closed connection before returning the first response byte" occurs. I increased the number of retries and set a RetryIfFunc that always returns true, which improved the situation compared to before. However, some errors still occur.
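The retry approach described above can be sketched as a small bounded retry loop. This is a standard-library illustration only, not fasthttp's built-in retry mechanism; doWithRetry, retryClosedEarly, and both sentinel errors are hypothetical names introduced here.

```go
package main

import "errors"

// Hypothetical sentinel errors standing in for the client errors in this thread.
var (
	errClosedEarly = errors.New("server closed connection before returning the first response byte")
	errTimeout     = errors.New("timeout")
)

// retryClosedEarly approves a retry only for the early-close error.
func retryClosedEarly(err error) bool {
	return errors.Is(err, errClosedEarly)
}

// doWithRetry calls do at most maxAttempts times (at least once) and stops
// early on success or when shouldRetry rejects the error.
func doWithRetry(maxAttempts int, do func() error, shouldRetry func(error) bool) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = do(); err == nil {
			return nil
		}
		if !shouldRetry(err) {
			return err
		}
	}
	return err // attempts exhausted; report the last error
}
```

Retrying every request unconditionally can repeat side effects for non-idempotent requests such as POST, so a predicate that checks the error (and, in a real proxy, the request method) is usually safer than one that always returns true.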

erikdubbelboer commented 1 year ago

That's good to hear. How often are the errors still occurring? Roughly after how many requests (or time and average req/sec)?

transfercai commented 1 year ago

After making changes to my downstream load balancer (nginx), this issue no longer occurs. Thank you for all the responses to my question. Best regards.