valyala / fasthttp

Fast HTTP package for Go. Tuned for high performance. Zero memory allocations in hot paths. Up to 10x faster than net/http
MIT License
21.91k stars 1.76k forks source link

The test named TestHostClientMaxConnWaitTimeoutError fails under certain conditions. #1830

Closed newacorn closed 3 months ago

newacorn commented 3 months ago

https://github.com/valyala/fasthttp/blob/a1db411cc28c8214776af8b6a62f4e60c9d2fea8/client_test.go#L2871

Reproduce

❯ go test -run=TestHostClientMaxConnWaitTimeoutError -count=2000  -failfast  .
--- FAIL: TestHostClientMaxConnWaitTimeoutError (0.21s)
    client_test.go:3021: connsWait has 1 items remaining
FAIL
FAIL    github.com/valyala/fasthttp 24.859s
FAIL

Cause

https://github.com/valyala/fasthttp/blob/a1db411cc28c8214776af8b6a62f4e60c9d2fea8/client.go#L1692-L1704

https://github.com/valyala/fasthttp/blob/a1db411cc28c8214776af8b6a62f4e60c9d2fea8/client.go#L1747-L1758 These two pieces of code assume that when w.waiting() returns true, we can definitely wake up a valid waiter, and then have that waiter continue to wake up the next one upon exiting.

However, there is a window of time where w.waiting() may return true, but w could time out and set the value of wantConn.err afterward, causing the wake-up chain to be interrupted.

The more serious issue is not whether c.connsWait.len() returns zero after all client calls return, but rather that there are errors in the length of c.conns and changes in the c.connsCount count. This can lead to a situation where, during a connsCount semaphore release, there are still valid waiters in c.connsWait.

Issue 1: The semaphore released for connsCount does not increase the length of c.conns nor does it decrement the c.connsCount count. Issue 2: Even if the release is reflected in the length of c.conns or the calculation of c.connsCount, the problem lies in that it should have passed the semaphore but instead chose to release it. Let's first address the connsCount semaphore issue. Next, resolve the issue with the TestHostClientMaxConnWaitTimeoutError test case.

erikdubbelboer commented 3 months ago

Thanks for diving into this, and writing two pull requests to fix both issues!