socketry / async-redis

MIT License
83 stars 18 forks source link

The connection pool returns connections that are on CLOSE_WAIT state #23

Closed davidor closed 4 years ago

davidor commented 4 years ago

I have noticed that https://github.com/socketry/async-redis/blob/b00872ae79b3a51ddedf1ad0e4ffc811cf9f032f/lib/async/redis/pool.rb#L114 can return connections that are on CLOSE_WAIT . When that happens, async-redis raises Errno::EPIPE or EOFError when trying to make a request using that connection. The request is not retried, and the connection is not removed from the pool either.

I tried this in a setup with HAproxy that sits between async-redis and the Redis server, but any other proxy would work. Proxies normally set a timeout of a few seconds. Consider this scenario:

From what I've seen, async-redis will keep trying using that connection. Async-redis never closes those connections and does not remove them from the pool either.

I wanted to discuss possible solutions before trying to write a fix. I think async-redis should retry requests that fail with errors like Errno::EPIPE. So when async-redis makes a request, it could rescue from those, close the connection, remove it from the pool, and retry the request with another connection of the pool. redis-rb has a reconnection_attempts parameter for this https://github.com/redis/redis-rb/blob/master/README.md#reconnections Of course, we'd need to ensure that we retry only on errors that guarantee that we will not run any duplicate commands.

I'm not sure if this could be solved in another library of the async family. Maybe async-io, or async? I'm not familiar enough with those to know whether they could guarantee that all the connections that async-redis sees on its pool are valid. What do you think @ioquatix ?

ioquatix commented 4 years ago

Thanks for the detailed report. If a connection fails due to EPIPE or some other connection failure it should be retired and the entire transaction restarted. In the case of single command we could certainly retry it but that could introduce other issues for cases where the connection fails while reading the response even thought the command was received...

ioquatix commented 4 years ago

Async::IO::Socket has a connected? method which can be used to detect sockets that are definitely disconnected. Can be used by pool to avoid acquiring obviously borked connections.