If hostnames are used instead of ip addresses for all hosts within a pool,
and dns lookup fails, then get/multiget will hang indefinitely instead of
responding with an error such as SERVER_ERROR Host is down.
(I expect a client would detect this and close the connection,
but this is not ideal, the client timeout could be a second)
Both redis and memcache protocols are affected
If a server resolves but is down, then get does respond with SERVER_ERROR Host is down.
I suspect that's because memcached get is implemented to coalesce responses
from multiple backend servers, even when there's only one key,
and this is likely a bug specific to handling coalescing when
there's no attempt to send the request to a backend server
Because connection attempts are async but dns lookup is asynchronous,
there's a non-null server connection for a host that's unavailable
but a null server connection for a host that has a dns lookup error,
and these end up using different code paths.
(e.g. the former will call server_close() which does adjust nfrag_sent)
Fixes #596
Solution
Increment the fragmented messages processed (for the associated request) in the case that no server connection was established for a given key, as well.
This will ensure that the request completes and the connection from the client is properly aborted instead of hanging
when there is a dns error preventing the creation of a connection.
If hostnames are used instead of ip addresses for all hosts within a pool, and dns lookup fails, then get/multiget will hang indefinitely instead of responding with an error such as
SERVER_ERROR Host is down
. (I expect a client would detect this and close the connection, but this is not ideal, the client timeout could be a second)get
does respond withSERVER_ERROR Host is down
.I suspect that's because memcached get is implemented to coalesce responses from multiple backend servers, even when there's only one key, and this is likely a bug specific to handling coalescing when there's no attempt to send the request to a backend server
Because connection attempts are async but dns lookup is asynchronous, there's a non-null server connection for a host that's unavailable but a null server connection for a host that has a dns lookup error, and these end up using different code paths. (e.g. the former will call server_close() which does adjust nfrag_sent)
Fixes #596
Solution
Increment the fragmented messages processed (for the associated request) in the case that no server connection was established for a given key, as well. This will ensure that the request completes and the connection from the client is properly aborted instead of hanging when there is a dns error preventing the creation of a connection.