Currently, the CQL client in the YugaByte Jepsen tests treats NoHostAvailableException as definitely failed. However, the Java driver can throw this exception with a list of hosts tried, for example: NoHostAvailableException: All host(s) tried for query failed (tried: n2/192.168.122.12:9042 (com.datastax.driver.core.exceptions.TransportException: [n2/192.168.122.12:9042] Connection has been closed)). This can happen in the following scenario:
The client sends a write to the YugaByte CQL proxy (node n2).
The YugaByte CQL proxy (n2) sends a write RPC to the tablet server leader (node n4).
The tablet server hosting the CQL proxy (n2) is killed (in this case by the testing framework) right after sending the write RPC.
The tablet server on n4 successfully processes the write and updates the DB state.
That means a NoHostAvailableException with a list of tried hosts should be treated as :info (unknown) instead of :fail (definitely failed), because the write may have been applied even though the client saw an error.
We should only treat NoHostAvailableException as :fail when the error message shows that no host was tried.
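
The proposed rule can be sketched in Java as follows. This is only an illustration, not the actual Jepsen client code (which is written in Clojure): the class NoHostOutcome, the enum Outcome, and the method classify are hypothetical names. It assumes the tried hosts are available as a map from host address to error, which is what the DataStax Java driver 3.x is believed to expose through NoHostAvailableException.getErrors(); treat that as an assumption to verify against the driver version in use.

```java
import java.net.InetSocketAddress;
import java.util.Map;

public class NoHostOutcome {
    // Hypothetical stand-ins for the Jepsen :fail and :info result types.
    enum Outcome { FAIL, INFO }

    // triedHosts is assumed to be the per-host error map carried by the exception.
    // Empty map: no host was tried, so the write was never sent and definitely failed.
    // Non-empty map: at least one host received the request, so the result is unknown.
    static Outcome classify(Map<InetSocketAddress, Throwable> triedHosts) {
        return triedHosts.isEmpty() ? Outcome.FAIL : Outcome.INFO;
    }

    public static void main(String[] args) {
        // Mirrors the example above: n2 was tried and its connection was closed.
        Map<InetSocketAddress, Throwable> tried = Map.of(
            InetSocketAddress.createUnresolved("n2", 9042),
            new RuntimeException("Connection has been closed"));

        System.out.println(classify(Map.of())); // no host tried
        System.out.println(classify(tried));    // n2 tried, outcome unknown
    }
}
```

Checking the structured list of tried hosts, rather than parsing the message text, avoids depending on the exact wording of the driver's error string.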