Open ghost opened 3 years ago
How does something like Hikari handle validity checks?
During validation Hikari temporarily changes the network timeout of the connection here.
Similarly, during validation a PostgresConnectionManager
from r2d2-postgres could temporarily change the network timeout of its tls_connector
. I will check whether the underlying types expose the necessary functionality.
I will check whether the underlying types expose the necessary functionality.
Unfortunately, they do not. In r2d2-postgres the MakeTlsConnect
wraps a tokio_postgres::Socket
, which wraps a tokio::net::TcpStream
, and neither has configurable timeouts.
The Javadoc for Connection::setNetworkTimeout
is kind of interesting in that it discusses the timeout only applying to reads and not writes. I guess in practice the outbound TCP buffer doesn't fill for a single query?
postgres
could definitely grow support for read/write timeouts via tokio-postgres
, though it is interesting to note that libpq
does not offer that at all from what I can tell. I imagine they might just advise cranking the TCP keepalive interval down and relying on that.
Thinking about this a bit more, it actually seems to me like TCP keepalive may be the better approach here compared to some kind of read/query timeout. It doesn't make assumptions about how long a query will legitimately take to run, and can detect a disconnect even while the connection is totally idle rather than only at the time of checkout.
My understanding is TCP keepalives depend on three parameters:
tokio::net::TcpStream
exposes set_keepalive(duration)
, which sets the connection’s keepalive time to duration
. I think it does not allow configuration of keepalive interval and keepalive probes, whose UNIX defaults are 75 seconds and 9, which would still allow long timeouts.
An alternative is configuring the parameters at the machine level, but doing so could affect other services running on the machine, or the machine may be inaccessible.
I appreciate general read and write timeouts might introduce complexity and agree with your point about query time. Instead, what about limiting timeouts to validity checks that only make trivial queries? The Java interface exposes that functionality through isValid(timeout)
.
We could definitely add a variant of is_valid that takes a timeout, but I'd want to be pretty careful about what timeouts are actually passed to it. For example, if get_timeout
is called with a 10 second timeout and we're 9.99995 seconds in when a connection becomes available, we probably wouldn't want to try calling is_valid with a 0.00005 second timeout, since that would almost assuredly fail.
Sounds good! I think is_valid(timeout)
would ultimately require access to the Tokio runtime, so I will start with a pull request in the rust-postgres
repository.
Now that Rust-Postgres exposes is_valid(timeout)
, I can think of two options that would maintain backward compatibility:
ManageConnection
implementations bound is_valid()
calls if they can or want. The bound would be independent of timeout
in get_timeout(timeout)
.is_valid_timeout
to the ManageConnection
trait that calls is_valid
by default. Then update get_timeout
accordingly.What do you think?
I think adding ManageTimeout::is_valid_timeout
with a default impl seems best.
First, thanks for providing the community with multiple database tools.
Sometimes
get_timeout(timeout)
exceedstimeout
. Consider an application that connects to a distributed database with multiple nodes, any of which can serve requests. Suppose the application callsget_timeout
, and the receiving database node dies. If the application configured tests on checkout,get_timeout
awaits the return of the underlyingis_valid
call before timing out.is_valid
likely executes a trivial SQL query, which encounters TCP errors and retries until the TCP retry limit of the operating system, often 15 minutes, expires. Meanwhile, the application could have obtained a connection to a different database node.One solution involves modification of the Rust-Postgres and Tokio-Postgres crates. It would expose a function, perhaps
is_valid_timeout(timeout)
, that executes a trivial SQL query or times out. Do you recommend a different approach? If not, would you be open to a pull request in therust-postgres
repository that introducesis_valid_timeout
?