jackc / pgx

PostgreSQL driver and toolkit for Go
MIT License
10.81k stars 842 forks source link

Timeout if connection is lost (OpenVPN) #1618

Open JamesInform opened 1 year ago

JamesInform commented 1 year ago

I am connected to a PostgreSQL server via OpenVPN. I have an open connection that sends a select statement every 10 seconds in a loop.

If the OpenVPN connection closes due to some network issue, the next select statement takes 3min+ until a timeout appears.

Question: Is there any elegant way of handling such a scenario and get an appropriate quick timeout response? I have tested pgconn.checkConn(), but checkConn() tells, that everything is fine altough it isn't because OpenVPN connection has been gone.

jackc commented 1 year ago

If *PgConn.CheckConn is not detecting the dropped connection then I would guess that the connection isn't lost as far as Go or the OS is concerned. My guess is OpenVPN is keeping the local side of the connection open while it tries to reconnect transparently (or something like that).

I don't know of any way to detect a situation like that reliably, but what you could do is execute your queries with a context with a short timeout. That would at least keep it from blocking (and setting a context timeout is good practice on any operation that can hang anyway).

JamesInform commented 1 year ago

Indeed OpenVPN seems to do nasty tricks on the connection. Even if I kill my go application during the OpenVPN connection is not available, the connection keeps established on the PostgreSQL server side. Very strange.

A context with a timeout is a good idea, but unfortunately I cannot use it, because the executed sql statement 's duration can vary from some seconds to some minutes. So a timeout context could also break a correctly running statement.

Nevertheless thanks for your quick response and great support..

ns-gzhang commented 1 year ago

@jackc Hi Jack, my scenario is related to pgconn.CheckConn() in our environments, not OpenVPN, but PG container on my Mac, or on K8s stacks. The symptom is that pgconn.Exec() returns error conn closed, but pgconn.CheckConn() says no error, it repeated over 2 dozens of times, and finally pgconn.CheckConn() detects it. Seems if I use pgconn.IsClosed() after pgconn.Exec(), it detects the status right away. Is it expected? Or I didn't use it the right way?

jackc commented 1 year ago

@ns-gzhang

Hmm... pgconn.CheckConn checks the underlying net.Conn. conn closed is at the pgconn.Conn layer. My guess what is happening is that an error is encountered that causes pgconn to close the connection to the server. The pgconn layer is immediately marked as closed, but it tries to do a graceful shutdown of the underlying connection in the background. Since pgconn.CheckConn directly used the underlying connection it would live a bit longer.

I've adjusted this behavior in b3739c1289f345f325f163dfea603d8d9d963a3f.

ns-gzhang commented 1 year ago

Thanks Jack for the explanation. That makes sense. Thanks for the change!