jackc / pgconn

MIT License
182 stars 87 forks source link

Fix TLS connection timeout #100

Closed blakeembrey closed 2 years ago

blakeembrey commented 2 years ago

Follow up to https://github.com/jackc/pgconn/pull/93 which fixes the test case. Re-uses the same code with minor changes to fix the bugs:

blakeembrey commented 2 years ago

@jackc based on this comment I don't see a race condition because pgConn.conn would refer to whichever connection is in use on cancel (net or tls). But if we wanted to be specific I could create a new context watcher for the tls client and unwatch the previous one.

jackc commented 2 years ago

There is still a window between

pgConn.conn = tlsConn

and the execution of the

defer pgConn.contextWatcher.Unwatch().

If the context was canceled in that time then

pgConn.conn.SetDeadline(time.Date(1, 1, 1, 1, 1, 1, 1, time.UTC))

would have been called on the TCP conn and

pgConn.conn.SetDeadline(time.Time{})

would have been called on the TSL conn.

I'm not sure if that would actually be a problem or not because I think that path would only happen on a timeout and then the connection is dead anyway.

But if we wanted to be specific I could create a new context watcher for the tls client and unwatch the previous one.

But, yes, I think we should do this to clarify potential confusion -- even if it technically will work now.

blakeembrey commented 2 years ago

Ah, I think I understand now. I didn't really understand the purpose of the .setDeadline(time.Time{}) I think but got it now. The first runs on cancel, the second would run at the end of connect if it cancelled.

blakeembrey commented 2 years ago

Based on the source code it looks like it doesn't actually matter (it just calls the underlying connection method) but I've gone ahead and made the change to call a different context watcher for net and tls connections.

jackc commented 2 years ago

I think that one of those watchers must be getting left over or in a bad state. I've run the CI tests several times and I consistently get an error: https://github.com/jackc/pgconn/runs/4643444012?check_suite_focus=true

blakeembrey commented 2 years ago

Thanks. I haven't been able to replicate locally but it's probably due to how I was relying on defer, I was thinking it'd use the reference to pgConn.contextWatcher when it finally runs, but I think it actually keeps the reference from when it initially encountered the defer line.