lib / pq

Pure Go Postgres driver for database/sql
https://pkg.go.dev/github.com/lib/pq
MIT License
9.09k stars 911 forks source link

connect_timeout doesn't account for getsockopt error #459

Closed bkabrda closed 8 years ago

bkabrda commented 8 years ago

Hi, I'm trying to use pgweb, which uses pq as postgresql connection backend. I'm doing this in a multicontainer kubernetes setup. The problem is that kubernetes doesn't allow you to specify startup order of containers, so sometimes pgweb can start before postgres container is up. This means that kubernetes already knows IP of postgres container (and I provide that IP to pgweb), but that IP isn't ready to accept connections yet. In these situations, I get this error:

Pgweb v0.9.2 (git: 34056c1c6636ea001ebbf3d185baac7ba03e0d82)
Connecting to server...
Error: dial tcp 172.17.0.5:5432: getsockopt: connection refused

This is even though I'm using connect_timeout=0. I believe that the above error should be caught if connect_timeout is set and connecting should be retried.

johto commented 8 years ago

This is even though I'm using connect_timeout=0. I believe that the above error should be caught if connect_timeout is set and connecting should be retried.

Surely you'd want feedback on why the connection attempt failed, rather than having the goroutine (and commonly the entire program) hang indefinitely trying to connect? If the application gives up after one connection attempt, that's its problem and better solved in the application.

Or perhaps in this case your problem comes from the fact that in your setup you just start pgweb instead of running it. And that problem has many more symptoms than startup problems due to undefined container start-up ordering.

bkabrda commented 8 years ago

So if I read the first paragraph of what you wrote correctly, you're basically saying that connect_timeout=0 should throw an error if the IP doesn't exist and shouldn't retry? (IIUC connect_timeout=0 should mean "try until you succeed with no limit). This is kind of confusing for me, since I'd expect the library to try until it succeeds or the limit is reached and only after that it should return the error (whatever the error is). Where am I wrong with? (BTW, in the meanwhile I managed to solve this at kubernetes layer, where I actually told kubernetes to restart the container until the command succeeds, so this is no longer an issue for me)

johto commented 8 years ago

So if I read the first paragraph of what you wrote correctly, you're basically saying that connect_timeout=0 should throw an error if the IP doesn't exist and shouldn't retry? (IIUC connect_timeout=0 should mean "try until you succeed with no limit). This is kind of confusing for me, since I'd expect the library to try until it succeeds or the limit is reached and only after that it should return the error (whatever the error is). Where am I wrong with?

connect_timeout = 0 is just the default, meaning "don't apply an application level timeout on connection attempts". It doesn't influence how many times a connection is attempted before giving up. Now our docs on this matter could perhaps be a bit better (as could libpq's, where they appear to have been copied from), but this is really common terminology in software. net.DialTimeout works the same way, for example.

bkabrda commented 8 years ago

Ah, ok, then I failed to understand connect_timeout. This makes sense. Thanks for explaining and feel free to close this issue.