Closed istathar closed 2 years ago
The telemetry shown here has us making the database change at about 10:05; shortly after it came back we started seeing the purple "no connection to server" errors.
The problem went away when we restarted the Haskell program. Weird, right?
So we're wondering whether there is some condition we need to detect, so that we can manually remove the affected connections from the pool when we see it.
Which version of pool are you using? Have you tried the latest version?
Looks like we're on 0.5.2.2; we'll try upgrading to 0.7.2.1.
Has this been resolved?
I was just digging into this. We were also on 0.5.2.2.
I found that Hasql gives out "no connection to the server" errors as ClientError, which is a type of CommandError and gets emitted as a QueryError. hasql-pool does not destroy the resource when it sees a QueryError (because you don't want to drop the connection on every failed query), and so it never drops connections when they can't connect.
I'll also see if we can update and whether the current logic is better about this.
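To make the distinction above concrete, here is a rough sketch of the kind of check a pool could apply. It is not hasql-pool's actual code. The types are local stand-ins that only borrow the constructor names mentioned above (the real Hasql types have different field types), and shouldDiscard is a hypothetical helper name.

```haskell
module Main where

-- Local stand-ins that only mirror the constructor names discussed above.
-- They are NOT the real Hasql definitions; field types are simplified.
data CommandError
  = ClientError (Maybe String) -- client-side failure, e.g. a lost connection
  | ResultError String         -- the server answered, but rejected the statement
  deriving (Show, Eq)

-- Simplified shape: statement text, parameters, underlying error.
data QueryError = QueryError String [String] CommandError
  deriving (Show, Eq)

-- Hypothetical policy: discard the connection only on client-side errors,
-- and keep it when the server merely rejected a query.
shouldDiscard :: QueryError -> Bool
shouldDiscard (QueryError _ _ (ClientError _)) = True
shouldDiscard (QueryError _ _ (ResultError _)) = False

main :: IO ()
main = do
  let lost = QueryError "SELECT 1" [] (ClientError (Just "no connection to the server"))
      bad  = QueryError "SELEC 1" [] (ResultError "syntax error")
  print (shouldDiscard lost) -- True
  print (shouldDiscard bad)  -- False
```

The point is only that a failed query and a dead connection can be told apart by looking one level inside the QueryError, so a pool would not have to drop the connection on every failure.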
Has this been resolved?
@nikita-volkov Not sure. We've upgraded to 0.7.2.1 and our system continues to perform great, so thank you! But we haven't made another database infrastructure change yet, so we can't confirm that the bug is cleared.
Nice analysis, @periodic! We didn't get that deep when looking at this.
I'd say go ahead and close this if you're comfortable you understand why hasql-pool was encountering it. Thanks so much Nikita.
This is exactly the issue that the latest releases resolve. See #6.
Thanks guys. Closing this as resolved. Feel free to reopen in case your issues remain.
We don't have a lot of evidence for this, but we encountered a very strange outage today when we resized an Amazon RDS instance, and wanted to let you know.
The Haskell service talking to that database suddenly started chucking 100% errors due to failed queries and transactions.
The problem went away when we restarted the Haskell program.
Our legacy system (which uses a different Postgres library) survived changes to the RDS instances with no problem. We speculate that it either noticed the failed connections and reconnected or, worse, was reconnecting all the time (stupid, but it would potentially work around this issue).
In the new system we're using hasql via hasql-pool.
Is this of interest? If so we can certainly try to provide more details.