ClientError "no connection to the server"

fghibellini commented 5 years ago

I am getting errors of the form:

SessionError (QueryError "..." [ ... ] (ClientError (Just "no connection to the server")))

This seems to be caused by the connection being dropped (see https://github.com/lpsmith/postgresql-simple/issues/200 and https://github.com/psequel/psequel/issues/180).

Any best practice on how to handle this?

It looks like once I get one of these, the connection is fed to other calls of use and all the subsequent reuses fail the same way. Do I have to signal to Hasql-Pool that the connection should be evicted from the pool?

Version: hasql-pool-0.5.0.1

nikita-volkov commented 5 years ago

Yes. This seems to be a shortcoming of the current implementation of hasql-pool. I can suggest no better than to directly control a pool of connections using the underlying "resource-pool" package.

fghibellini commented 5 years ago

Ok, thanks for the quick answer

nikita-volkov commented 5 years ago

I'll keep the issue open as a reminder that it needs fixing

fghibellini commented 5 years ago

I looked at the code and it seems like the logic to destroy connections that are not working anymore is already there.

I think the intent was to drop the connection when the DB operation fails which is signaled in Hasql by an Either but in this case the top-level Either is not the one returned by Session.run but the result of traversing the resource which is of type Either Hasql.Connection.ConnectionError Hasql.Connection.Connection.

I created a branch that makes the distinction more explicit https://github.com/fghibellini/hasql-pool/commit/410ff951d9070c0862369feb7528ef6a55e4e170

fghibellini commented 5 years ago

What is the advantage of storing initialization failures in the pool ?

nikita-volkov commented 5 years ago

What is the advantage of storing initialization failures in the pool ?

"resource-pool" uses exceptions for logic, I'm avoiding them that way.

fghibellini commented 4 years ago

Hi,

I tried to come up with a possible solution, which I am submitting for your consideration. The commits from October have been tested in production and have fixed the above described problem. The January commit simply replaces the value-encoded connection error with exceptions. I think it simplifies the package as I found it confusing that the pool holds improperly initialized connections.

changes to hasql-pool: https://github.com/nikita-volkov/hasql-pool/compare/master...fghibellini:error-source

changes to hasql: https://github.com/nikita-volkov/hasql/compare/master...fghibellini:connection-error

torgeirsh commented 4 years ago

What is the status on this issue? Should it be made into a PR for easier consideration?

nikita-volkov commented 4 years ago

I've rewritten the lib to lose the dependency on the "resource-pool" package. I haven't released it yet since I haven't tested it out thoroughly enough. It's on the "no-resource-pool" branch. You can depend on it using Stack.

Please do check it out and provide feedback.

Also it would be really appreciated if someone extended the test suite to cover all edge cases.

fghibellini commented 4 years ago

Hi @nikita-volkov

I took a look at your branch today.

If I understand correctly your new code just replaces the dependency resource-pool with custom code that avoids using exceptions.

I have to start by mentioning that I really liked the original implementation because it was a really good example of the code reusability that Haskell provides. Plus given that this tends to be quite critical and error-prone code, as a user I very much like the fact that most of the code (the logic) has a larger user-base than hasql and thus it has been probably tested under a wider range of scenarios.

I thought it would make sense to at least refactor the code to follow the original division of hasql vs pooling logic (maybe even put the pooling in a separate package). This would also allow for a side-by-side comparison with the resource-pool source code (it would basically only differ in the returned types).

You can see my initial attempt at https://github.com/fghibellini/hasql-pool/tree/no-resource-pool - the code has a Pool module where all the pooling logic is extracted (and I split use into take, put, destroy and withResource). The module Hasql.Pool is thus reduced to the minimum - API is not altered. The code compiles fine.

My findings so far:

https://github.com/nikita-volkov/hasql-pool/blob/fca5c87112f714fe86ed84c4f8887afe46de23ba/library/Hasql/Pool.hs#L44 is not used anywhere, so there's no cleanup process. I thus commented it out and didn't spend any time on it so far.
https://github.com/nikita-volkov/hasql-pool/blob/fca5c87112f714fe86ed84c4f8887afe46de23ba/library/Hasql/Pool.hs#L139 is probably supposed to be pred instead of succ
After comparing with resource-pool I noticed that https://github.com/nikita-volkov/hasql-pool/blob/fca5c87112f714fe86ed84c4f8887afe46de23ba/library/Hasql/Pool.hs#L23 the size of the Pool doesn't seem to be respected anywhere
~also this refactoring does not seem to address the problem raised in this issue (it doesn't really seem to be related in any way)~ the new logic throws away a connection on any error. this is probably undesirable - a DB insert collision is not a good reason to throw away the connection

I stopped comparing with resource-pool after I noticed that the pool size is not respected, so there might be other issues.

I'm sorry if I misunderstood something - it's pretty late in the night here :)

nikita-volkov commented 4 years ago

@fghibellini Thanks for your feedback! You're correct in your many findings. I've revised the code addressing the following issues:

Integrate the cleanup loop.
Full review of the queueSizeVar, switching from it to a variable which instead keeps track of the slots available for connection. Hence this also addresses the issue of pool size not being respected.
Connection no longer gets dropped when the error is not related to connections.

Profpatsch commented 3 years ago

We have run into this as well, the current version of hasql-pool doesn’t allow to destroy the resource on connection error, it is placed back into the pool unconditionally.

Is there a simple patch we can apply to fix this problem? I’d rather just drop any connections raising a libpq ClientError unconditionally, than to get stuck with an unusable connection.

Profpatsch commented 3 years ago

I want to voice my concern with rewriting the functionality of resource-pool in here. I looked at the code and it does not look like code that is trivially easy to get right.

I’d rather fix the library as it is, by dropping ClientErrors by default.

@fghibellini I looked at your patch and I’m a bit disturbed by

    case s of
      LibPQ.ConnectionOk -> return Nothing
      _ -> fmap (Just . ConnectionError) (LibPQ.errorMessage c)

and then later

defaultConnectionHealthCheck (QueryError _ _ (ClientError (Just "no connection to the server"))) = False

in hasql-pool.

This assumes that

a) LibPq.errorMessage would actually return the connection error, and not some random other error that happened in the meantime (while ignoring the actual error with a _) b) that libpq would return exactly that string, over all following library versions.

Instead, what should be done here is use the error codes in https://www.postgresql.org/docs/current/errcodes-appendix.html to find the appropriate error codes returned by a libpq action, and then handle those.

Profpatsch commented 3 years ago

Okay, so looking through it further, it looks to me like ClientError is only produced when a function that sends a message over the Connection returns False or Nothing, which indicates that the libpq C function returned a NULL pointer. Since there is no result, the libpq documentation suggests calling PQErrorMessage (horryfyingly not thread safe…) to get the error message for the last operation, which should (as I said, thread safety…) be the one that just failed.

However, it also says that there is a “trailing newline in the message”, so I’m pretty sure your defaultConnectionHealthCheck above wouldn’t actually match on the right message (unless something does chomping).

I’d suggest the default pool action could be something like “drop every connection that returns a client error and log to stderr” or similar. But connectionHealthCheck :: QueryError -> Bool should definitely return IO ConnectionAction or something, where

data ConnectionAction
  = DropConnection
  | KeepConnection

and should also take the connection itself, so that the provided callbck can try to reset it for example. So

onQueryError :: Connection -> QueryError -> IO ConnectionAction

with a defaultOnQueryError that drops the connection whenever a ClientError happens.

Profpatsch commented 3 years ago

Oh @fghibellini I also see a problem with your patch to hasql, specifically introducing the ConnectionError only in the checkConnectionStatus function.

This function isn’t really called much in practice, since ClientErrors can happen anytime something tries to use the connection, not just when it starts the connection. So there is not much value in adding this one separate error case in one place.

Profpatsch commented 3 years ago

Please see the attached PR, it should fix this problem.

torgeirsh commented 2 years ago

We've observed this problem as well, and both the no-resource-pool branch and #12 appear to fix it. @nikita-volkov do you plan to merge the branch soon, or is there more work to be done? @Profpatsch did you release a separate package?

nikita-volkov commented 2 years ago

Fixed and released.

nikita-volkov / hasql-pool

ClientError "no connection to the server" #6