typelevel / skunk

A data access library for Scala + Postgres.
https://typelevel.org/skunk/
MIT License
1.58k stars 160 forks source link

Automatic reconnection #81

Open gvolpe opened 4 years ago

gvolpe commented 4 years ago

Right now, if the connection to the server is lost (i.e. restart PostgreSQL server), the application needs to be restarted to get reconnected. I think this would be a nice feature to have :)

Here's the exception I get when I stop and then start the PostgreSQL server:

root[ERROR] java.lang.Exception: Fatal: EOF before 5 bytes could be read.Bytes
root[ERROR]     at skunk.net.BitVectorSocket$$anon$2.$anonfun$readBytes$1(BitVectorSocket.scala:72)
root[ERROR]     at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:139)
root[ERROR]     at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:355)
root[ERROR]     at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:376)
root[ERROR]     at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:316)
root[ERROR]     at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:136)
root[ERROR]     at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:355)
root[ERROR]     at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:376)
root[ERROR]     at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:316)
root[ERROR]     at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:136)
root[ERROR]     at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:355)
root[ERROR]     at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:376)
root[ERROR]     at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:316)
root[ERROR]     at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
root[ERROR]     at cats.effect.internals.PoolUtils$$anon$2$$anon$3.run(PoolUtils.scala:51)
root[ERROR]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
root[ERROR]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
root[ERROR]     at java.lang.Thread.run(Thread.java:748)
tpolecat commented 4 years ago

I'm not too worried about that with single-use connections but I think I will probably remove that ability and only allow pooled connections. They will have a health check when they're returned to the pool but it might make sense to do a liveness check again before they're handed out to handle this case.

gvolpe commented 4 years ago

FWIW this happened using a pool of connections, but maybe that's not fully implemented yet? I recall seeing some TODO comments around this code.

tpolecat commented 4 years ago

maybe that's not fully implemented yet

Yeah that pooling code is bad. I have a WIP for a new implementation but it doesn't have the pre-checkout phase yet.

Also to a first approximation nothing is fully implemented! I mean, you discovered that INSERT wasn't supported yet because nobody had ever tried it ;-)

gvolpe commented 4 years ago

No worries :smile:

I'll leave this issue open hoping it gets fixed someday, no rush :) Once your new pooling code is out, I can give it a try and maybe come back with a PR if it's not yet fixed.

lorandszakacs commented 3 years ago

I've also encountered this issue.

An easy to use reproduction can be found in this repo, including the necessary scripts to start/restart the PSQL in docker. N.B. it only reproduces w/ pooled connections.

Now, I've hacked together quite a nice way to get pool reinitialization in userland, without full app restart (ping @gvolpe if you need it). But it's not a fully tested in terms of all the concurrency problems that can arise, but it works in production :joy:. A bunch of Ref, Semaphore and Resource.allocated :grimacing:. And not generic enough to handle restart when you wanted a stream in your query. But in case anyone wants it, it can be found in the same repo

@tpolecat, I'll gladly take a look at solving this problem in skunk itself if this is wanted. As the solution here is much simpler than the userland pool reinit hack.

tpolecat commented 3 years ago

This should be fixed. Have you tried this with https://github.com/tpolecat/skunk/pull/388 ?

gvolpe commented 3 years ago

Nice, @tpolecat please feel free to close this, I completely forgot about this issue! :smile:

lorandszakacs commented 3 years ago

@tpolecat if that fix wound up in 0.0.24 then I did try it. And it's not fully fixed, not for the particular usecase in the repo I had linked.