lpsmith / postgresql-simple

Mid-level client library for accessing PostgreSQL from Haskell

libpq: failed (no connection to the server) #200

Closed creichert closed 7 years ago

creichert commented 7 years ago

I have two projects which use AWS RDS. One uses postgresql-simple via persistent; the other uses postgresql-simple directly.

A few weeks ago I started receiving this error frequently from my code on both projects:

libpq: failed (no connection to the server)

It also comes in another variant:

libpq: failed (ERROR: client sent partial pkt in startup phase )

The postgresql server logs do seem to spit out a corresponding error:

could not receive data from client: Connection reset by peer

I'm at a loss on where to begin here. It's happening in both projects.

I should mention that I recently upgraded one project's Stackage snapshot from lts-3.9 to nightly-2016-08-25. That means I upgraded from postgresql-simple-0.4.10.0 to postgresql-simple-0.5.2.1.

I didn't change any database code before the upgrade and after the upgrade I started receiving these errors.

I'm not necessarily "blaming" postgresql-simple but I am looking for suggestions and ideas on how to identify and fix this problem.

creichert commented 7 years ago

Update: I'm trying some different versions of postgresql-simple to see if I can find a version which works for me. Would appreciate any tips!

In my stack.yaml:

  - location:
      git: git@github.com:lpsmith/postgresql-simple
      #commit: a8f6a901a38df0f65906889bc7f20bf59ecdd73b
      commit: 22074e507cf45f24111a3067a706ec84905023c6

lpsmith commented 7 years ago

You might just try the release versions. I don't have any particular idea why this might be happening; so let me know what you find out.

You might also consider obtaining a packet capture; wireshark does have a postgres protocol dissector, which has been very useful in the past.

Also, I'm mildly curious if this problem continues to exhibit itself if you downgrade to postgresql-simple-0.4.10.0, while keeping all your other upgrades. The reason being, I'm at a bit of a loss to explain this via any changes in postgresql-simple or postgresql-libpq, but if I'm interpreting the error message correctly, it would seem as though the client side is closing the file descriptor for some reason, possibly bypassing libpq entirely. It's entirely possible this is due to a use-after-close bug in somebody else's code, and if the old version works, this would suggest that this possibility is worth delving into further.
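To make the use-after-close idea concrete, here is a minimal sketch of the general bug class. It is an illustration only, not the bug from this thread (whose details were never posted): it uses an ordinary `Handle` duplicated from stdout instead of a database connection, and the names `leakHandle` and `useAfterClose` are made up for the example. The resource escapes the `bracket` that owns it, so it is already closed by the time the caller touches it.

```haskell
import Control.Exception (IOException, bracket, try)
import GHC.IO.Handle (hDuplicate)
import System.IO (Handle, hClose, hPutStrLn, stdout)

-- Hypothetical helper: returns the handle from inside 'bracket', so
-- 'hClose' has already run by the time the caller receives it.
leakHandle :: IO Handle
leakHandle = bracket (hDuplicate stdout) hClose return

-- Writing to the leaked handle fails, because it was closed on the way out.
useAfterClose :: IO (Either IOException ())
useAfterClose = do
  h <- leakHandle
  try (hPutStrLn h "hello")

main :: IO ()
main = do
  result <- useAfterClose
  case result of
    Left err -> putStrLn ("write failed: " ++ show err)
    Right () -> putStrLn "write unexpectedly succeeded"
```

A `Handle` tracks its own closed state, so the mistake is reported cleanly here. A raw file descriptor has no such guard: once closed, its number can be reused by the next thing the process opens, so a stale reference may end up closing or writing to an unrelated socket, which is the kind of failure that would be hard to trace back to its source.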

In any case, please do keep me in the loop when you find something out.

creichert commented 7 years ago

@lpsmith I'm still not clear on all the details, but the bug was in fact in my own code. I'll try to post more details here as I learn more about the root cause.

I tried downgrading to older versions, even 0.4.10.0, but I still continually received the error. I believe the stackage snapshot upgrade was simply a coincidence.

My assumption right now is that something subtle changed on the AWS side, which only now made this bug show up as runtime errors. Most likely the bug has always been present but did not previously surface as outright failures.