mojolicious / mojo-pg

Mojolicious :heart: PostgreSQL
https://metacpan.org/release/Mojo-Pg
Artistic License 2.0
101 stars 46 forks source link

infinite loop on connection drop #68

Closed vividsnow closed 3 years ago

vividsnow commented 4 years ago

Steps to reproduce the behavior

sample script

use strict; use warnings;
use Mojo::IOLoop;
use Mojo::Pg;

my $pg = Mojo::Pg->new('postgresql://mojo:mojo@localhost/mojo');
my $ps = $pg->pubsub;

$ps->listen(foo => sub { printf "payload: %s\n", $_[1] });
$ps->on(disconnect => sub { Mojo::IOLoop->reset });

Mojo::IOLoop->start unless Mojo::IOLoop->is_running;

run and then kill socket via

# ss -K dport = 5432

Expected behavior

exit on disconnect

Actual behavior

infinite loop with errors:

Mojo::Reactor::EV: I/O watcher failed: DBD::Pg::db pg_notifies failed: connection not open at ..Mojo/Pg/Database.pm line 142.
chazmcgarvey commented 4 years ago

I actually can't replicate it with your sample script, but I can in my own application (which is what brings me here).

chazmcgarvey commented 4 years ago

Well, I looked to see if any recent changes in DBD::Pg or libpq might be the cause, but I didn't notice anything. I was suspicious because I've been using Mojo::Pg for some time now and only recently noticed this happening.

Anyway, here's a patch I'm trying out in my own application image:

If the database socket breaks while listening for notifications, Mojo::Pg can
get into an infinite loop. This patch prevents that.
--- i/local/lib/perl5/Mojo/Pg/Database.pm
+++ w/local/lib/perl5/Mojo/Pg/Database.pm
@@ -139,10 +139,10 @@ sub _notifications {

   my $dbh = $self->dbh;
   my $n;
-  return undef unless $n = $dbh->pg_notifies;
+  return undef unless $n = eval { $dbh->pg_notifies };
   while ($n) {
     $self->emit(notification => @$n);
-    $n = $dbh->pg_notifies;
+    $n = eval { $dbh->pg_notifies };
   }

   return 1;

An alternative fix could be to ping before using pg_notifies as the example in the DBD::Pg docs does, but it seems like that would be a lot of pings. And presumably the socket could break in between the ping and pg_notifies...

kraih commented 4 years ago

Having more ping calls would be very bad for performance, since they are select 1; roundtrips every time.

christopherraa commented 3 years ago

Seeing almost the same error message here, except this is on the poll-reactor:

Mojo::Reactor::Poll: I/O watcher failed: DBD::Pg::db pg_notifies failed: connection not open at local/lib/perl5/Mojo/Pg/Database.pm line 140.
Mojo::Reactor::Poll: I/O watcher failed: DBD::Pg::db pg_notifies failed: connection not open at local/lib/perl5/Mojo/Pg/Database.pm line 140.

The sample script we concocted was this:

#!/usr/bin/env perl

use Mojolicious::Lite;
use Mojo::Pg;

my $pg = Mojo::Pg->new('postgresql:///foo');
Mojo::IOLoop->recurring(5 => sub { $pg->pubsub->notify(foo => 'bar') });
$pg->pubsub->listen(foo => sub { app->log->info(pop) });

app->start;

A bit verbose yes, but it is somewhat of a minimal example of the code we saw this behaviour in.

If the database is shut down (we had postgres running in docker) the above error message was absolutely spamming output. CPU usage immediately (on one core naturally) went to 100% and log output produced is about 936MB / minute. So, this kind of an issue could have a serious operational impact.

I am not sure myself how this is best avoided and nothing jumps out at me when looking at the source code for Mojo::Pg::PubSub. I just wanted to give an example of the things we are observing.

kraih commented 3 years ago

I have tried replicating the problem, but for me the described scenario works just fine. So i cannot investigate and someone else will have to work on a fix.

christopherraa commented 3 years ago

@kraih That is very interesting. Here it is reproducible 100% of the time. May I ask which platform and PostgreSQL version you have tested this on?

kraih commented 3 years ago

I ran it on macOS Catalina with PostgreSQL 13. Poll and EV reactors.

vividsnow commented 3 years ago

@kraih How do you kill a connection? I was able to reproduce the issue by killing connection using this tool: https://github.com/google/tcp_killer on MacOS 10.15.7, homebrewed perl 5.32 (both EV and Poll reactors) and PostgreSQL 13.

kraih commented 3 years ago

@vividsnow I restarted the server.

vividsnow commented 3 years ago

In this (normal) case I suspect that postgresql server sends some termination/error message to its clients before socket close. But in rare cases of postgresql/pgbouncer/network failure the original issue occurs.