Closed vividsnow closed 3 years ago
I actually can't replicate it with your sample script, but I can in my own application (which is what brings me here).
Well, I looked to see if any recent changes in DBD::Pg or libpq might be the cause, but I didn't notice anything. I was suspicious because I've been using Mojo::Pg for some time now and only recently noticed this happening.
Anyway, here's a patch I'm trying out in my own application image:
If the database socket breaks while listening for notifications, Mojo::Pg can
get into an infinite loop. This patch prevents that.
--- i/local/lib/perl5/Mojo/Pg/Database.pm
+++ w/local/lib/perl5/Mojo/Pg/Database.pm
@@ -139,10 +139,10 @@ sub _notifications {
my $dbh = $self->dbh;
my $n;
- return undef unless $n = $dbh->pg_notifies;
+ return undef unless $n = eval { $dbh->pg_notifies };
while ($n) {
$self->emit(notification => @$n);
- $n = $dbh->pg_notifies;
+ $n = eval { $dbh->pg_notifies };
}
return 1;
An alternative fix could be to ping
before using pg_notifies
as the example in the DBD::Pg docs does, but it seems like that would be a lot of pings. And presumably the socket could break in between the ping and pg_notifies...
Having more ping
calls would be very bad for performance, since they are select 1;
roundtrips every time.
Seeing almost the same error message here, except this is on the poll-reactor:
Mojo::Reactor::Poll: I/O watcher failed: DBD::Pg::db pg_notifies failed: connection not open at local/lib/perl5/Mojo/Pg/Database.pm line 140.
Mojo::Reactor::Poll: I/O watcher failed: DBD::Pg::db pg_notifies failed: connection not open at local/lib/perl5/Mojo/Pg/Database.pm line 140.
The sample script we concocted was this:
#!/usr/bin/env perl
use Mojolicious::Lite;
use Mojo::Pg;
my $pg = Mojo::Pg->new('postgresql:///foo');
Mojo::IOLoop->recurring(5 => sub { $pg->pubsub->notify(foo => 'bar') });
$pg->pubsub->listen(foo => sub { app->log->info(pop) });
app->start;
A bit verbose yes, but it is somewhat of a minimal example of the code we saw this behaviour in.
If the database is shut down (we had postgres running in docker) the above error message was absolutely spamming output. CPU usage immediately (on one core naturally) went to 100% and log output produced is about 936MB / minute. So, this kind of an issue could have a serious operational impact.
I am not sure myself how this is best avoided and nothing jumps out at me when looking at the source code for Mojo::Pg::PubSub
. I just wanted to give an example of the things we are observing.
I have tried replicating the problem, but for me the described scenario works just fine. So i cannot investigate and someone else will have to work on a fix.
@kraih That is very interesting. Here it is reproducible 100% of the time. May I ask which platform and PostgreSQL version you have tested this on?
I ran it on macOS Catalina with PostgreSQL 13. Poll and EV reactors.
@kraih How do you kill a connection? I was able to reproduce the issue by killing connection using this tool: https://github.com/google/tcp_killer on MacOS 10.15.7, homebrewed perl 5.32 (both EV and Poll reactors) and PostgreSQL 13.
@vividsnow I restarted the server.
In this (normal) case I suspect that postgresql server sends some termination/error message to its clients before socket close. But in rare cases of postgresql/pgbouncer/network failure the original issue occurs.
Steps to reproduce the behavior
sample script
run and then kill socket via
Expected behavior
exit on disconnect
Actual behavior
infinite loop with errors: