p5-RedisDB / RedisDB

Perl extension to access Redis
22 stars 11 forks source link

No Callback errors #31

Closed michael-stevens closed 6 years ago

michael-stevens commented 6 years ago

Hi,

We're getting a series of

Error in Perl code: No callbacks in the queue and no default callback set at /usr/local/share/perl5/RedisDB.pm

errors.

We've also seen:

Error in Perl code: You can't use RedisDB::execute when you have replies to fetch..

This was a previously working application and it's not clear what's triggering the problem, although it's possibly related to an upgrade to RHEL 7.4.

perl is:

mstevens@864862-vm-dev:~ % perl -V
Summary of my perl5 (revision 5 version 16 subversion 3) configuration:

  Platform:
    osname=linux, osvers=2.6.32-642.13.1.el6.x86_64, archname=x86_64-linux-thread-multi
    uname='linux x86-017.build.eng.bos.redhat.com 2.6.32-642.13.1.el6.x86_64 #1 smp wed nov 23 16:03:01 est 2016 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic -Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic -Wl,-z,relro  -DDEBUGGING=-g -Dversion=5.16.3 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.8.5 20150623 (Red Hat 4.8.5-11)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector'
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.17'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags -Wl,-rpath,/usr/lib64/perl5/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro '

Characteristics of this binary (from libperl): 
  Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
                        PERL_DONT_CREATE_GVSV PERL_IMPLICIT_CONTEXT
                        PERL_MALLOC_WRAP PERL_PRESERVE_IVUV USE_64_BIT_ALL
                        USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
                        USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE
                        USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
                        USE_REENTRANT_API USE_SITECUSTOMIZE
  Locally applied patches:
    Fedora Patch1: Removes date check, Fedora/RHEL specific
    Fedora Patch3: support for libdir64
    Fedora Patch4: use libresolv instead of libbind
    Fedora Patch5: USE_MM_LD_RUN_PATH
    Fedora Patch6: Skip hostname tests, due to builders not being network capable
    Fedora Patch7: Dont run one io test due to random builder failures
    Fedora Patch9: Fix find2perl to translate ? glob properly (RT#113054)
    Fedora Patch10: Fix broken atof (RT#109318)
    Fedora Patch13: Clear $@ before "do" I/O error (RT#113730)
    Fedora Patch14: Do not truncate syscall() return value to 32 bits (RT#113980)
    Fedora Patch15: Override the Pod::Simple::parse_file (CPANRT#77530)
    Fedora Patch16: Do not leak with attribute on my variable (RT#114764)
    Fedora Patch17: Allow operator after numeric keyword argument (RT#105924)
    Fedora Patch18: Extend stack in File::Glob::glob, (RT#114984)
    Fedora Patch19: Do not crash when vivifying $|
    Fedora Patch20: Fix misparsing of maketext strings (CVE-2012-6329)
    Fedora Patch21: Add NAME headings to CPAN modules (CPANRT#73396)
    Fedora Patch22: Fix leaking tied hashes (RT#107000) [1]
    Fedora Patch23: Fix leaking tied hashes (RT#107000) [2]
    Fedora Patch24: Fix leaking tied hashes (RT#107000) [3]
    Fedora Patch25: Fix dead lock in PerlIO after fork from thread (RT#106212)
    Fedora Patch26: Make regexp safe in a signal handler (RT#114878)
    Fedora Patch27: Update h2ph(1) documentation (RT#117647)
    Fedora Patch28: Update pod2html(1) documentation (RT#117623)
    Fedora Patch29: Document Math::BigInt::CalcEmu requires Math::BigInt (CPAN RT#85015)
    RHEL Patch30: Use stronger algorithm needed for FIPS in t/op/crypt.t (RT#121591)
    RHEL Patch31: Make *DBM_File desctructors thread-safe (RT#61912)
    RHEL Patch32: Use stronger algorithm needed for FIPS in t/op/taint.t (RT#123338)
    RHEL Patch33: Remove CPU-speed-sensitive test in Benchmark test
    RHEL Patch34: Make File::Glob work with threads again
    RHEL Patch35: Fix CRLF conversion in ASCII FTP upload (CPAN RT#41642)
    RHEL Patch36: Do not leak the temp utf8 copy of namepv (CPAN RT#123786)
    RHEL Patch37: Fix duplicating PerlIO::encoding when spawning threads (RT#31923)
  Built under linux
  Compiled at Mar  2 2017 08:50:35
  @INC:
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
    .

and redis is:

[mstevens@864795-vm1 ~]$ redis-server --version
Redis server v=3.2.11 sha=00000000:0 malloc=jemalloc-4.0.3 bits=64 build=e0949002f8ebb101

RedisDB is 2.53.

trinitum commented 6 years ago

"No callbacks in the queue" means that you've got unexpected reply from the server. "You can't use RedisDB::execute when you have replies to fetch" means that you are trying to run synchronous command before fetching reply to the previously submitted asynchronous one. I can't tell exactly what's wrong without seeing the code. Are you using threads?

michael-stevens commented 6 years ago

Hi,

We aren't using threads, I can replicate the problem with a single threaded command line script (although I haven't quite managed to minimise it to the point it only uses RedisDB yet).

The problem is most easily triggered by restarting the redis instance that RedisDB is connected to, although we've also seen it occur under load.

I'm still trying to confirm this but it looks like the problem occurs if the first recv after connecting and setting the timeouts fails with ECONNRESET. Wondering if RedisDB doesn't reset its state properly if the connection fails during setup? Speculation though.

trinitum commented 6 years ago

script would be great

michael-stevens commented 6 years ago

Hi,

Okay this is still a bit hacky, but this script:

#!/usr/bin/perl -w

use strict;
use RedisDB;

while (1) {
    my $redis = RedisDB->new( host => '192.168.204.29', port => '6379', raise_error => 0, timeout => 0.000001);
    sleep 1;
    print $redis->exists('scripts:query_cache');
    print "\n";
}

Plus modifyingRedisDB.pm to simulate a network problem, by changing _recv_data_nb:

        my $ret = recv( $self->{_socket}, my $buf, 131072, $DONTWAIT );
        $ret = undef;
        $! = ECONNRESET;

seem to consistently cause a problem for me.

(a problem beyond the fact you'd obviously expect the call to fail with a network error)

trinitum commented 6 years ago

Silly me trying to support too many ways to use this module... I believe the following patch should fix the problem, I'll release the new version later tonight:

diff --git a/lib/RedisDB.pm b/lib/RedisDB.pm
index 00a9a57..b2958d8 100644
--- a/lib/RedisDB.pm
+++ b/lib/RedisDB.pm
@@ -445,9 +445,11 @@ sub _recv_data_nb {
             last if $! == EAGAIN or $! == EWOULDBLOCK;
             next if $! == EINTR;

-            # on any other error close connection
-            $self->_on_disconnect( 1,
-                RedisDB::Error::DISCONNECTED->new("Error reading from server: $!") );
+            # on any other error close the connection
+            my $error =
+              RedisDB::Error::DISCONNECTED->new("Error reading from server: $!");
+            $self->_on_disconnect( 1, $error );
+            return $error;
         }
         elsif ( $buf ne '' ) {
trinitum commented 6 years ago

I was getting a different error though: Can't use an undefined value as a symbol reference at lib/RedisDB.pm line 441., did it gave to you "No callbacks"?

michael-stevens commented 6 years ago

I was getting the symbol reference error with that test case. but I'm hoping it's the same underlying bug.

While debugging I was getting a bit worried that you don't reset the connection state in on_disconnect if raise_error is not set:

    if ($err) {
        $error_obj ||= RedisDB::Error::DISCONNECTED->new(
            "Server unexpectedly closed connection. Some data might have been lost.");
        if ( $self->{raise_error} or $self->{_in_multi} or $self->{_watching} ) {
            $self->reset_connection;
            die $error_obj;
        }

But I don't understand the code well enough to be sure if it's a problem.

We'll test your patch, many thanks.

trinitum commented 6 years ago

reset_connection is not a very good name, it resets the whole object to the initial state, but with raise_error disabled it is not the solution, so it calls propagate_error instead to deliver the error to all the callbacks and reply queue, and removes the socket (which effectively resets network connection).

michael-stevens commented 6 years ago

Testing the patch, it's looking good so far.

trinitum commented 6 years ago

v2.54 is on CPAN. Thank you for reporting the problem and investigating.