lestrrat-p5 / ZMQ

libzmq Perl binding
46 stars 31 forks source link

socket object be examined more safety in zmq_poll() #63

Closed potatogim closed 9 years ago

potatogim commented 9 years ago

Hi, It's been a long time :-)

When I use zmq_poll(), It falls to segmentation fault.

Interestingly, that situation not always.

so I have traced this code with GDB, I met this situation like that.

Program received signal SIGSEGV, Segmentation fault.
0x00007fffee1fec62 in XS_ZMQ__LibZMQ3_zmq_poll (my_perl=<value optimized out>, cv=<value optimized out>) at xs/perl_libzmq3.xs:1070
1070                    pollitems[i].socket = ((P5ZMQ3_Socket *) mg->mg_ptr)->socket;

I have put this code into the xs/perl_libzmq3.xs

1070                 printf("ADDR: %#X\n", mg->mg_ptr);
1071                 pollitems[i].socket = ((P5ZMQ3_Socket *) mg->mg_ptr)->socket;

and then I have seen this results.

Normal case

[INFO] Initializing...
[New Thread 0x7fffed024700 (LWP 28015)]
[New Thread 0x7fffec623700 (LWP 28016)]
[INFO] Request: tcp://127.0.0.1:6547
ADDR: 0X18ACCA0
ADDR: 0X1CD1AC0
...
ADDR: 0X18ACCA0
ADDR: 0X1CD1AC0
ADDR: 0X18A67B0

Program received signal SIGQUIT, Quit.
0x00007ffff7b19e30 in Perl_pp_nextstate () from /usr/lib64/perl5/CORE/libperl.so
(gdb) c
Continuing.
[Thread 0x7fffed024700 (LWP 28015) exited]
[Thread 0x7fffec623700 (LWP 28016) exited]
[INFO] worker has stopped.
[DEBUG] Testers: [
  {
    'Connected' => 1,
    'Address' => 'tcp://127.0.0.1:6547',
  }
]
[INFO] Stopped and terminated succesfully.

Program exited normally.
(gdb)

Fault case

[INFO] Initializing...
[New Thread 0x7fffed024700 (LWP 29319)]
[New Thread 0x7fffec623700 (LWP 29320)]
[INFO] Request to the monitor for registering: tcp://127.0.0.1:6547
ADDR: 0X18ACCA0
ADDR: 0X1CD1AC0
...
ADDR: 0X18ACCA0
ADDR: 0X1CD1AC0
ADDR: 0X18A67B0

Program received signal SIGQUIT, Quit.
0x00007ffff7b2ad20 in Perl_sv_clear () from /usr/lib64/perl5/CORE/libperl.so
(gdb) c
Continuing.
[Thread 0x7fffec623700 (LWP 29330) exited]
[Thread 0x7fffed024700 (LWP 29329) exited]
[INFO] worker has stopped.
ADDR: 0

Program received signal SIGSEGV, Segmentation fault.
0x00007fffee1fe68f in XS_ZMQ__LibZMQ3_zmq_poll (my_perl=<value optimized out>, cv=<value optimized out>) at xs/perl_libzmq3.xs:1071
1071                    pollitems[i].socket = ((P5ZMQ3_Socket *) mg->mg_ptr)->socket;

(gdb) bt
#0  0x00007fffee1fe68f in XS_ZMQ__LibZMQ3_zmq_poll (my_perl=<value optimized out>, cv=<value optimized out>) at xs/perl_libzmq3.xs:>
#1  0x00007ffff7b18815 in Perl_pp_entersub () from /usr/lib64/perl5/CORE/libperl.so
#2  0x00007ffff7b16b06 in Perl_runops_standard () from /usr/lib64/perl5/CORE/libperl.so
#3  0x00007ffff7abf0d8 in perl_run () from /usr/lib64/perl5/CORE/libperl.so
#4  0x0000000000400e74 in main (argc=2, argv=0x7fffffffe338, env=0x7fffffffe350) at perlmain.c:117

I'm guessing that is caused by calling zmq_poll() in a thread whereas calling zmq_close(), zmq_term() finished in another thread.

so I'm trying to solve that like that... :-)

I want you will have a nice day!

lestrrat commented 9 years ago

Just one nitpick: If we're ignoring weird cases, it might help to actually show the user that this happened as a warning?

potatogim commented 9 years ago

Oh, that it is true.

Frankly speaking, I don't know what is best way for notify that... :-]

Would you let me know how can we notify that be more cleanly?

lestrrat commented 9 years ago

see perldoc perlapi, where there's an entry for warn:

       warn    This is an XS interface to Perl's "warn" function.

               Take a sprintf-style format pattern and argument list.  These
               are used to generate a string message.  If the message does not
               end with a newline, then it will be extended with some
               indication of the current location in the code, as described
               for "mess_sv".

               The error message or object will by default be written to
               standard error, but this is subject to modification by a
               $SIG{__WARN__} handler.

               Unlike with "croak", "pat" is not permitted to be null.

                       void    warn(const char *pat, ...)

I haven't checked, but perhaps something like warn("Invalid socket %p found in zmq_poll(), ignoring", socket)?

potatogim commented 9 years ago

Wow, Thanks!

I think our users need to what socket of the pollitems in invalid.

as this URL(https://metacpan.org/pod/ZMQ::LibZMQ3#rv-zmq_poll-pollitems-timeout), zmq_poll() is known that it accepts two parameters \@pollitems, $timeout.

How does this look?

1070             if (mg->mg_ptr == NULL)
1071                 warn("Invalid socket found with pollitems[%i] in zmq_poll(), ignoring", i);
1072                 continue;
1073             }

Results

Invalid socket found with pollitems[0] in zmq_poll(), ignoring at ~~blahblah~~
Invalid socket found with pollitems[1] in zmq_poll(), ignoring at ~~blahblah~~
Invalid socket found with pollitems[2] in zmq_poll(), ignoring at ~~blahblah~~
lestrrat commented 9 years ago

lgtm. waiting for travis. I will be out soon, so will merge after I come back in a few hours.

potatogim commented 9 years ago

Yep. travis seems to be passed. doesn't it?

lestrrat commented 9 years ago

Sorry, this completely went unnoticed. :/