zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0

Segfault near socket close using zmq_socket_monitor_pipes_stats #4524

Closed jhopahc closed 8 months ago

jhopahc commented 1 year ago

Issue description

A segfault occurs near a socket close, involving: https://github.com/zeromq/libzmq/blob/bd6fa4bbb3ec775d6ff9df0e1bb3174254daffa4/src/object.cpp#L111

Potentially related issue: https://github.com/zeromq/libzmq/issues/3446

Environment

In the logs, I replaced the address of the segfaulting pointer with [SEGFAULT_POINTER] for readability, and did the same for thread IDs.

DOWNLOAD_LOGFILE snippet:

Thread 3 "[application-name]" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x[ZMQ_IO_THREAD] (LWP 1590)]
0x00000000 in ?? ()
(gdb) thread apply all bt

Thread 3 (Thread 0x[ZMQ_IO_THREAD] (LWP 1590)):
#0  0x00000000 in ?? ()
#1  0xb6702514 in zmq::object_t::process_command (this=0x[SEGFAULT_POINTER], cmd_=...) at libzmq/src/object.cpp:117

Please look at the end of attached file for complete stacktraces.

Assumed Problem

Race condition between the pipe_term_ack command sent from the zmq IO thread to the application thread and send_stats_to_peer executing in the application thread (numbers in parentheses are line numbers in the attached log file). More precisely:

  1. (4202) Something sends pipe_term to application thread
  2. (4205) Application thread in process_term() sends pipe_term_ack to zmq IO thread
  3. (4211) The IO thread executes pipe_term_ack, (4226) sends pipe_term_ack back to the application thread, and
  4. (4229) destroys the pipe object, so that a later access produces the observed segfault.
  5. (4232) The application thread has not yet received or processed the pipe_term_ack command, so the segfaulting pointer is still listed as the peer of some pipe in the _pipes list that is looped over here: https://github.com/zeromq/libzmq/blob/3cafc0c26033e1d84ad9887ca4ddd99b934c311f/src/socket_base.cpp#L1602 This produces, here: https://github.com/zeromq/libzmq/blob/e86237da5845cbe0a7da626e7eb1071a76464ce2/src/pipe.cpp#L586 the request (4235) with the segfaulting pointer as destination.
  6. (4244) The IO thread begins processing the pipe_peer_stats command with the invalid pointer as destination, producing the segfault.
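
The suspected interleaving can be replayed deterministically in a small single-threaded model. This is only a sketch under the assumptions listed above, not libzmq code: the types `command` and `pipe_entry`, the function `stats_reaches_destroyed_pipe`, the integer object ids and the two mailboxes are all hypothetical stand-ins, and a destroyed object is tracked in a set instead of actually being freed.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <set>
#include <vector>

// Toy model only: none of these types or functions exist in libzmq.
enum class cmd_type { pipe_term, pipe_term_ack, pipe_peer_stats };

struct command {
    int destination; // id of the object that should handle the command
    cmd_type type;
};

struct pipe_entry {
    int own_end;  // pipe end owned by the socket (application thread)
    int peer_end; // pipe end owned by the IO thread
};

// Replays steps 1-6 in a fixed order and returns true if the IO thread
// ends up dispatching a command to a pipe end that was already destroyed.
bool stats_reaches_destroyed_pipe(bool ack_processed_before_stats)
{
    const int app_end = 1;
    const int io_end = 2;
    std::set<int> alive = {app_end, io_end}; // objects not yet destroyed
    std::vector<pipe_entry> registered_pipes; // models the socket's pipe list
    registered_pipes.push_back({app_end, io_end});
    std::deque<command> app_mailbox; // commands for the application thread
    std::deque<command> io_mailbox;  // commands for the IO thread

    // Step 1: something sends pipe_term to the application thread.
    app_mailbox.push_back({app_end, cmd_type::pipe_term});

    // Step 2: the application thread handles it and acks to the IO thread.
    command c = app_mailbox.front();
    app_mailbox.pop_front();
    assert(c.type == cmd_type::pipe_term && c.destination == app_end);
    io_mailbox.push_back({io_end, cmd_type::pipe_term_ack});

    // Steps 3-4: the IO thread acks back and destroys its pipe end.
    c = io_mailbox.front();
    io_mailbox.pop_front();
    assert(c.type == cmd_type::pipe_term_ack && alive.count(c.destination) == 1);
    app_mailbox.push_back({app_end, cmd_type::pipe_term_ack});
    alive.erase(io_end);

    if (ack_processed_before_stats) {
        // Benign order: the ack is handled first, so the pipe is dropped
        // from the list before any stats request is generated.
        c = app_mailbox.front();
        app_mailbox.pop_front();
        registered_pipes.erase(
            std::remove_if(registered_pipes.begin(), registered_pipes.end(),
                           [&c](const pipe_entry &p) {
                               return p.own_end == c.destination;
                           }),
            registered_pipes.end());
        alive.erase(app_end);
    }

    // Step 5: the stats request loops over the pipes that are still listed
    // and addresses one pipe_peer_stats command to each peer.
    for (const pipe_entry &p : registered_pipes)
        io_mailbox.push_back({p.peer_end, cmd_type::pipe_peer_stats});

    // Step 6: the IO thread dispatches; a dead destination would be a
    // use-after-free in real code, here it is only recorded.
    bool stale_destination = false;
    while (!io_mailbox.empty()) {
        c = io_mailbox.front();
        io_mailbox.pop_front();
        if (alive.count(c.destination) == 0)
            stale_destination = true;
    }
    return stale_destination;
}
```

Calling `stats_reaches_destroyed_pipe(false)` replays the order seen in the log and reports the stale destination, while passing `true` models the order in which the ack is handled before the stats request and no stale command is produced. The model only illustrates why the ordering matters; it does not claim to show how libzmq should be changed.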

Note that the line numbers given in the log after the function name will be off because of insertion of log statements.

What's the expected result?

No segfault.

Note

If this really is a race condition, could https://github.com/zeromq/libzmq/blob/3cafc0c26033e1d84ad9887ca4ddd99b934c311f/src/socket_base.cpp#L1609 also be vulnerable to it, since it also loops over the currently registered pipes?

I am happy to provide additional info if needed.