zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0
9.75k stars 2.36k forks source link

XSub socket crashes with segmentation fault on a dead pipe upon reconnect to XPub socket #1833

Closed kikosha closed 8 years ago

kikosha commented 8 years ago

I'm updating this issue as I examined behaviour of master branch.

If after working for some time an XSub socket is disconnected from the endpoint (XPub socket) and then connected (using the API) to the XPub socket again (in case, say of XPub was restart), the XSub crashes upon the first attempt to send a message through it or during the reconnect itself. It should be noted, that in the first case it succeeds to send the automatic re-subscription messages (at least part of them) on subscriptions, established prior to reconnect. Also, this seems to happen only when I have at least a certain number of subscriptions and does not happen if that number is lower than that threshold. It also does not happen if the socket is left untouched (letting the internal reconnect mechanism to work).

When the process crashes during the first attempt to send a message, it then produces a core with the following stacktrace (note the index1_ variable value):

Program terminated with signal 11, Segmentation fault.
#0  swap (index2_=1, index1_=1044266591, this=0x1e90fb0) at src/array.hpp:134

134                 if (items [index1_])
(gdb) bt
#0  swap (index2_=1, index1_=1044266591, this=0x1e90fb0) at src/array.hpp:134
#1  zmq::dist_t::write (this=0x1e90fb0, pipe_=0x7f1d0401c7b0, msg_=0x7f1d040100b1) at src/dist.cpp:211
#2  0x00007f1d1837a2c3 in zmq::dist_t::distribute (this=0x1e90fb0, msg_=0x7f1d040100b1) at src/dist.cpp:174
#3  0x00007f1d1837a4a1 in zmq::dist_t::send_to_matching (this=0x1e90fb0, msg_=0x7f1d040100b1) at src/dist.cpp:150
#4  0x00007f1d18397e7f in zmq::socket_base_t::send (this=this@entry=0x1e90ae0, msg_=msg_@entry=0x7f1d040100b1, flags_=flags_@entry=1)
    at src/socket_base.cpp:1108
#5  0x00007f1d183b2de9 in s_sendmsg (flags_=1, msg_=0x7f1d040100b1, s_=0x1e90ae0) at src/zmq.cpp:373
#6  zmq_msg_send (msg_=0x7f1d040100b1, s_=0x1e90ae0, flags_=1) at src/zmq.cpp:630
#7  0x00007f1d1814d377 in zmqpp::socket::send(zmqpp::message&, bool) () from /usr/lib64/libzmqpp.so.4
#8  0x00007f1d188e374e in ******************
#9  0x00007f1d18142778 in zmqpp::reactor::poll(long) () from /usr/lib64/libzmqpp.so.4
#10 0x00007f1d188e0d50 in ******************
#11 0x00007f1d15ba4470 in ?? () from /usr/lib64/libstdc++.so.6
#12 0x00007f1d16e7b9d1 in start_thread (arg=0x7f1d10bc3700) at pthread_create.c:301
#13 0x00007f1d153a88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

When frame 1 one is examined, it can be seen that the pipe_ object is invalid (note the Address out of bounds, vtable is corrupted):

(gdb) f 1
#1  zmq::dist_t::write (this=0x1e90fb0, pipe_=0x7f1d0401c7b0, msg_=0x7f1d040100b1) at src/dist.cpp:211

211             pipes.swap (pipes.index (pipe_), matching - 1);
(gdb) p *pipe_
$1 = {
  <zmq::object_t> = {
    _vptr.object_t = 0x7f1d04000078, 
    ctx = 0x7f1d04009230, 
    tid = 4294967295
  }, 
  <zmq::array_item_t<1>> = {
    _vptr.array_item_t = 0x5c2f5c2f5c2f5c2f, 
    array_index = 1398758447
  }, 
  <zmq::array_item_t<2>> = {
    _vptr.array_item_t = 0x4d41455254535055, 
    array_index = 1044266591
  }, 
  <zmq::array_item_t<3>> = {
    _vptr.array_item_t = 0x73654d090a3e3e3e, 
    array_index = 1701273971
  }, 
  members of zmq::pipe_t: 
  inpipe = 0x4c5f544547203a65, 
  outpipe = 0x726556090a545341, 
  in_active = 115, 
  out_active = 105, 
  hwm = 170991674, 
  lwm = 1937064713, 
  inhwmboost = 1701670772, 
  outhwmboost = 1145643122, 
  msgs_read = 4412750543119331377, 
  msgs_written = 4412750543122677053, 
  peers_msgs_read = 4412750543122677053, 
  peer = 0x3d3d3d3d3d3d3d3d, 
  sink = 0x3d3d3d3d3d3d3d3d, 
  state = 1027423549, 
  delay = 61, 
  identity = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
      <std::allocator<unsigned char>> = {
        <__gnu_cxx::new_allocator<unsigned char>> = {<No data fields>}, <No data fields>}, 
      members of std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Alloc_hider: 
      _M_p = 0x3d3d3d3d3d3d3d3d <Address 0x3d3d3d3d3d3d3d3d out of bounds>
    }
  }, 
  routing_id = 1027423549, 
  credential = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
      <std::allocator<unsigned char>> = {
        <__gnu_cxx::new_allocator<unsigned char>> = {<No data fields>}, <No data fields>}, 
      members of std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Alloc_hider: 
      _M_p = 0x3d3d3d3d3d3d3d3d <Address 0x3d3d3d3d3d3d3d3d out of bounds>
    }
  }, 
  conflate = 61
}

And this produces the insane index value

<zmq::array_item_t<2>> = {
    _vptr.array_item_t = 0x31312a0a0d32330a, 
    array_index = 874777101
  }, 

Examination of 'this' shows that:

(gdb) p *this
$2 = {
  pipes = {
    items = {
      <std::_Vector_base<zmq::pipe_t*, std::allocator<zmq::pipe_t*> >> = {
        _M_impl = {
          <std::allocator<zmq::pipe_t*>> = {
            <__gnu_cxx::new_allocator<zmq::pipe_t*>> = {<No data fields>}, <No data fields>}, 
          members of std::_Vector_base<zmq::pipe_t*, std::allocator<zmq::pipe_t*> >::_Vector_impl: 
          _M_start = 0x7f1d04024960, 
          _M_finish = 0x7f1d04024968, 
          _M_end_of_storage = 0x7f1d04024970
        }
      }, <No data fields>}
  }, 
  matching = 2, 
  active = 2, 
  eligible = 2, 
  more = false
}

The contents of vector (of zmq::array_t):

(gdb) f 0
#0  swap (index2_=1, index1_=1044266591, this=0x1e90fb0) at src/array.hpp:134

(gdb) pvector this->items
elem[0]: $5 = (zmq::pipe_t *) 0x7f1d0401c7b0
Vector size = 1
Vector capacity = 2
Element type = std::_Vector_base<zmq::pipe_t*, std::allocator<zmq::pipe_t*> >::pointer

(gdb) p *$5
$7 = {
  <zmq::object_t> = {
    _vptr.object_t = 0x7f1d04000078, 
    ctx = 0x7f1d04009230, 
    tid = 4294967295
  }, 
  <zmq::array_item_t<1>> = {
    _vptr.array_item_t = 0x5c2f5c2f5c2f5c2f, 
    array_index = 1398758447
  }, 
  <zmq::array_item_t<2>> = {
    _vptr.array_item_t = 0x4d41455254535055, 
    array_index = 1044266591
  }, 
  <zmq::array_item_t<3>> = {
    _vptr.array_item_t = 0x73654d090a3e3e3e, 
    array_index = 1701273971
  }, 
  members of zmq::pipe_t: 
  inpipe = 0x4c5f544547203a65, 
  outpipe = 0x726556090a545341, 
  in_active = 115, 
  out_active = 105, 
  hwm = 170991674, 
  lwm = 1937064713, 
  inhwmboost = 1701670772, 
  outhwmboost = 1145643122, 
  msgs_read = 4412750543119331377, 
  msgs_written = 4412750543122677053, 
  peers_msgs_read = 4412750543122677053, 
  peer = 0x3d3d3d3d3d3d3d3d, 
  sink = 0x3d3d3d3d3d3d3d3d, 
  state = 1027423549, 
  delay = 61, 
  identity = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
      <std::allocator<unsigned char>> = {
        <__gnu_cxx::new_allocator<unsigned char>> = {<No data fields>}, <No data fields>}, 
      members of std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Alloc_hider: 
      _M_p = 0x3d3d3d3d3d3d3d3d <Address 0x3d3d3d3d3d3d3d3d out of bounds>
    }
  }, 
  routing_id = 1027423549, 
  credential = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
      <std::allocator<unsigned char>> = {
        <__gnu_cxx::new_allocator<unsigned char>> = {<No data fields>}, <No data fields>}, 
      members of std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Alloc_hider: 
      _M_p = 0x3d3d3d3d3d3d3d3d <Address 0x3d3d3d3d3d3d3d3d out of bounds>
    }
  }, 
  conflate = 61
}

I guess its trying to write through a destroyed pipe object for some reason.

When the process crashes during the internal reconnect procedure, it generates a different stacktrace, in this case the 'index1_' variable is problematic, since the vector contains only 2 items:

Program terminated with signal 11, Segmentation fault.
#0  swap (index2_=1, index1_=2, this=0x1554fb0) at src/array.hpp:135

135                     ((item_t*) items [index1_])->set_array_index ((int) index2_);

(gdb) bt
#0  swap (index2_=1, index1_=2, this=0x1554fb0) at src/array.hpp:135
#1  zmq::dist_t::attach (this=this@entry=0x1554fb0, pipe_=pipe_@entry=0x7fb560010af0) at src/dist.cpp:62
#2  0x00007fb5745de7d7 in zmq::xsub_t::xattach_pipe (this=0x1554ae0, pipe_=0x7fb560010af0, subscribe_to_all_=<optimized out>)
    at src/xsub.cpp:64
#3  0x00007fb5745c6de3 in zmq::socket_base_t::attach_pipe (this=this@entry=0x1554ae0, pipe_=0x7fb560010af0, 
    subscribe_to_all_=subscribe_to_all_@entry=false) at src/socket_base.cpp:341
#4  0x00007fb5745c775c in zmq::socket_base_t::connect (this=0x1554ae0, addr_=0x154ba28 "tcp:/********:5570")
    at src/socket_base.cpp:978
#5  0x00007fb57437a1f1 in zmqpp::socket::connect(std::string const&) () from /usr/lib64/libzmqpp.so.4
#6  0x00007fb574b0e042 in *****************
#7  0x00007fb571dd1470 in ?? () from /usr/lib64/libstdc++.so.6
#8  0x00007fb5730a89d1 in start_thread (arg=0x7fb56cdfc700) at pthread_create.c:301
#9  0x00007fb5715d58fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

(gdb) f 0
#0  swap (index2_=1, index1_=2, this=0x1554fb0) at src/array.hpp:135

135                     ((item_t*) items [index1_])->set_array_index ((int) index2_);
(gdb) pvector this->items
elem[0]: $5 = (zmq::pipe_t *) 0x15483b0
elem[1]: $6 = (zmq::pipe_t *) 0x7fb560010af0
Vector size = 2
Vector capacity = 2
Element type = std::_Vector_base<zmq::pipe_t*, std::allocator<zmq::pipe_t*> >::pointer

(gdb) p *$5
$14 = {
  <zmq::object_t> = {
    _vptr.object_t = 0x656400736363, 
    ctx = 0x21, 
    tid = 1904725736
  }, 
  <zmq::array_item_t<1>> = {
    _vptr.array_item_t = 0x156cfb0, 
    array_index = 32
  }, 
  <zmq::array_item_t<2>> = {
    _vptr.array_item_t = 0x41, 
    array_index = 0
  }, 
  <zmq::array_item_t<3>> = {
    _vptr.array_item_t = 0x17, 
    array_index = 0
  }, 
  members of zmq::pipe_t: 
  inpipe = 0x676f6c2f74706f2f, 
  outpipe = 0x6464694d7363632f, 
  in_active = 108, 
  out_active = 101, 
  hwm = 6645859, 
  lwm = 0, 
  inhwmboost = 0, 
  outhwmboost = 49, 
  msgs_read = 2, 
  msgs_written = 22306496, 
  peers_msgs_read = 32, 
  peer = 0x1545fd0, 
  sink = 0x1544190, 
  state = 49, 
  delay = false, 
  identity = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
      <std::allocator<unsigned char>> = {
        <__gnu_cxx::new_allocator<unsigned char>> = {<No data fields>}, <No data fields>}, 
      members of std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Alloc_hider: 
      _M_p = 0x7fb574d7c490 <vtable for boost::detail::sp_counted_impl_pd<boost::thread_specific_ptr<boost::log::v2_mt_posix::sinks::basic_formatting_sink_frontend<char>::formatting_context>::delete_data*, boost::detail::do_heap_delete<boost::thread_specific_ptr<boost::log::v2_mt_posix::sinks::basic_formatting_sink_frontend<char>::formatting_context>::delete_data> >+16> "\340\060\255t\265\177"
    }
  }, 
  routing_id = 14, 
  credential = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
      <std::allocator<unsigned char>> = {
        <__gnu_cxx::new_allocator<unsigned char>> = {<No data fields>}, <No data fields>}, 
      members of std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Alloc_hider: 
      _M_p = 0x1547490 "\360\304\327t\265\177"
    }
  }, 
  conflate = false
}

(gdb) p *$6
$13 = (zmq::pipe_t) {
  <zmq::object_t> = {
    _vptr.object_t = 0x7fb5747fc510 <vtable for zmq::pipe_t+16>, 
    ctx = 0x1542490, 
    tid = 7
  }, 
  <zmq::array_item_t<1>> = {
    _vptr.array_item_t = 0x7fb5747fc5b0 <vtable for zmq::pipe_t+176>, 
    array_index = 0
  }, 
  <zmq::array_item_t<2>> = {
    _vptr.array_item_t = 0x7fb5747fc5d0 <vtable for zmq::pipe_t+208>, 
    array_index = 1
  }, 
  <zmq::array_item_t<3>> = {
    _vptr.array_item_t = 0x7fb5747fc5f0 <vtable for zmq::pipe_t+240>, 
    array_index = 1
  }, 
  members of zmq::pipe_t: 
  inpipe = 0x7fb560010080, 
  outpipe = 0x7fb56001aa80, 
  in_active = true, 
  out_active = true, 
  hwm = 1000, 
  lwm = 500, 
  inhwmboost = 0, 
  outhwmboost = 0, 
  msgs_read = 0, 
  msgs_written = 0, 
  peers_msgs_read = 0, 
  peer = 0x7fb560010f30, 
  sink = 0x1554e48, 
  state = zmq::pipe_t::active, 
  delay = true, 
  identity = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
      <std::allocator<unsigned char>> = {
        <__gnu_cxx::new_allocator<unsigned char>> = {<No data fields>}, <No data fields>}, 
      members of std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Alloc_hider: 
      _M_p = 0x7fb5747ffe78 <std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Rep::_S_empty_rep_storage+24> ""
    }
  }, 
  routing_id = 0, 
  credential = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
      <std::allocator<unsigned char>> = {
        <__gnu_cxx::new_allocator<unsigned char>> = {<No data fields>}, <No data fields>}, 
      members of std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Alloc_hider: 
      _M_p = 0x7fb5747ffe78 <std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Rep::_S_empty_rep_storage+24> ""
    }
  }, 
  conflate = false
}

(gdb) f 1
#1  zmq::dist_t::attach (this=this@entry=0x1554fb0, pipe_=pipe_@entry=0x7fb560010af0) at src/dist.cpp:62

62              pipes.swap (active, pipes.size () - 1);

(gdb) l
57              pipes.swap (eligible, pipes.size () - 1);
58              eligible++;
59          }
60          else {
61              pipes.push_back (pipe_);
62              pipes.swap (active, pipes.size () - 1);
63              active++;
64              eligible++;
65          }
66      }
(gdb) p active
$15 = 2
(gdb) p eligible
$16 = 2

It looks like in this case the index of the new pipe in the array is by one off the correct value.

Unfortunately, due to company policy I'm restricted from sharing the core dump file but I'll gladly extract any additional information from it if necessary

hitstergtd commented 8 years ago

Any takers? @somdoron?

somdoron commented 8 years ago

I will take a look