ros / ros_comm

ROS communications-related packages, including core client libraries (roscpp, rospy, roslisp) and graph introspection tools (rostopic, rosnode, rosservice, rosparam).
http://wiki.ros.org/ros_comm

roscpp crash in ros::PublisherLink::setHeader() #2032

Open caijimin opened 4 years ago

caijimin commented 4 years ago

We use ros-kinetic and have seen occasional crashes. They are hard to reproduce, and I don't have a reproducible test case.

#0  0x00007fc0be68f428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fc0be69102a in __GI_abort () at abort.c:89
#2  0x00007fc0befd284d in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fc0befd06b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fc0befd0701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fc0befd0919 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fc0c318e1a9 in void boost::throw_exception<boost::bad_weak_ptr>(boost::bad_weak_ptr const&) ()
   from /opt/raccoon/lib/libroscpp.so
#7  0x00007fc0c318c9f2 in ros::PublisherLink::setHeader(ros::Header const&) ()
   from /opt/raccoon/lib/libroscpp.so
#8  0x00007fc0c3219185 in ros::TransportPublisherLink::onHeaderReceived(boost::shared_ptr<ros::Connection> const&, ros::Header const&) () from /opt/raccoon/lib/libroscpp.so
#9  0x00007fc0c319846b in ros::Connection::onHeaderRead(boost::shared_ptr<ros::Connection> const&, boost::shared_array<unsigned char> const&, unsigned int, bool) () from /opt/raccoon/lib/libroscpp.so
#10 0x00007fc0c3194c13 in ros::Connection::readTransport() () from /opt/raccoon/lib/libroscpp.so
#11 0x00007fc0c3213c3a in ros::TransportTCP::socketUpdate(int) () from /opt/raccoon/lib/libroscpp.so
#12 0x00007fc0c3251c60 in ros::PollSet::update(int) () from /opt/raccoon/lib/libroscpp.so
#13 0x00007fc0c31d2625 in ros::PollManager::threadFunc() () from /opt/raccoon/lib/libroscpp.so
#14 0x00007fc0c22265d5 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#15 0x00007fc0c1dee6ba in start_thread (arg=0x7fc0b70bc700) at pthread_create.c:333
#16 0x00007fc0be76141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

From the disassembly:

(gdb) disass 0x00007fc0c318c9f2
   0x00007fc0c318c9c6 <+950>:   lock cmpxchg %ecx,0x8(%rdx)
   0x00007fc0c318c9cb <+955>:   jne    0x7fc0c318c9be <_ZN3ros13PublisherLink9setHeaderERKNS_6HeaderE+942>
   0x00007fc0c318c9cd <+957>:   test   %eax,%eax
   0x00007fc0c318c9cf <+959>:   jne    0x7fc0c318c9f8 <_ZN3ros13PublisherLink9setHeaderERKNS_6HeaderE+1000>
   0x00007fc0c318c9d1 <+961>:   mov    0x30c360(%rip),%rax        # 0x7fc0c3498d38
   0x00007fc0c318c9d8 <+968>:   lea    -0xe0(%rbp),%r14
   0x00007fc0c318c9df <+975>:   mov    %r14,%rdi
   0x00007fc0c318c9e2 <+978>:   lea    0x10(%rax),%r15
   0x00007fc0c318c9e6 <+982>:   mov    %r15,-0xe0(%rbp)
   0x00007fc0c318c9ed <+989>:   callq  0x7fc0c317b7a0 <_ZN5boost15throw_exceptionINS_12bad_weak_ptrEEEvRKT_@plt>
=> 0x00007fc0c318c9f2 <+994>:   nopw   0x0(%rax,%rax,1)
   0x00007fc0c318c9f8 <+1000>:  mov    0x8(%r15),%rax

It seems to crash at publisher_link.cpp:96, in shared_from_this():

 63 bool PublisherLink::setHeader(const Header& header)
 64 {
 65   header.getValue("callerid", caller_id_);
 66 
 67   std::string md5sum, type, latched_str;
 68   if (!header.getValue("md5sum", md5sum))
 69   {
 70     ROS_ERROR("Publisher header did not have required element: md5sum");
 71     return false;
 72   }
 73 
 74   md5sum_ = md5sum;
 75 
 76   if (!header.getValue("type", type))
 77   {
 78     ROS_ERROR("Publisher header did not have required element: type");
 79     return false;
 80   }
 81 
 82   latched_ = false;
 83   if (header.getValue("latching", latched_str))
 84   {
 85     if (latched_str == "1")
 86     {
 87       latched_ = true;
 88     }
 89   }
 90 
 91   connection_id_ = ConnectionManager::instance()->getNewConnectionID();
 92   header_ = header;
 93 
 94   if (SubscriptionPtr parent = parent_.lock())
 95   {
 96     parent->headerReceived(shared_from_this(), header);
 97   }
 98 
 99   return true;
100 }
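
For context on the exception in frame #6: `shared_from_this()` throws `bad_weak_ptr` when no `shared_ptr` currently owns the object it is called on. The following is a minimal sketch of that behaviour only, assuming C++17 and using `std::enable_shared_from_this` as a stand-in for the Boost version that roscpp links against. `Link`, `tryGetSelf`, `ownedLinkSucceeds` and `unownedLinkSucceeds` are hypothetical names for illustration, not roscpp code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a class like ros::PublisherLink; not roscpp code.
struct Link : std::enable_shared_from_this<Link>
{
  // Returns true if a shared_ptr to *this could be obtained,
  // false if shared_from_this() threw std::bad_weak_ptr.
  bool tryGetSelf()
  {
    try
    {
      std::shared_ptr<Link> self = shared_from_this();
      return self != nullptr;
    }
    catch (const std::bad_weak_ptr&)
    {
      return false;
    }
  }
};

// The object is owned by a shared_ptr, so shared_from_this() succeeds.
bool ownedLinkSucceeds()
{
  auto link = std::make_shared<Link>();
  return link->tryGetSelf();
}

// The object lives on the stack and no shared_ptr owns it,
// so shared_from_this() throws (C++17 and later).
bool unownedLinkSucceeds()
{
  Link link;
  return link.tryGetSelf();
}
```

In the backtrace above nothing catches the exception, which is why it ends in `std::terminate()` and `abort()`. Whether the link was unowned at that point because of a teardown race is not confirmed by this sketch.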

Has anybody seen something like this before? Thanks in advance.

caijimin commented 4 years ago

It seems TransportPublisherLink->parent_ is now a bad weak_ptr.

(gdb) x/16w $r15
0x7fc0c34938c8 <_ZTVN5boost12bad_weak_ptrE+16>: 0xc318d260  0x00007fc0  0xc318d280  0x00007fc0
0x7fc0c34938d8 <_ZTVN5boost12bad_weak_ptrE+32>: 0xc318d250  0x00007fc0  0x00000000  0x00000000 <--- parent_->px
0x7fc0c34938e8 <_ZTVN3ros13PublisherLinkE+8>:   0xc3493820  0x00007fc0  0x00000000  0x00000000
0x7fc0c34938f8 <_ZTVN3ros13PublisherLinkE+24>:  0x00000000  0x00000000  0x00e94cb0  0x00000000

dshwtc commented 3 years ago

We hit the same problem occasionally. Has this been solved? The node that crashes is publishing tf, and the crash happens just as another node that subscribes to tf is shutting down.
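
A shutdown-time trigger like this would be consistent with an object calling back into itself after its last owning `shared_ptr` has been released. The sketch below is an assumption-laden illustration of that situation and of one non-throwing way to guard against it, not a confirmed diagnosis or a patch for roscpp. It assumes C++17 (`weak_from_this()`), and `Link`, `notify`, `notifyWhileOwned` and `notifyDuringTeardown` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a link object; not roscpp code.
struct Link : std::enable_shared_from_this<Link>
{
  explicit Link(bool* teardown_result) : teardown_result_(teardown_result) {}

  // Non-throwing variant: lock the internal weak reference instead of
  // calling shared_from_this(), and skip the callback if it has expired.
  bool notify()
  {
    if (std::shared_ptr<Link> self = weak_from_this().lock())
    {
      // A real implementation would hand `self` to the parent here.
      return true;
    }
    return false;
  }

  ~Link()
  {
    // The last owner is already gone when the destructor runs, so the
    // internal weak reference is expired and notify() reports false.
    *teardown_result_ = notify();
  }

  bool* teardown_result_;
};

// notify() on a live, owned object succeeds.
bool notifyWhileOwned()
{
  bool unused = false;
  auto link = std::make_shared<Link>(&unused);
  return link->notify();
}

// notify() called from the destructor, after the last owner was released.
bool notifyDuringTeardown()
{
  bool result = true;
  {
    auto link = std::make_shared<Link>(&result);
  }  // last owner released here; ~Link() runs and calls notify()
  return result;
}
```

With `shared_from_this()` in place of `weak_from_this().lock()`, the teardown call would throw `std::bad_weak_ptr` instead of returning false.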