Closed sidrdesai closed 1 year ago
Thanks for the report of the issue @sidrdesai, if you have been able to catch the segmentation fault in gdb, could you share the backtrace please?
Core was generated by `./ros_pkgs/lib/nao_lola/nao_lola'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 malloc_consolidate (av=av@entry=0x7f62dd276b80 <main_arena>) at malloc.c:4475
4475 malloc.c: No such file or directory.
[Current thread is 1 (Thread 0x7f62dcdcb640 (LWP 2965))]
(gdb) backtrace
#0 malloc_consolidate (av=av@entry=0x7f62dd276b80 <main_arena>) at malloc.c:4475
#1 0x00007f62dd121c83 in _int_malloc (av=av@entry=0x7f62dd276b80 <main_arena>, bytes=bytes@entry=1728)
at malloc.c:3699
#2 0x00007f62dd124299 in __GI___libc_malloc (bytes=1728) at malloc.c:3066
#3 0x00007f62dd341b39 in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f62dd4daff6 in rcl_wait_set_resize () from /opt/ros/galactic/lib/librcl.so
#5 0x00007f62dd6473c0 in rclcpp::Executor::wait_for_work(std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /opt/ros/galactic/lib/librclcpp.so
#6 0x00007f62dd6478c3 in rclcpp::Executor::get_next_executable(rclcpp::AnyExecutable&, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /opt/ros/galactic/lib/librclcpp.so
#7 0x00007f62dd64b0ad in rclcpp::executors::SingleThreadedExecutor::spin() ()
from /opt/ros/galactic/lib/librclcpp.so
#8 0x00007f62dd649c78 in rclcpp::spin(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>) ()
from /opt/ros/galactic/lib/librclcpp.so
#9 0x00007f62dd649f20 in rclcpp::spin(std::shared_ptr<rclcpp::Node>) ()
from /opt/ros/galactic/lib/librclcpp.so
#10 0x0000560d6e80f869 in main (argc=1, argv=0x7ffdad482938)
at /home/sid/spl_ros/src/nao_lola/src/nao_lola_node.cpp:22
Hmm, I can't tell from that backtrack what the issues is. It does seem to be memory related, so my next step if I was you would be to run it through valgrind.
Just make sure, you are using the galactic branch right?
Just confirmed it is using the galactic branch. Still crashes. I'm not really experienced with using valgrind, but I will try to figure it out in the next couple days
Install valgrind with:
apt install valgrind
and inside your workspace (on the robot):
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Debug
source install/local_setup.bash
valgrind --tool=memcheck install/nao_lola/lib/nao_lola/nao_lola
if it detected memory issues, it'll print a bunch of stuff. Note that things will run about 100x slower while running with valgrind.
Here is the output of valgrind. Not sure exactly how to make sense of it.
==2617== Invalid read of size 8
==2617== at 0x486489C: _M_get (shared_ptr_base.h:1021)
==2617== by 0x486489C: operator-> (shared_ptr_base.h:1015)
==2617== by 0x486489C: MsgpackPacker::setJointPositions(std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >) (msgpack_packer.cpp:102)
==2617== by 0x153E00: operator() (nao_lola.cpp:90)
==2617== by 0x153E00: std::_Function_handler<void (std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >), NaoLola::createSubscriptions()::{lambda(std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >&&) (std_function.h:300)
==2617== by 0x1B0618: operator() (std_function.h:688)
==2617== by 0x1B0618: operator()<std::function<void(std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >)>&> (any_subscription_callback.hpp:255)
==2617== by 0x1B0618: __invoke_impl<void, rclcpp::AnySubscriptionCallback<MessageT, AllocatorT>::dispatch(std::shared_ptr<_Tp>, const rclcpp::MessageInfo&) [with MessageT = nao_command_msgs::msg::JointPositions_<std::allocator<void> >; AllocatorT = std::allocator<void>]::<lambda(auto:22&&)>, std::function<void(std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >)>&> (invoke.h:60)
==2617== by 0x1B0618: __invoke<rclcpp::AnySubscriptionCallback<MessageT, AllocatorT>::dispatch(std::shared_ptr<_Tp>, const rclcpp::MessageInfo&) [with MessageT = nao_command_msgs::msg::JointPositions_<std::allocator<void> >; AllocatorT = std::allocator<void>]::<lambda(auto:22&&)>, std::function<void(std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >)>&> (invoke.h:95)
==2617== by 0x1B0618: __visit_invoke_impl (variant:981)
==2617== by 0x1B0618: __do_visit_invoke (variant:989)
==2617== by 0x1B0618: std::__detail::__variant::__gen_vtable_impl<true, std::__detail::__variant::_Multi_array<void (*)(rclcpp::AnySubscriptionCallback<nao_command_msgs::msg::JointPositions_<std::allocator<void> >, std::allocator<void> >::dispatch(std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)::{lambda(auto:1&&)#1}&&, std::variant<std::function<void (nao_command_msgs::msg::JointPositions_<std::allocator<void> > const&)>, std::variant<void (nao_command_msgs::msg::JointPositions_<std::allocator<void> > const, rclcpp::MessageInfo const&)>, std::variant<void (std::unique_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> >, std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > > >)>, std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)>, std::variant<void (std::shared_ptr<std::function>)>, std::variant<void (std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)>, rclcpp::MessageInfo const&)>, std::variant<void (std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)> const&)>, std::variant<void (std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)> const, rclcpp::MessageInfo const&)>, std::variant<void (std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >)>, std::variant<void (std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)> >&)>, std::tuple<std::variant<std::function<void (nao_command_msgs::msg::JointPositions_<std::allocator<void> > const&)>, std::variant<void (nao_command_msgs::msg::JointPositions_<std::allocator<void> > const, rclcpp::MessageInfo const&)>, std::variant<void (std::unique_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> >, std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > > >)>, std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)>, std::variant<void (std::shared_ptr<std::function>)>, std::variant<void (std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)>, rclcpp::MessageInfo const&)>, std::variant<void (std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)> const&)>, std::variant<void (std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)> const, rclcpp::MessageInfo const&)>, std::variant<void (std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >)>, std::variant<void (std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)> > >, std::integer_sequence<unsigned long, 8ul> >::__visit_invoke(rclcpp::AnySubscriptionCallback<nao_command_msgs::msg::JointPositions_<std::allocator<void> >, std::allocator<void> >::dispatch(std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfoconst&)::{lambda(auto:1&&)#1}, std::variant<std::function<void (nao_command_msgs::msg::JointPositions_<std::allocator<void> > const&)>, std::variant<void (nao_command_msgs::msg::JointPositions_<std::allocator<void> > const, rclcpp::MessageInfo const&)>, std::variant<void (std::unique_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> >, std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > > >)>, std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)>, std::variant<void (std::shared_ptr<std::function>)>, std::variant<void (std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)>, rclcpp::MessageInfo const&)>, std::variant<void (std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)> const&)>, std::variant<void (std::variant<void (std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)> const, rclcpp::MessageInfo const&)>, std::variant<void (std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >)>, std::variant<void (std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, rclcpp::MessageInfo const&)> >) (variant:1005)
==2617== by 0x21D52B: __do_visit<> (variant:1652)
==2617== by 0x21D52B: visit<rclcpp::AnySubscriptionCallback<MessageT, AllocatorT>::dispatch(std::shared_ptr<_Tp>, const rclcpp::MessageInfo&) [with MessageT = nao_command_msgs::msg::JointPositions_<std::allocator<void> >; AllocatorT = std::allocator<void>]::<lambda(auto:22&&)>, std::variant<std::function<void(const nao_command_msgs::msg::JointPositions_<std::allocator<void> >&)>, std::function<void(const nao_command_msgs::msg::JointPositions_<std::allocator<void> >&, const rclcpp::MessageInfo&)>, std::function<void(std::unique_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> >, std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > > >)>, std::function<void(std::unique_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> >, std::default_delete<nao_command_msgs::msg::JointPositions_<std::allocator<void> > > >, const rclcpp::MessageInfo&)>, std::function<void(std::shared_ptr<const nao_command_msgs::msg::JointPositions_<std::allocator<void> > >)>, std::function<void(std::shared_ptr<const nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, const rclcpp::MessageInfo&)>, std::function<void(const std::shared_ptr<const nao_command_msgs::msg::JointPositions_<std::allocator<void> > >&)>, std::function<void(const std::shared_ptr<const nao_command_msgs::msg::JointPositions_<std::allocator<void> > >&, const rclcpp::MessageInfo&)>, std::function<void(std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >)>, std::function<void(std::shared_ptr<nao_command_msgs::msg::JointPositions_<std::allocator<void> > >, const rclcpp::MessageInfo&)> >&> (variant:1663)
==2617== by 0x21D52B: dispatch (any_subscription_callback.hpp:238)
==2617== by 0x21D52B: rclcpp::Subscription<nao_command_msgs::msg::JointPositions_<std::allocator<void> >, std::allocator<void>, rclcpp::message_memory_strategy::MessageMemoryStrategy<nao_command_msgs::msg::JointPositions_<std::allocator<void> >, std::allocator<void> > >::handle_message(std::shared_ptr<void>&, rclcpp::MessageInfo const&) (subscription.hpp:284)
==2617== by 0x4984FA3: ??? (in /opt/ros/galactic/lib/librclcpp.so)
==2617== by 0x4985873: rclcpp::Executor::execute_subscription(std::shared_ptr<rclcpp::SubscriptionBase>) (in /opt/ros/galactic/lib/librclcpp.so)
==2617== by 0x4985FA4: rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) (in /opt/ros/galactic/lib/librclcpp.so)
==2617== by 0x498D0BB: rclcpp::executors::SingleThreadedExecutor::spin() (in /opt/ros/galactic/lib/librclcpp.so)
==2617== by 0x498BC77: rclcpp::spin(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>) (in /opt/ros/galactic/lib/librclcpp.so)
==2617== by 0x498BF1F: rclcpp::spin(std::shared_ptr<rclcpp::Node>) (in /opt/ros/galactic/lib/librclcpp.so)
==2617== by 0x14E9AE: main (nao_lola_node.cpp:22)
==2617== Address 0xbe22100 is 16 bytes inside a block of size 208 free'd
==2617== at 0x483CFBF: operator delete(void*) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2617== by 0x155634: _M_release (shared_ptr_base.h:171)
==2617== by 0x155634: _M_release (shared_ptr_base.h:148)
==2617== by 0x155634: ~__shared_count (shared_ptr_base.h:730)
==2617== by 0x155634: ~__shared_ptr (shared_ptr_base.h:1169)
==2617== by 0x155634: operator= (shared_ptr_base.h:1265)
==2617== by 0x155634: operator= (shared_ptr.h:335)
==2617== by 0x155634: NaoLola::NaoLola()::{lambda()#1}::operator()() const [clone .isra.0] (nao_lola.cpp:51)
==2617== by 0x4C50DE3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==2617== by 0x4ADF608: start_thread (pthread_create.c:477)
==2617== by 0x4E96132: clone (clone.S:95)
==2617== Block was alloc'd at
==2617== at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2617== by 0x154FF5: allocate (new_allocator.h:114)
==2617== by 0x154FF5: allocate (alloc_traits.h:443)
==2617== by 0x154FF5: __allocate_guarded<std::allocator<std::_Sp_counted_ptr_inplace<MsgpackPacker, std::allocator<MsgpackPacker>, (__gnu_cxx::_Lock_policy)2> > > (allocated_ptr.h:97)
==2617== by 0x154FF5: __shared_count<MsgpackPacker, std::allocator<MsgpackPacker> > (shared_ptr_base.h:677)
==2617== by 0x154FF5: __shared_ptr<std::allocator<MsgpackPacker> > (shared_ptr_base.h:1344)
==2617== by 0x154FF5: shared_ptr<std::allocator<MsgpackPacker> > (shared_ptr.h:359)
==2617== by 0x154FF5: allocate_shared<MsgpackPacker, std::allocator<MsgpackPacker> > (shared_ptr.h:702)
==2617== by 0x154FF5: make_shared<MsgpackPacker> (shared_ptr.h:718)
==2617== by 0x154FF5: NaoLola::NaoLola()::{lambda()#1}::operator()() const [clone .isra.0] (nao_lola.cpp:51)
==2617== by 0x4C50DE3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==2617== by 0x4ADF608: start_thread (pthread_create.c:477)
==2617== by 0x4E96132: clone (clone.S:95)
Thanks, this output is useful. I think there is a race condition that is happening. I am able to reproduce this problem. I will have a look at it and try and get a fix in.
When nao_lola is running on its own there is no issue, but when there is another node publishing joint commands, sometimes nao_lola will randomly segfault. The robot will run fine for 3 or 4 minutes initially before crashing. This becomes more frequent over time though until the robot is rebooted.
We are running nao_lola and the joint publisher node on the Nao V6 robot running Ubuntu 20.04 and ros2 galactic.