Segmentation fault in libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so

daisukes commented 4 years ago

Bug report

It looks like something goes wrong when deserializing a LaserScan message, about once in thousands of messages. The crash happens at a random time between 30 seconds and several minutes after launch, and the /scan message is published at 10 Hz. I am working on a ROS1 system combined with ROS2 navigation: ROS1 + Gazebo simulates a Velodyne lidar, converts its output to LaserScan, and transfers it to ROS2 via a bridge.

I'm not sure whether this is a bug here or a problem with my bridge configuration. Could you help me fix this?

Required Info:

Ubuntu 18.04, ROS melodic (binary) + Gazebo 9.13
ROS1 bridge with the Docker image osrf/ros:eloquent-ros1-bridge
ROS2 foxy source build (https://github.com/ros2/ros2/commit/732abee4bee90d352088c4b1a29af68eb2897ad3)
DDS implementation:
- Fast-RTPS
Client library (if applicable):
- Navigation 2 master source build (433c6ac72685920c70f5b9cb84e349962239d48b)

Steps to reproduce issue

It is difficult to give exact steps because my project is not public yet, but here is the backtrace of the crashed process. This can happen in a process that subscribes to the /scan message.

Reading symbols from /opt/overlay_ws/install/nav2_planner/lib/nav2_planner/planner_server...
(No debugging symbols found in /opt/overlay_ws/install/nav2_planner/lib/nav2_planner/planner_server)
[New LWP 640]
[New LWP 572]
[New LWP 567]
[New LWP 545]
[New LWP 583]
[New LWP 669]
[New LWP 602]
[New LWP 597]
[New LWP 657]
[New LWP 644]
[New LWP 581]
[New LWP 634]
[New LWP 578]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/overlay_ws/install/nav2_planner/lib/nav2_planner/planner_server --ros-args'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f68d939eba4 in ?? () from /usr/lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f68cf7fe700 (LWP 640))]
(gdb) bt
#0  0x00007f68d939eba4 in ?? () from /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f68d93a0e03 in ?? () from /usr/lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f68d93a3419 in malloc () from /usr/lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f68d95bdc29 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f68d44c87fc in __gnu_cxx::new_allocator<float>::allocate(unsigned long, void const*) () from /opt/ros/foxy/sensor_msgs/lib/libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so
#5  0x00007f68d44c85fc in std::allocator_traits<std::allocator<float> >::allocate(std::allocator<float>&, unsigned long) () from /opt/ros/foxy/sensor_msgs/lib/libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so
#6  0x00007f68d44c83d6 in std::_Vector_base<float, std::allocator<float> >::_M_allocate(unsigned long) () from /opt/ros/foxy/sensor_msgs/lib/libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so
#7  0x00007f68d44c7f9e in std::vector<float, std::allocator<float> >::_M_default_append(unsigned long) () from /opt/ros/foxy/sensor_msgs/lib/libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so
#8  0x00007f68d44c7deb in std::vector<float, std::allocator<float> >::resize(unsigned long) () from /opt/ros/foxy/sensor_msgs/lib/libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so
#9  0x00007f68d44c7cbf in eprosima::fastcdr::Cdr& eprosima::fastcdr::Cdr::deserialize<float>(std::vector<float, std::allocator<float> >&) ()
   from /opt/ros/foxy/sensor_msgs/lib/libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so
#10 0x00007f68d44c7b25 in eprosima::fastcdr::Cdr& eprosima::fastcdr::Cdr::operator>><float>(std::vector<float, std::allocator<float> >&) ()
   from /opt/ros/foxy/sensor_msgs/lib/libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so
#11 0x00007f68d44d1c77 in sensor_msgs::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, sensor_msgs::msg::LaserScan_<std::allocator<void> >&) ()
   from /opt/ros/foxy/sensor_msgs/lib/libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so
#12 0x00007f68d44d220e in sensor_msgs::msg::typesupport_fastrtps_cpp::_LaserScan__cdr_deserialize(eprosima::fastcdr::Cdr&, void*) ()
   from /opt/ros/foxy/sensor_msgs/lib/libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so
#13 0x00007f68d88f6a6b in rmw_fastrtps_cpp::TypeSupport::deserializeROSmessage(eprosima::fastcdr::Cdr&, void*, void const*) const () from /opt/ros/foxy/rmw_fastrtps_cpp/lib/librmw_fastrtps_cpp.so
#14 0x00007f68d8873e48 in rmw_fastrtps_shared_cpp::TypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) () from /opt/ros/foxy/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#15 0x00007f68d83b8571 in eprosima::fastrtps::SubscriberHistory::deserialize_change(eprosima::fastrtps::rtps::CacheChange_t*, unsigned int, void*, eprosima::fastrtps::SampleInfo_t*) ()
   from /opt/ros/foxy/fastrtps/lib/libfastrtps.so.2
#16 0x00007f68d83bd1e8 in eprosima::fastrtps::SubscriberHistory::takeNextData(void*, eprosima::fastrtps::SampleInfo_t*, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&) () from /opt/ros/foxy/fastrtps/lib/libfastrtps.so.2
#17 0x00007f68d83b1da0 in eprosima::fastrtps::SubscriberImpl::takeNextData(void*, eprosima::fastrtps::SampleInfo_t*) () from /opt/ros/foxy/fastrtps/lib/libfastrtps.so.2
#18 0x00007f68d886b3f2 in rmw_fastrtps_shared_cpp::_take(char const*, rmw_subscription_t const*, void*, bool*, rmw_message_info_t*, rmw_subscription_allocation_t*) ()
   from /opt/ros/foxy/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#19 0x00007f68d886bb7f in rmw_fastrtps_shared_cpp::__rmw_take_with_info(char const*, rmw_subscription_t const*, void*, bool*, rmw_message_info_t*, rmw_subscription_allocation_t*) ()
   from /opt/ros/foxy/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#20 0x00007f68d88f3d5e in rmw_take_with_info () from /opt/ros/foxy/rmw_fastrtps_cpp/lib/librmw_fastrtps_cpp.so
#21 0x00007f68d8f97133 in rmw_take_with_info () from /opt/ros/foxy/rmw_implementation/lib/librmw_implementation.so
#22 0x00007f68d901f771 in rcl_take () from /opt/ros/foxy/rcl/lib/librcl.so
#23 0x00007f68d9da27ef in rclcpp::SubscriptionBase::take_type_erased(void*, rclcpp::MessageInfo&) () from /opt/ros/foxy/rclcpp/lib/librclcpp.so
#24 0x00007f68d9c9a31d in rclcpp::Executor::execute_subscription(std::shared_ptr<rclcpp::SubscriptionBase>)::{lambda()#5}::operator()() const () from /opt/ros/foxy/rclcpp/lib/librclcpp.so
#25 0x00007f68d9c9d07d in std::_Function_handler<bool (), rclcpp::Executor::execute_subscription(std::shared_ptr<rclcpp::SubscriptionBase>)::{lambda()#5}>::_M_invoke(std::_Any_data const&) ()
   from /opt/ros/foxy/rclcpp/lib/librclcpp.so
#26 0x00007f68d9c9f822 in std::function<bool ()>::operator()() const () from /opt/ros/foxy/rclcpp/lib/librclcpp.so
#27 0x00007f68d9c999f8 in take_and_do_error_handling(char const*, char const*, std::function<bool ()>, std::function<void ()>) () from /opt/ros/foxy/rclcpp/lib/librclcpp.so
#28 0x00007f68d9c9aa34 in rclcpp::Executor::execute_subscription(std::shared_ptr<rclcpp::SubscriptionBase>) () from /opt/ros/foxy/rclcpp/lib/librclcpp.so
#29 0x00007f68d9c99763 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) () from /opt/ros/foxy/rclcpp/lib/librclcpp.so
#30 0x00007f68d9ca7176 in rclcpp::executors::SingleThreadedExecutor::spin() () from /opt/ros/foxy/rclcpp/lib/librclcpp.so
#31 0x00007f68d91147cd in std::thread::_State_impl<std::thread::_Invoker<std::tuple<nav2_util::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}> > >::_M_run() ()
   from /opt/overlay_ws/install/nav2_util/lib/libnav2_util_core.so
#32 0x00007f68d95e9cb4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#33 0x00007f68d96ff609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#34 0x00007f68d9428103 in clone () from /usr/lib/x86_64-linux-gnu/libc.so.6

dirk-thomas commented 4 years ago

Since it segfaults in malloc could it be that you are running out of memory when this happens?

daisukes commented 4 years ago

No. I tried again: I have 24 GB of memory, and the system was using only 7 GB in total when I got the error again. I'm running ROS2 (Ubuntu 20.04) in a Docker container on an Ubuntu 16.04 host. Do you think that could be a problem? I didn't set any memory limits for the Docker containers.

daisukes commented 4 years ago

@dirk-thomas Thanks for the hint. I may have fixed the issue by increasing the stack size; it has been running for over 20 minutes so far. Does that make sense to you?

$ ulimit -s
8192
$ ulimit -s 65536
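
For reference, the value that `ulimit -s` prints can also be read from inside a process, which may help confirm which limit a node started in the container actually runs with. A minimal C++ sketch, assuming Linux and the POSIX `getrlimit` call; the helper name `soft_stack_limit_kib` is made up for illustration and is not part of ROS 2:

```cpp
#include <sys/resource.h>

// Hypothetical helper: returns the soft stack limit in KiB, the unit that
// `ulimit -s` prints. Returns -1 if the limit is unlimited or the query fails.
long long soft_stack_limit_kib() {
  struct rlimit lim {};
  if (getrlimit(RLIMIT_STACK, &lim) != 0) {
    return -1;  // query failed
  }
  if (lim.rlim_cur == RLIM_INFINITY) {
    return -1;  // reported as "unlimited" by ulimit
  }
  // getrlimit reports bytes; convert to KiB to match `ulimit -s`.
  return static_cast<long long>(lim.rlim_cur / 1024);
}
```

Printing the result of `soft_stack_limit_kib()` at node startup should match the shell value, provided the node was launched from the same shell session in which `ulimit -s` was changed.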

dirk-thomas commented 4 years ago

Since 8192 is the default on Ubuntu, and that configuration works for many developers and users as well as our CI infrastructure, I don't think increasing it should be necessary.

A debug build, plus using gdb to look into how much memory the failing invocation tries to allocate, might help to narrow down the problem.
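
To make the size check concrete: frame #8 of the backtrace is a `std::vector<float>::resize` call, so the count passed to it times `sizeof(float)` is the number of bytes being requested. The sketch below is a standalone illustration, not code from rosidl or Fast-CDR, and the helper names `bytes_for_floats` and `try_resize` are hypothetical. It shows how such a resize normally ends for a reasonable count and for an impossible one.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: bytes needed to hold `count` floats.
// No overflow check; intended only for small, plausible counts.
std::size_t bytes_for_floats(std::size_t count) {
  return count * sizeof(float);
}

// Hypothetical helper: performs the same kind of resize as frame #8 and
// reports how it ended instead of letting an exception propagate.
std::string try_resize(std::size_t count) {
  std::vector<float> ranges;
  try {
    ranges.resize(count);
    return "ok";
  } catch (const std::length_error &) {
    return "length_error";  // count exceeds vector::max_size()
  } catch (const std::bad_alloc &) {
    return "bad_alloc";  // the allocator could not provide the memory
  }
}
```

With the standard library, an oversized count is reported as a C++ exception, as in `try_resize(std::vector<float>().max_size() + 1)`. The backtrace above instead faults inside `malloc` itself, so inspecting the count in a debug build would show whether the request was unreasonable or whether the allocator state was already damaged before this call; the second reading is an assumption, not something the thread confirms.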

daisukes commented 4 years ago

I tried a debug build. After that, for some reason, I could no longer reproduce the segmentation fault with either the debug or the release build.

While building, my system ran out of disk space and my Docker environment broke, so I had to remove all images and build from scratch. Something may have been wrong with my Docker images.

If I get the same error again, I will come back here, but I think this issue can be closed. Thank you for your help!

ros2 / rosidl

Segmentation fault in libsensor_msgs__rosidl_typesupport_fastrtps_cpp.so #490
