luxonis / depthai-ros

Official ROS Driver for DepthAI Sensors.
MIT License

[BUG] Multiple cameras on same nodelet generate segfault/crash (ROS Noetic) #537

Closed: imwhocodes closed this issue 3 months ago

imwhocodes commented 4 months ago

Describe the bug: If you load two Camera nodelets into the same nodelet manager, the nodelet manager crashes after a few seconds.

I saw several different errors in the logs, among them:

All of the errors seem to be caused by a concurrency problem in how memory allocations and deallocations are managed.

Minimal Reproducible Example: My setup is based on:

  1. Docker image: osrf/ros:noetic-desktop-full
  2. depthai-ros: apt install ros-noetic-depthai-ros

The minimal launch file to reproduce the crash is:

<?xml version="1.0"?>
<launch>

  <node pkg="nodelet" type="nodelet" name="standalone_nodelet"  args="manager" output="screen"/>

  <group ns="OAKD_A">
    <node name="camera" pkg="nodelet" type="nodelet" output="screen" required="true" args="load depthai_ros_driver/Camera /standalone_nodelet">
      <rosparam param="camera_i_mx_id">SERIAL-CAMERA-A</rosparam>
    </node>
  </group>

  <group ns="OAKD_B">
    <node name="camera" pkg="nodelet" type="nodelet" output="screen" required="true" args="load depthai_ros_driver/Camera /standalone_nodelet">
      <rosparam param="camera_i_mx_id">SERIAL-CAMERA-B</rosparam>
    </node>
  </group>

</launch>

Expected behavior: It should be possible to load multiple Camera nodelets into the same nodelet manager.

Additional context: When nodelets are loaded into the same manager, passing data from one node to the next in a pipeline costs no extra CPU. This has a big impact when working with point clouds, given their enormous message size: from one node to the next you simply pass a smart pointer around, without serializing and deserializing the data at each step. In my setup I acquire multiple point clouds from multiple OAK-D-Pro-W cameras, merge them, and apply several processing steps, so being unable to exploit the memory sharing of the nodelet architecture is a big performance hit.

Example Log

SUMMARY
========

PARAMETERS
 * /OAKD_A/camera/camera_i_mx_id: 18443010C1EE870E00
 * /OAKD_B/camera/camera_i_mx_id: 18443010510E990F00
 * /rosdistro: noetic
 * /rosversion: 1.16.0

NODES
  / 
    standalone_nodelet (nodelet/nodelet)
  /OAKD_A/
    camera (nodelet/nodelet)
  /OAKD_B/
    camera (nodelet/nodelet)

auto-starting new master
process[master]: started with pid [7914]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 67cedd3a-1e02-11ef-8220-5847ca74d8a4
process[rosout-1]: started with pid [7924]
started core service [/rosout]
process[standalone_nodelet-2]: started with pid [7927]
process[OAKD_A/camera-3]: started with pid [7929]
process[OAKD_B/camera-4]: started with pid [7933]
[ INFO] [1717018115.581428199]: Loading nodelet /OAKD_A/camera of type depthai_ros_driver/Camera to manager /standalone_nodelet with the following remappings:
[ INFO] [1717018115.582056329]: waitForService: Service [/standalone_nodelet/load_nodelet] has not been advertised, waiting...
[ INFO] [1717018115.584478326]: Loading nodelet /OAKD_B/camera of type depthai_ros_driver/Camera to manager /standalone_nodelet with the following remappings:
[ INFO] [1717018115.585082510]: waitForService: Service [/standalone_nodelet/load_nodelet] has not been advertised, waiting...
[ INFO] [1717018115.592682811]: Initializing nodelet with 16 worker threads.
[ INFO] [1717018115.602624424]: waitForService: Service [/standalone_nodelet/load_nodelet] is now available.
[ INFO] [1717018115.605738163]: waitForService: Service [/standalone_nodelet/load_nodelet] is now available.
[ INFO] [1717018116.167786814]: Connecting to the camera using mxid: 18443010C1EE870E00
[ INFO] [1717018117.707903012]: Ignoring device info: MXID: 18443010510E990F00, Name: 1.2.1
[ INFO] [1717018117.708754499]: Camera with MXID: 18443010C1EE870E00 and Name: 1.1.1 connected!
[ INFO] [1717018117.709535671]: USB SPEED: SUPER
[ INFO] [1717018117.755400465]: Device type: OAK-D-PRO-W
[ INFO] [1717018117.758074694]: Pipeline type: RGBD
[ INFO] [1717018117.992546300]: NN Family: mobilenet
[ INFO] [1717018118.021149807]: NN input size: 300 x 300. Resizing input image in case of different dimensions.
[ INFO] [1717018118.270776315]: Finished setting up pipeline.
[ INFO] [1717018119.312776300]: Camera ready!
[2024-05-29 21:28:40.147] [depthai] [warning] skipping X_LINK_UNBOOTED device having name "1.1.1"
[ INFO] [1717018120.148440016]: Connecting to the camera using mxid: 18443010510E990F00
[ INFO] [1717018121.656517417]: Camera with MXID: 18443010510E990F00 and Name: 1.2.1 connected!
[ INFO] [1717018121.657258593]: USB SPEED: SUPER
[ INFO] [1717018121.702915600]: Device type: OAK-D-PRO-W
[ INFO] [1717018121.705443560]: Pipeline type: RGBD
[ INFO] [1717018121.939830232]: NN Family: mobilenet
[ INFO] [1717018121.966727108]: NN input size: 300 x 300. Resizing input image in case of different dimensions.
[ INFO] [1717018122.207234174]: Finished setting up pipeline.
[ INFO] [1717018123.098100489]: Camera ready!
free(): double free detected in tcache 2
[standalone_nodelet-2] process has died [pid 7927, exit code -6, cmd /opt/ros/noetic/lib/nodelet/nodelet manager __name:=standalone_nodelet __log:=/root/.ros/log/67cedd3a-1e02-11ef-8220-5847ca74d8a4/standalone_nodelet-2.log].
log file: /root/.ros/log/67cedd3a-1e02-11ef-8220-5847ca74d8a4/standalone_nodelet-2*.log
[ INFO] [1717018135.419087792]: Bond broken, exiting
================================================================================REQUIRED process [OAKD_B/camera-4] has died!
process has finished cleanly
log file: /root/.ros/log/67cedd3a-1e02-11ef-8220-5847ca74d8a4/OAKD_B-camera-4*.log
Initiating shutdown!
================================================================================
[OAKD_B/camera-4] killing on exit
[OAKD_A/camera-3] killing on exit
[ INFO] [1717018135.686666723]: Unloading nodelet /OAKD_A/camera from manager /standalone_nodelet
[ INFO] [1717018135.687410985]: waitForService: Service [/standalone_nodelet/unload_nodelet] could not connect to host [DetectionTest:40787], waiting...
[ WARN] [1717018135.687424731]: Couldn't find service /standalone_nodelet/unload_nodelet, perhaps the manager is already shut down
[rosout-1] killing on exit
[master] killing on exit
shutting down processing monitor...
... shutting down processing monitor complete
done

Thanks, Luca

imwhocodes commented 4 months ago

I saw on the forum that calls to dai::Device and dai::Pipeline are not thread-safe.

So I tried (very crudely) to mitigate this by sharing a single lock among all Camera nodelets and acquiring it in every function and callback in Camera.cpp.

As seen here: https://github.com/luxonis/depthai-ros/compare/noetic...imwhocodes:depthai-ros:noetix-multicam-nodelet-fix

But either I'm doing something wrong or this doesn't help: the nodelet manager still crashes a couple of seconds after both cameras are connected, sometimes with new errors.

MartinMotycka commented 3 months ago

@Serafadam Please check this asap. Thank you. M.

Serafadam commented 3 months ago

@imwhocodes Thanks for the report. The issue probably originates from IMU interpolation; in the meantime, you can check whether changing the IMU sync method to COPY helps:

        <rosparam param="imu_i_sync_method">COPY</rosparam>
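For reference, a sketch of the minimal launch file from the report with this workaround applied to both cameras. It assumes imu_i_sync_method is read as a private parameter of each Camera nodelet, the same way camera_i_mx_id is; the serial numbers are placeholders to replace with your own device MXIDs.

```xml
<?xml version="1.0"?>
<launch>

  <!-- One shared manager: both Camera nodelets run in the same process -->
  <node pkg="nodelet" type="nodelet" name="standalone_nodelet" args="manager" output="screen"/>

  <group ns="OAKD_A">
    <node name="camera" pkg="nodelet" type="nodelet" output="screen" required="true" args="load depthai_ros_driver/Camera /standalone_nodelet">
      <rosparam param="camera_i_mx_id">SERIAL-CAMERA-A</rosparam>
      <!-- Workaround: copy IMU samples instead of interpolating them -->
      <rosparam param="imu_i_sync_method">COPY</rosparam>
    </node>
  </group>

  <group ns="OAKD_B">
    <node name="camera" pkg="nodelet" type="nodelet" output="screen" required="true" args="load depthai_ros_driver/Camera /standalone_nodelet">
      <rosparam param="camera_i_mx_id">SERIAL-CAMERA-B</rosparam>
      <!-- Same workaround for the second camera -->
      <rosparam param="imu_i_sync_method">COPY</rosparam>
    </node>
  </group>

</launch>
```

Setting the parameter inside each node element keeps it scoped to that camera, so the two devices can still be configured independently.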

imwhocodes commented 3 months ago

Thanks, this worked out!