plusone-robotics / realsense

Intel(R) RealSense(TM) ROS Wrapper for D400 series and SR300 Camera
http://wiki.ros.org/RealSense
Apache License 2.0

Support for topic monitoring and cleanly shutting down or resetting driver without killing nodelet manager. #28

Closed malban closed 4 years ago

malban commented 4 years ago
130s commented 4 years ago

@abhijitmajumdar Please review (I can't assign you as a reviewer for some reason).

malban commented 4 years ago

@abhijitmajumdar I added a new commit with some fixes to allow camera hot plugging.

The docker-compose file needs to be modified to replace:

    volumes:
      - /dev/bus/usb:/dev/bus/usb

with:

    volumes:
      - /dev:/dev

After that, it should be possible to unplug and re-plug the camera's USB connection and have the driver recover automatically, or to plug in the camera after the system has started.

abhijitmajumdar commented 4 years ago

@malban Review done. Looks good; I left some minor comments, but approved otherwise. I trust your statement that re-plugging works inside Docker without causing the re-enumeration problem, and I'm looking forward to seeing that working.

malban commented 4 years ago

> @malban Review done. Looks good; I left some minor comments, but approved otherwise. I trust your statement that re-plugging works inside Docker without causing the re-enumeration problem, and I'm looking forward to seeing that working.

The /dev/videoN re-enumeration still happens sometimes, but changing the docker-compose file to mount /dev makes the new enumeration accessible inside the container. The driver doesn't care what the actual enumeration is as long as the device is accessible, since it looks the device up by serial number, not by device path.
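As an illustration of why the enumeration does not matter, here is a minimal Python sketch. It is not the driver's code: `VideoDevice`, `find_by_serial`, the serial numbers, and the device paths are all hypothetical names and values chosen for the example. It only shows that a lookup keyed on the serial number keeps working when the /dev/videoN path changes after a re-plug.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VideoDevice:
    """Hypothetical record for one enumerated camera."""
    serial: str  # stable identifier of the physical camera
    path: str    # e.g. /dev/video2; may change after a re-plug


def find_by_serial(devices: Iterable[VideoDevice], serial: str) -> Optional[VideoDevice]:
    """Return the device with the given serial number, or None if it is absent."""
    for dev in devices:
        if dev.serial == serial:
            return dev
    return None


# Made-up enumerations before and after a re-plug: the second camera
# comes back under a different /dev/videoN path.
before = [VideoDevice("0001", "/dev/video0"), VideoDevice("0002", "/dev/video2")]
after = [VideoDevice("0002", "/dev/video4"), VideoDevice("0001", "/dev/video0")]
```

Looking up serial `"0002"` in `before` and in `after` returns the same camera under two different paths, which is why the container only needs the new path to be visible (hence mounting /dev).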

130s commented 4 years ago

I'll post a follow-up but for now I'm merging.

130s commented 4 years ago

100.223.4 is made.

130s commented 4 years ago

During a long-running test with a very short timeout (set to between 0.05 and 0.08), meant to forcefully trigger respawning of the RealSense node, we observed a buffer overflow error that crashes the nodelet manager. It happened roughly once every 2 hours (on a computer running 2 nodes with 2 cameras).

With limited time, @malban and I couldn't narrow down whether the cause is in the code itself or in the test setup, where the driver was forced to restart very frequently. Plus One will keep an eye on it, especially by running a long-running test that involves rebooting Linux (if the root cause is in the code itself, then sooner or later we should hit the same issue during the long-running reboot test).
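For readers unfamiliar with the test setup, the following is a rough Python sketch of a topic-timeout watchdog of the kind described above. It is an assumption-laden illustration, not the code from this PR: the class name `TopicWatchdog`, the `restart` callback, and the injectable `clock` are all hypothetical. It shows why a very short timeout forces restarts very frequently.

```python
import time
from typing import Callable


class TopicWatchdog:
    """Hypothetical watchdog: requests a restart when no message arrives in time."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_msg = clock()  # treat construction time as the first "message"
        self.restarts = 0

    def on_message(self) -> None:
        """Call from the topic callback whenever a message is received."""
        self._last_msg = self._clock()

    def check(self, restart: Callable[[], None]) -> bool:
        """Call periodically; invokes restart() and returns True if the topic timed out."""
        if self._clock() - self._last_msg > self._timeout_s:
            restart()
            self.restarts += 1
            # Give the restarted driver a fresh timeout window.
            self._last_msg = self._clock()
            return True
        return False
```

With a timeout as small as the one used in the test, almost any gap between messages exceeds the window, so `check` fires repeatedly. That makes the test a stress test of the restart path rather than a model of normal operation.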

More log:

```
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600242297.579211664] [/foo/baa_nodelet_manager]: insert Depth to Stereo Module#033[0m
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600242297.579246040] [/foo/baa_nodelet_manager]: insert Infrared to St*** buffer overflow detected ***: /opt/ros/kinetic/lib/nodelet/nodelet terminated
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m *** buffer overflow detected ***: /opt/ros/kinetic/lib/nodelet/nodelet terminated
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m ======= Backtrace: =========
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m ======= Backtrace: =========
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f854e20b7e5]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f854e20b7e5]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f854e2ad15c]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f854e2ad15c]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7f854e2ab160]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(+0x1190a7)[0x7f854e2ad0a7]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7f854e2ab160]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /usr/lib/x86_64-linux-gnu/librealsense2.so.2.20(_ZN12librealsense8platform14v4l_uvc_device4pollEv+0x80)[0x7f85328d54f0]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(+0x1190a7)[0x7f854e2ad0a7]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /usr/lib/x86_64-linux-gnu/librealsense2.so.2.20(_ZN12librealsense8platform14v4l_uvc_device12capture_loopEv+0x38)[0x7f85328d6b18]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /usr/lib/x86_64-linux-gnu/librealsense2.so.2.20(_ZN12librealsense8platform14v4l_uvc_device4pollEv+0x80)[0x7f85328d54f0]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd0e5e)[0x7f854e846e5e]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /usr/lib/x86_64-linux-gnu/librealsense2.so.2.20(_ZN12librealsense8platform14v4l_uvc_device12capture_loopEv+0x38)[0x7f85328d6b18]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f854d4f76ba]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd0e5e)[0x7f854e846e5e]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f854d4f76ba]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f854e29b41d]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m ======= Memory map: ========
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f854e29b41d]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m ======= Memory map: ========
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 00400000-0040e000 r-xp 00000000 08:02 10358300 /opt/ros/kinetic/lib/nodelet/nodelet
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 0060d000-0060e000 r--p 0000d000 08:02 10358300 /opt/ros/kinetic/lib/nodelet/nodelet
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 0060e000-0060f000 rw-p 0000e000 08:02 10358300 /opt/ros/kinetic/lib/nodelet/nodelet
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 0251e000-0293f000 rw-p 00000000 00:00 0 [heap]
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f849c000000-7f849df60000 rw-p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f849df60000-7f84a0000000 ---p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f84a0000000-7f84a02a5000 rw-p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f84a02a5000-7f84a4000000 ---p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f84a4000000-7f84a4ebb000 rw-p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f84a4ebb000-7f84a8000000 ---p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f84a8000000-7f84a9e87000 rw-p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f84a9e87000-7f84ac000000 ---p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f84ac000000-7f84aec9f000 rw-p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f84aec9f000-7f84b0000000 ---p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f84b0000000-7f84b0021000 rw-p 00000000 00:00 0
:
ux-gnu/libboost_regex.so.1.58.0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f854c3e0000-7f854c58f000 r-xp 00000000 08:02 10231687 /usr/lib/x86_64-linux-gnu/liblog4cxx.so.10.0.0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f854c58f000-7f854c78e000 ---p 001af000 08:02 10231687 /usr/lib/x86_64-linux-gnu/liblog4cxx.so.10.0.0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f854c78e000-7f854c7b4000 r--p 001ae000 08:02 10231687 /usr/lib/x86_64-linux-gnu/liblog4cxx.so.10.0.0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f854c7b4000-7f854c7b5000 rw-p 001d4000 08:02 10231687 /usr/lib/x86_64-linux-gnu/liblog4cxx.so.10.0.0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f854c7b5000-7f854c7b7000 rw-p 00000000 00:00 0
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f854c7b7000-7f854c7b8000 r-xp 00000000 08:02 10093158 /opt/ros/kinetic/lib/librosconsole_backend_interface.so
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f854c7b8000-7f854c9b7000 ---p 00001000 08:02 10093158 /opt/ros/kinetic/lib/librosconsole_backend_interface.so
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m 7f854c9b7000-7f854c9b8000 r--p 00000000 08:02 10093158 /opt/ros/kine#033[31m[foo/baa_nodelet_manager-2] process has died [pid 167, exit code -6, cmd /opt/ros/kinetic/lib/nodelet/nodelet manager __name:=baa_nodelet_manager __log:=/root/.ros/log/15f4ae1a-f7df-11ea-a85f-c400ad2d8caa/foo-baa_nodelet_manager-2.log].
Sep 16 02:44:57 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m log file: /root/.ros/log/15f4ae1a-f7df-11ea-a85f-c400ad2d8caa/foo-baa_nodelet_manager-2*.log#033[0m
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: [WARN] [1600242298.385179] [/recovery_helper_pcl_foo]: [monitor] No new messages from topic: received in 14 seconds. System will reboot in 105 seconds !!!
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: [WARN] [1600242298.385781] [/recovery_helper_pcl_foo]:
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ==== Camera sensor has stopped generating data and may be in a unrecoverable state requiring a system reboot.
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: [WARN] [1600242298.394985] [/recovery_helper_foo]: [monitor] No new messages from topic: received in 14 seconds. System will reboot in 105 seconds !!!
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: [WARN] [1600242298.395286] [/recovery_helper_foo]:
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ==== Camera sensor has stopped generating data and may be in a unrecoverable state requiring a system reboot.
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:58 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: [WARN] [1600242299.392210] [/recovery_helper_pcl_foo]: [monitor] No new messages from topic: received in 15 seconds. System will reboot in 104 seconds !!!
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: [WARN] [1600242299.392516] [/recovery_helper_pcl_foo]:
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ==== Camera sensor has stopped generating data and may be in a unrecoverable state requiring a system reboot.
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: [WARN] [1600242299.399156] [/recovery_helper_foo]: [monitor] No new messages from topic: received in 15 seconds. System will reboot in 104 seconds !!!
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: [WARN] [1600242299.399454] [/recovery_helper_foo]:
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ==== Camera sensor has stopped generating data and may be in a unrecoverable state requiring a system reboot.
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:44:59 hostname_dayo por_hostos_support-start[1158]: ====
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234819.803575443] [/foo/data_collection]: Loading nodelet /foo/data_collection of type baa_data_collection/CaptureDataNodelet to manager baa_nodelet_manager with the following remappings:#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234819.803610025] [/foo/data_collection]: /foo/data_collection/fooable_object_ai_parameters_service -> /fooable_object_ai_parameters_service#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600242300.931057631] [/foo/data_collection]: Bond broken, exiting#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844742562] [/foo/trigger]: Loading nodelet /foo/trigger of type baa_ai_foo_and_place/FooNodelet to manager baa_nodelet_manager with the following remappings:#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844779654] [/foo/trigger]: /foo/calibrate_camera -> /foo/calibration_service#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844787179] [/foo/trigger]: /foo/configure_camera -> /foo/baa_nodelet_manager/set_parameters#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844807162] [/foo/trigger]: /foo/image -> /foo/rgb/image#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844811667] [/foo/trigger]: /foo/info -> /foo/rgb/camera_info#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844815987] [/foo/trigger]: /foo/points -> /foo/depth_registered/points#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844835713] [/foo/trigger]: /foo/sensor_trigger_service -> /foo/capture#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844854480] [/foo/trigger]: /foo/trigger/background_data -> /foo/data_collection/set_background_data#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844859305] [/foo/trigger]: /foo/trigger/camera_info -> /foo/data_collection/set_camera_info#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844866088] [/foo/trigger]: /foo/trigger/live_data -> /foo/data_collection/set_live_data#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844883310] [/foo/trigger]: /foo/trigger/notify_params_changed -> /foo/data_collection/notify_params_changed#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844892903] [/foo/trigger]: /foo/trigger/parcel_classification -> /parcel_classification#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844902993] [/foo/trigger]: /foo/trigger/fooable_object_detector_service -> /fooable_object_detector_service#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844914635] [/foo/trigger]: /foo/trigger/place_verification_service -> /place_verification_service#033[0m
Sep 16 02:45:01 hostname_dayo executable.py[2975]: #033[36mfooone |#033[0m #033[0m[ INFO] [1600234818.844919495] [/foo/trigger]: /foo/trigger/set_foo_result -> /foo/data_collection/set_foo_result#033[0m
:
```