tier4 / nebula

`velodyne_hw_ros_wrapper_node` dies sometimes when launching sensors #181

Open NilaySener opened 1 month ago

NilaySener commented 1 month ago

Description

While running the Nebula driver on Leo Drive's Autonomous Test Vehicle, which is equipped with 4 Velodyne VLP-16 and 1 Velodyne VLS-128 sensors, the velodyne_hw_ros_wrapper_node dies randomly.

Expected Behavior

All LIDAR sensors (4 x Velodyne VLP-16 and 1 x Velodyne VLS-128) should publish ROS2 messages consistently and reliably during the operation of the Nebula driver on the Autonomous Test Vehicle.

Actual Behavior

After the lidars are launched, either all of them start without any problems or some of the lidar component containers die randomly. Some of the test output from the node failures is shown below:

Complete Log Files

If you would like to examine the given scenarios in more detail, you can access the launch logs of the scenarios from the links below:

Test1

Test2

Additional Information

Please let me know if additional information is required or if there are any specific tests that should be performed to help identify the root cause of this issue.

knzo25 commented 1 month ago

@NilaySener Thanks for raising this issue. We run 1 x VLS-128 + 2-3 VLP-16 and currently do not face this issue.

Things that could give us insight into this issue:

As a note: I see you are using the GPS's PPS, right? @drwnz, we do not currently use it, right?

NilaySener commented 1 month ago

Hi @knzo25, thank you for the quick response. Here are the answers to the questions you raised:


Does the issue occur when the driver itself is not in the containers?

Can you compile just the driver with debug symbols?

Does the issue occur when replaying a ROS bag or pcap file?

Regarding the GPS PPS signal usage

If there is anything I need to provide additional information about, please let me know.

drwnz commented 1 month ago

We do use PPS signals to synchronize the LiDAR, but they are generated from an ECU GPIO rather than from GNSS. However, we don't use GPRMC, and timestamping is done from UDP packet header timestamps. Do you still get the same issue if you remove the HW monitor in the launch?

knzo25 commented 1 month ago

@NilaySener I just tried to reproduce the error with the data and launcher provided, but it works without issues on my end.

My setup:

The logs only tell us that the hw interface dies, but not really where. Since the errors can be reproduced with isolated examples (without Autoware, for example), I think you could try https://github.com/pal-robotics/backward_ros to see if you can get more info about the current problem.
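Since the failure is intermittent, one way to gather evidence in isolation is to repeat the launch many times and record which attempts die. The following is a minimal sketch, not part of Nebula. It assumes the command you pass exits with a non-zero status (or is killed by a signal) when the node dies; a plain `ros2 launch` may keep running after a child node crashes, so the command, the run count, and the timeout here are all hypothetical and would need adapting.

```python
import subprocess
import sys


def count_failed_runs(cmd, runs=20, timeout_s=30.0):
    """Run `cmd` `runs` times and return the attempt numbers that failed.

    Assumption: a healthy driver never exits by itself, so a process that is
    still alive after `timeout_s` seconds is treated as healthy and stopped.
    A process that exits earlier with a non-zero code (a negative code means
    it was killed by a signal, e.g. a segfault) is counted as a failure.
    """
    failed = []
    for attempt in range(1, runs + 1):
        proc = subprocess.Popen(cmd)
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Still running after the grace period: stop it and move on.
            proc.terminate()
            proc.wait()
            continue
        if returncode != 0:
            failed.append(attempt)
    return failed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the command to repeat.
    failed_attempts = count_failed_runs(sys.argv[1:])
    print(f"{len(failed_attempts)} failed runs: {failed_attempts}")
```

Running the node with backward_ros linked in, inside a loop like this, would leave one stack trace per failed attempt to compare.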

NilaySener commented 4 weeks ago

Hi, thank you very much for your answers and suggestions.

I will remove the HW monitor from the launch file and share the results.

I also noticed that when the node dies, it only goes into the following callback once. https://github.com/tier4/nebula/blob/d9aaefc9a4c06f6dae86cd7ef22f6353f1379e4f/nebula_ros/src/velodyne/velodyne_hw_interface_ros_wrapper.cpp#L228-L235

As for the pcap file, thank you for testing it @knzo25, but I have two questions:

  1. I had to launch it twenty times or more in a row to reproduce it on the vehicle. Have you had the opportunity to repeat it that many times?
  2. I wanted to use this repo to match your testing method, but I don't have permission. When I fed the .pcap file to Nebula using tcpreplay, I encountered a problem, so I cannot feed Nebula with the .pcap file right now. How can I use the repo you used for replay?
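For reference, a rough idea of what replaying a capture over UDP involves, written as a standalone sketch. This is neither tcpreplay nor the replay repo mentioned above, and all function names are hypothetical. It assumes a classic little-endian `.pcap` file (not pcapng) with Ethernet link type and unfragmented IPv4/UDP frames. Packets are re-sent from the local machine, so the source address will not match the sensor's; whether the driver accepts such packets is not something this sketch addresses.

```python
import socket
import struct
import time

PCAP_MAGIC = 0xA1B2C3D4      # classic pcap with microsecond timestamps
LINKTYPE_ETHERNET = 1


def iter_udp_payloads(pcap_bytes):
    """Yield (timestamp_s, dst_port, payload) for each IPv4/UDP frame."""
    (magic,) = struct.unpack_from("<I", pcap_bytes, 0)
    (linktype,) = struct.unpack_from("<I", pcap_bytes, 20)
    if magic != PCAP_MAGIC or linktype != LINKTYPE_ETHERNET:
        raise ValueError("sketch handles little-endian Ethernet pcap only")
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(pcap_bytes):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            "<IIII", pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        # Ethernet II header is 14 bytes; EtherType 0x0800 means IPv4.
        if len(frame) < 14 + 20 + 8 or frame[12:14] != b"\x08\x00":
            continue
        ip_header_len = (frame[14] & 0x0F) * 4
        if frame[14 + 9] != 17:  # IP protocol 17 is UDP
            continue
        udp = 14 + ip_header_len
        if len(frame) < udp + 8:
            continue
        dst_port, udp_len = struct.unpack_from("!HH", frame, udp + 2)
        yield ts_sec + ts_usec / 1e6, dst_port, frame[udp + 8:udp + udp_len]


def replay(pcap_path, host="127.0.0.1", speed=1.0):
    """Send each UDP payload to `host` on its original destination port,
    keeping the recorded inter-packet timing (scaled by `speed`)."""
    with open(pcap_path, "rb") as f:
        data = f.read()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    first_ts = None
    start = time.monotonic()
    try:
        for ts, port, payload in iter_udp_payloads(data):
            if first_ts is None:
                first_ts = ts
            delay = (ts - first_ts) / speed - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
            sock.sendto(payload, (host, port))
            sent += 1
    finally:
        sock.close()
    return sent
```

A call such as `replay("capture.pcap", host="127.0.0.1")` (hypothetical file name) would then send the recorded packets to whatever is listening on the original destination ports on that host.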