ouster-lidar / ouster-ros

Official ROS drivers for Ouster sensors (OS0, OS1, OS2, OSDome)
https://ouster.com

Loss lidar packet messages (ROS melodic) #10

Closed rdouguet closed 2 weeks ago

rdouguet commented 3 years ago

Hello,

I'm trying to record data from an OS1-64 lidar and a Basler industrial camera on a Jetson Xavier board. For that, I use the two ROS packages ouster_ros and pylon_camera. Both sensors are connected via Ethernet and each one uses its own IP address (the throughput on the GigE link is not saturated, at about 50%).

When I use the OS1-64 alone, I get:

But when I use the two sensors together, everything is OK for the camera, but the messages fall to 366 for the /os_node/lidar_packet... There is clearly a conflict between these two packages and the Ethernet configuration, but I don't understand why, especially since I don't have this problem with another lidar and package (Hesai).

I saw a similar issue in the GitHub history, "Messages get dropped [ros melodic + industrial camera] ouster-lidar/ouster_example#225", but there is no solution there.

Thanks for your help.

dmitrig commented 2 years ago

From that issue:

The issue was about processing, now solved, thanks.

That might mean the issue was CPU usage -- what does the output of top look like?

ShepelIlya commented 1 year ago

I updated my sensors to firmware 2.4 and my driver to commit e4bd020. My version of the Ouster driver has small additions for multicast and cloud filtering.

My previous setup was on fw 2.3 with my custom code based on this commit of ouster_example.

On the old setup (3 drivers for 2 OS1-32 and 1 OS0-64 sensors running on a Jetson AGX Xavier), htop showed a total CPU load of 17-20 percent. On the new setup (2 drivers for 2 OS1-32 on the same AGX Xavier), the total CPU load is 70-80 percent.

Why is CPU usage so high with the new driver, and how can we fix it? Is it OK to use ros::Timer for connection_loop? (And what about ros::Duration(0)?)

I will try to find the reason for this behavior, but if you have any idea what the problem is, please let me know.

Thanks for your help.

Samahu commented 1 year ago

@ShepelIlya We do have a polling client and a high volume of packets (640 per second for lidar packets alone). It is totally okay to use ros::Timer; in fact, that's the method recommended by ROS, specifically when using ROS nodelets, in comparison to creating separate threads for background processing. I don't think ros::Duration(0) is the culprit for the high CPU usage, but rather the computation involved in producing the point clouds.
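For anyone wondering where a figure like 640 packets per second comes from, here is a rough sketch of the arithmetic. It assumes each lidar packet carries 16 columns (check your sensor's metadata; this value and the function name are assumptions, not something stated in this thread), so the packet rate is columns per frame times frames per second, divided by columns per packet.

```python
# Rough packet-rate arithmetic for a spinning lidar.
# ASSUMPTION: 16 columns per lidar packet; verify against your sensor metadata.
COLUMNS_PER_PACKET = 16


def lidar_packets_per_second(columns_per_frame: int,
                             frames_per_second: int,
                             columns_per_packet: int = COLUMNS_PER_PACKET) -> float:
    """Return the number of lidar packets emitted per second."""
    return columns_per_frame * frames_per_second / columns_per_packet


if __name__ == "__main__":
    # Lidar modes written as "<columns per frame>x<frames per second>".
    for mode in ("512x10", "1024x10", "2048x10", "512x20", "1024x20"):
        columns, hz = (int(value) for value in mode.split("x"))
        print(mode, lidar_packets_per_second(columns, hz), "packets/s")
```

Under that assumption, the 1024x10 and 512x20 modes both give 640 packets per second, and the 2048x10 and 1024x20 modes double that.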

We do have plans to improve our CPU utilization, but if you already have some improvements to the ouster-ros driver, feel free to submit a PR to the ouster-ros repo.

peci1 commented 1 year ago

You haven't mentioned that explicitly, but I assume the sensors are connected to the Jetson via an Ethernet switch. If that is the case, it is a known problem that Ouster has (so far) refused to try to resolve (I had a veeery long discussion with their support, including suggestions for how to fix it). What you can try is using a different switch. Look at the table at https://serverfault.com/q/1098492 and choose a switch that has at least 800 Mbps in the "20 kB burst" column. But even then, there is no guarantee that Ouster packets will not get lost.

As a workaround, since you have only 32-row lidars, you can force each of them to use only a 100 Mbit link speed (e.g. by using a 4-wire Ethernet cable, or by forcing 100 Mbit link speed in the switch management). This should make it much easier for Ouster packets to flow through the rest of the network. If you use this workaround, make sure you do not use the LEGACY packet layout, otherwise the packets would not fit into 100 Mbps.
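Whether a given configuration fits into a 100 Mbit link can be estimated with a quick calculation. The sketch below is only an illustration: the function names, the 80% headroom factor, and the 12608-byte packet size are assumptions chosen for the example, and it ignores UDP/IP/Ethernet header and fragmentation overhead. Substitute the packet size and packet rate that your own sensor and lidar mode actually produce.

```python
# Back-of-the-envelope check of lidar payload bandwidth against a link speed.
# ASSUMPTIONS: packet size and rate below are illustrative values only;
# protocol overhead (UDP/IP/Ethernet headers, fragmentation) is ignored.


def payload_mbps(packet_bytes: int, packets_per_second: float) -> float:
    """Payload throughput in megabits per second."""
    return packet_bytes * 8 * packets_per_second / 1e6


def fits_link(packet_bytes: int, packets_per_second: float,
              link_mbps: float = 100.0, headroom: float = 0.8) -> bool:
    """True if the payload stays below `headroom` times the link speed."""
    return payload_mbps(packet_bytes, packets_per_second) <= link_mbps * headroom


if __name__ == "__main__":
    example_packet_bytes = 12608  # hypothetical packet size for this example
    for rate in (640, 1280):      # packets per second
        mbps = payload_mbps(example_packet_bytes, rate)
        print(f"{rate} packets/s: {mbps:.1f} Mbps, "
              f"fits 100 Mbit link: {fits_link(example_packet_bytes, rate)}")
```

The headroom factor is there because a link that is nominally below 100% average utilization can still drop packets when the traffic arrives in bursts, which is the problem described above.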

Samahu commented 1 year ago

@rdouguet Which firmware were/are you using during your evaluation? We delivered some performance improvements back in December; did you get to retry this with our latest firmware and driver? I doubt it will solve your case; I haven't personally put any effort into supporting multiple sensors yet.

@peci1 Do you think this is more of a firmware issue?

peci1 commented 1 year ago

@peci1 Do you think this is more of a firmware issue?

The easiest thing to blame here is the already mentioned networking issue. And as it can be fixed in the lidar firmware, you could call it a firmware issue, yes. The fix is simple (at least from the "outside" view): do not buffer data and send it in bursts; rather, send data as soon as you have it.

rdouguet commented 1 year ago

@Samahu I used an old package: the version of the ouster_ros package was 0.2.1.

I fixed my problem by using a USB/Ethernet adapter for this lidar. This way, the lidar is the only device communicating on that Ethernet link.

Samahu commented 1 year ago

@rdouguet Thanks, I will keep the ticket open for now until I have the chance to reproduce the issue with more than one sensor.