ouster-lidar / ouster-ros

Official ROS drivers for Ouster sensors (OS0, OS1, OS2, OSDome)
https://ouster.com

"DROPPING PACKET" and "70% full, THROTTLING" #352

Open James-R-Han opened 1 month ago

James-R-Han commented 1 month ago

Hello!

I am using

I see the following warnings: image

Any advice on how to handle the situation?

Thank you in advance!

kavishmshah commented 1 month ago

Hi, I see the same thing when I have multiple subscribers to the point cloud topic. I'm using an NVIDIA Orin devkit and an OS0-128U. With 512x10 I don't see this issue, but with 1024x10 I see packet drops when RViz is open and I record a bag simultaneously.

What resolution are you using?

James-R-Han commented 1 month ago

Hey Kavish! I'm glad I'm not the only one haha. I was using 1024x10 when I got the messages above
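
To put the two lidar modes in perspective, here is a rough sketch of the point rate for a 128-channel sensor such as the OS0-128U mentioned above. The `bytes_per_point` value is a hypothetical placeholder, not a figure taken from the driver; substitute the size of whichever point type you actually publish.

```python
def points_per_second(channels: int, columns: int, rate_hz: int) -> int:
    """Points produced per second for a lidar mode such as 1024x10."""
    return channels * columns * rate_hz


def cloud_bandwidth_mb_s(channels: int, columns: int, rate_hz: int,
                         bytes_per_point: int) -> float:
    """Approximate point cloud payload per subscriber, in MB/s."""
    return points_per_second(channels, columns, rate_hz) * bytes_per_point / 1e6


if __name__ == "__main__":
    BYTES_PER_POINT = 48  # assumption for illustration only
    for columns in (512, 1024):
        pts = points_per_second(128, columns, 10)
        mb = cloud_bandwidth_mb_s(128, columns, 10, BYTES_PER_POINT)
        print(f"{columns}x10: {pts} points/s, ~{mb:.1f} MB/s per subscriber")
```

Going from 512x10 to 1024x10 doubles the point rate, and each additional subscriber (RViz, a bag recorder) has to be served that stream again.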

Samahu commented 1 month ago

Hi @James-R-Han and @kavishmshah thanks for sharing the feedback.

I noticed sensor_qos_profile:=reliable, which suggests that you are not using BEST_EFFORT, which is what SensorDataQoS uses. I think choosing the RELIABLE QoS for the sensor will increase the holdup on the publisher's queue, resulting in the throttling and dropped packets. I recommend that you use the default SensorDataQoS for live processing and only use RELIABLE when capturing data.

In any case, as I noted in the current merged fix https://github.com/ouster-lidar/ouster-ros/pull/321 I have further TODOs that I want to implement soon which should improve the performance on the driver side.

So stay tuned :crossed_fingers:

James-R-Han commented 1 month ago

Thanks @Samahu!

When I've tried BEST_EFFORT publisher with BEST_EFFORT subscriber (RVIZ), empirically I see the frame rate will occasionally drop well below 10Hz.

When I use a RELIABLE publisher with a BEST_EFFORT subscriber, RViz is sometimes just blank (see below).

If I use RELIABLE for both I find the best result - the frame rate is consistently high; I'm able to wave my hand around and it's a smooth motion in the point cloud.

image
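
For reference, the three combinations described above can be checked against the ROS 2 reliability compatibility rules. The sketch below is plain Python that encodes those rules as documented for ROS 2 QoS; it does not call rclpy, and the `Reliability` enum and `effective_reliability` function are illustrative names, not part of any ROS API.

```python
from enum import Enum
from typing import Optional


class Reliability(Enum):
    BEST_EFFORT = "best_effort"
    RELIABLE = "reliable"


def effective_reliability(publisher: Reliability,
                          subscriber: Reliability) -> Optional[Reliability]:
    """Return the reliability of the resulting connection, or None if the
    publisher and subscriber policies are incompatible."""
    # A subscriber may not request more than the publisher offers.
    if publisher is Reliability.BEST_EFFORT and subscriber is Reliability.RELIABLE:
        return None
    # The connection is only reliable when both sides ask for it.
    if publisher is Reliability.RELIABLE and subscriber is Reliability.RELIABLE:
        return Reliability.RELIABLE
    return Reliability.BEST_EFFORT


if __name__ == "__main__":
    for pub in Reliability:
        for sub in Reliability:
            result = effective_reliability(pub, sub)
            print(pub.name, "->", sub.name, ":",
                  result.name if result else "no connection")
```

By these rules a RELIABLE publisher with a BEST_EFFORT subscriber should still connect (as best effort), so the blank RViz in that case is presumably not a plain QoS mismatch.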

Samahu commented 1 month ago

This problem could be tied to the underlying RMW used. I don't see this problem on x86 platforms with CycloneDDS. I haven't measured how well the driver works on NVIDIA devices, but I intend to once I get more free time. In any case, I think Zenoh is now official and works with ROS 2 (Rolling, Iron, Jazzy, ...), which offers an alternative to the DDS communication layer. I have been hearing good reviews about it, but I haven't tried it myself. You can give it a try.

This is of course beside the fact that we can do more optimization on the driver side.

Limerzzz commented 3 weeks ago

I faced the same problem, and I fixed it by setting the proc_mask value to PCL in driver_params.yaml. I drive it from Python.

kavishmshah commented 3 weeks ago

Hi @Samahu , @Limerzzz and @James-R-Han ,

With QoS set to BEST_EFFORT, when parsing the ROS messages (after recording a bag file), we saw a lot of 'nan' values and decided to change it back to RELIABLE. This might explain the blank screen in RViz, though I'm not entirely sure. Setting use_system_default_qos: true gives the system default settings, which are Reliable and Volatile.

We were able to resolve the packet drop issue by doing the below:

  1. Changing the DDS to CycloneDDS and tuning it with multicast and localhost settings.
  2. In addition to this, we also increased the memory buffer size.
  3. Setting MTU to 9000 also helped.

We have tried the same on an NVIDIA devkit and a high-spec PC. On both devices, we didn't notice packet drops after the change. I'll post the commands/settings needed in a few days, once we are able to replicate this on another set of PCs, just to make sure.
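
Until the exact settings are posted, one quick way to confirm the MTU item from the list above is to read the value Linux exposes under sysfs. This is a minimal sketch, assuming a Linux host; the interface name `eth0` in the usage note is a placeholder for whichever NIC the sensor is plugged into.

```python
from pathlib import Path

JUMBO_MTU = 9000  # the value mentioned in the list above


def read_mtu(interface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the current MTU of a network interface from sysfs."""
    return int((Path(sysfs_root) / interface / "mtu").read_text().strip())


def uses_jumbo_frames(interface: str, sysfs_root: str = "/sys/class/net",
                      wanted: int = JUMBO_MTU) -> bool:
    """True if the interface MTU is at least the wanted jumbo-frame size."""
    return read_mtu(interface, sysfs_root) >= wanted
```

For example, `uses_jumbo_frames("eth0")` returning False means the interface is still at a smaller MTU (commonly 1500).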

Thanks!

Samahu commented 3 weeks ago

Thanks @kavishmshah,

When you say you increased the memory buffer size, do you mean you increased the net.core.rmem_max and net.core.rmem_default values?

kavishmshah commented 3 weeks ago

@Samahu yup, that's right. I believe I set them to 2GB.
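
To check what a machine is currently using for these two settings, the values can be read from /proc/sys on Linux. The sketch below is only a read-only check; the helper names are made up for illustration, and `TWO_GB` assumes that "2GB" means 2147483647 bytes, which is not confirmed in this thread.

```python
from pathlib import Path

TWO_GB = 2**31 - 1  # 2147483647; assumed meaning of "2GB" above
RMEM_KEYS = ("net.core.rmem_max", "net.core.rmem_default")


def read_sysctl(key: str, proc_root: str = "/proc/sys") -> int:
    """Read an integer sysctl such as net.core.rmem_max via /proc/sys."""
    return int(Path(proc_root, *key.split(".")).read_text().split()[0])


def undersized_buffers(wanted: int = TWO_GB, keys=RMEM_KEYS,
                       proc_root: str = "/proc/sys") -> dict:
    """Return {key: current_value} for every key below the wanted size."""
    current = {key: read_sysctl(key, proc_root) for key in keys}
    return {key: value for key, value in current.items() if value < wanted}
```

An empty dict from `undersized_buffers()` means both receive-buffer limits are already at or above the target. Raising them is done with `sysctl` as root and is not covered by this sketch.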