"DROPPING PACKET" and "70% full, THROTTLING"

James-R-Han commented 3 months ago

Hello!

I am using

OS0-128
ROS2 Humble (with CycloneDDS)
UDP Profile RNG15_RFL8_NIR8
Run configuration : ros2 launch ouster_ros sensor.launch.xml sensor_hostname:=192.168.131.18 use_system_default_qos:=true timestamp_mode:=TIME_FROM_ROS_TIME sensor_qos_profile:=reliable proc_mask:="IMG|PCL" viz:=false

I am seeing the "DROPPING PACKET" and "70% full, THROTTLING" warnings from the title.

Any advice on how to handle the situation?

Thank you in advance!

kavishmshah commented 3 months ago

Hi, I see the same thing when I have multiple subscribers to the pointcloud topic. I'm using an NVIDIA Orin devkit and an OS0-128U. With 512x10 I don't see this issue, but with 1024x10 I see packet drops when RViz is open and I record a bag simultaneously.

What resolution are you using?

James-R-Han commented 3 months ago

Hey Kavish! I'm glad I'm not the only one haha. I was using 1024x10 when I got the messages above

Samahu commented 3 months ago

Hi @James-R-Han and @kavishmshah thanks for sharing the feedback.

I noticed that you set sensor_qos_profile:=reliable, which means you are not using BEST_EFFORT, the reliability setting used by SensorDataQoS. I think choosing the RELIABLE QoS for the sensor will increase the holdup on the publisher queue, resulting in the throttling and dropped packets. I recommend that you use the default SensorDataQoS for live processing and only use RELIABLE when capturing data.

In any case, as I noted in the recently merged fix https://github.com/ouster-lidar/ouster-ros/pull/321, I have further TODOs that I want to implement soon, which should improve performance on the driver side.

So stay tuned :crossed_fingers:

James-R-Han commented 3 months ago

Thanks @Samahu!

When I've tried BEST_EFFORT publisher with BEST_EFFORT subscriber (RVIZ), empirically I see the frame rate will occasionally drop well below 10Hz.

When I use RELIABLE publisher with BEST_EFFORT subscriber, sometimes the RVIZ is just blank (see below).

If I use RELIABLE for both I find the best result - the frame rate is consistently high; I'm able to wave my hand around and it's a smooth motion in the point cloud.
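
For reference, here is a minimal sketch of the reliability matching rule as the ROS 2 QoS documentation describes it: a subscriber can ask for less than the publisher offers, but not more. The constant and function names are illustrative only and are not part of rclpy. By this rule a RELIABLE publisher and a BEST_EFFORT subscriber should still connect (with best-effort delivery), so a blank RViz in that combination is probably not a plain QoS mismatch.

```python
# Reliability matching as documented for ROS 2 QoS policies.
# These names are illustrative; they are not rclpy identifiers.
BEST_EFFORT = "best_effort"
RELIABLE = "reliable"


def reliability_compatible(publisher: str, subscriber: str) -> bool:
    """True if a subscriber requesting `subscriber` can match `publisher`."""
    # The only failing pair: publisher offers best effort, subscriber demands reliable.
    return not (publisher == BEST_EFFORT and subscriber == RELIABLE)


def effective_reliability(publisher: str, subscriber: str):
    """Delivery actually obtained, or None when the pair does not connect."""
    if not reliability_compatible(publisher, subscriber):
        return None
    if publisher == RELIABLE and subscriber == RELIABLE:
        return RELIABLE
    return BEST_EFFORT


if __name__ == "__main__":
    for pub in (BEST_EFFORT, RELIABLE):
        for sub in (BEST_EFFORT, RELIABLE):
            print(f"pub={pub:11s} sub={sub:11s} -> {effective_reliability(pub, sub)}")
```

Only the RELIABLE/RELIABLE pair gets retransmission of lost samples, which may be part of why it looked smoothest in the tests above.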

Samahu commented 3 months ago

This problem could be tied to the underlying RMW in use. I don't see it on x86 platforms with CycloneDDS. I haven't measured how well the driver works on NVIDIA devices, but I intend to once I get more free time. In any case, I think Zenoh is now officially supported and works with ROS 2 (Rolling, Iron, Jazzy, ...), which offers an alternative to the DDS communication layer. I have been hearing good reviews about it but haven't tried it myself. You can give it a try.

This is, of course, in addition to the fact that we can do more optimization on the driver side.

Limerzzz commented 3 months ago

I faced the same problem and fixed it by setting the proc_mask value to PCL in driver_params.yaml. I drive it from Python.

kavishmshah commented 3 months ago

Hi @Samahu , @Limerzzz and @James-R-Han ,

With QoS set to BEST_EFFORT, we saw a lot of 'nan' values when parsing the ROS messages (after recording a bag file), so we decided to change it back to RELIABLE. This might explain the blank screen in RViz, though I'm not entirely sure. Set use_system_default_qos: true to get the system default settings, which are Reliable and Volatile.

We were able to resolve the packet drop issue by doing the below:

Changing the DDS to cyclone and tuning it with multicast and localhost settings.
In addition to this, we also increased the memory buffer size.
Setting MTU to 9000 also helped.

We tried the same on an NVIDIA devkit and a high-spec PC. On both devices we didn't notice packet drops after the change. I'll post the commands/settings needed in a few days, once we are able to replicate this on another set of PCs, just to make sure.

Thanks!

Samahu commented 3 months ago

Thanks @kavishmshah,

When you say you increased the memory buffer size, do you mean you increased the net.core.rmem_max and net.core.rmem_default values?

kavishmshah commented 3 months ago

@Samahu yup, that's right. I believe I set them to 2GB.
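
For anyone who wants to check their own machine, here is a small sketch that reads the two kernel settings from procfs on Linux and reports which ones are below a target. The exact value used in this thread was not posted; the target below (2147483647, the largest 32-bit signed integer) is only an assumption standing in for "2GB", and the function names are made up for illustration.

```python
from pathlib import Path

# Assumption: "2GB" taken as the largest 32-bit signed value; not confirmed in the thread.
TARGET_BYTES = 2_147_483_647
SETTINGS = ("net.core.rmem_max", "net.core.rmem_default")


def read_sysctl(name: str, root: Path = Path("/proc/sys")) -> int:
    """Read an integer sysctl such as 'net.core.rmem_max' from procfs (Linux only)."""
    return int((root / name.replace(".", "/")).read_text().split()[0])


def undersized_buffers(names=SETTINGS, target=TARGET_BYTES, root=Path("/proc/sys")):
    """Return {setting: current_value} for every setting below `target`."""
    result = {}
    for name in names:
        value = read_sysctl(name, root)
        if value < target:
            result[name] = value
    return result


if __name__ == "__main__":
    try:
        for name, value in undersized_buffers().items():
            print(f"{name} = {value} (below {TARGET_BYTES})")
    except OSError as exc:
        print(f"could not read sysctl values: {exc}")
```

The sketch only reads the values; changing them needs root (for example through sysctl) and is left out on purpose.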

Samahu commented 2 months ago

Hi @James-R-Han, I have implemented a few improvements to the point cloud generation as part of #369, which I believe should help with your situation. Part of this improves the function that generates the point cloud by letting it skip copying the fields when the sensor has no valid return for a given pixel, which should reduce the overhead. You can also reduce the effective range of the sensor via the two new launch file parameters min_range and max_range, though whether that is appropriate depends on your specific use case. Additionally, users with limited bandwidth can switch to a non-organized point cloud by setting the organized parameter to false. I do have further improvements in the pipeline, but this is what I'd like to merge in the short term. Please let me know whether the provided solution helps (or not) with your case.
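
To illustrate the idea (this is not the driver's code), here is a toy sketch of how skipping invalid returns, clamping by range, and dropping the organized layout reduce the number of points that get written. The function name, the use of 0 for "no return", and the NaN placeholder for empty slots in the organized case are all assumptions made for the example.

```python
import math

# Assumption: an organized cloud keeps one slot per pixel and marks empty slots with NaN.
NAN_POINT = (math.nan, math.nan, math.nan)


def to_points(ranges_m, directions, min_range=0.0, max_range=math.inf, organized=True):
    """Turn per-pixel ranges (metres, 0 = no return) into XYZ points.

    `directions` holds one unit vector per pixel. Pixels that have no return or
    fall outside [min_range, max_range] are never projected; they become a NaN
    placeholder when `organized` is True and are dropped entirely otherwise.
    """
    points = []
    for r, (dx, dy, dz) in zip(ranges_m, directions):
        if r > 0.0 and min_range <= r <= max_range:
            points.append((r * dx, r * dy, r * dz))
        elif organized:
            points.append(NAN_POINT)  # keep the slot so the width x height layout survives
    return points


if __name__ == "__main__":
    ranges = [0.0, 1.0, 5.0, 200.0]
    dirs = [(1.0, 0.0, 0.0)] * len(ranges)
    print(len(to_points(ranges, dirs, 0.5, 100.0, organized=True)))
    print(len(to_points(ranges, dirs, 0.5, 100.0, organized=False)))
```

In the non-organized case the message shrinks with the number of rejected pixels, which is where the bandwidth saving comes from; the cost is losing the fixed row and column structure.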

outrider-jkoch commented 2 months ago

If I'm not mistaken, the driver collects a full scan and then batch processes it, correct? If so, could the driver instead process each piece as it receives it? It wouldn't save processing time overall, but the work would be split into at most 128 (2048 / 16) small batches instead of a single large one. That might give the CPU a little more time to complete the required task, although it could still be a problem if the CPU is heavily loaded. Perhaps this has been tried before and was not effective.
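
A quick back-of-the-envelope sketch of that arithmetic, assuming 16 columns per packet (the figure used above) and packets arriving evenly over the scan period. The function names are made up for the example.

```python
COLUMNS_PER_PACKET = 16  # taken from the "2048 / 16" figure above


def packets_per_scan(columns_per_scan: int, columns_per_packet: int = COLUMNS_PER_PACKET) -> int:
    """Number of lidar packets needed to cover one full scan (rounded up)."""
    return -(-columns_per_scan // columns_per_packet)


def per_packet_budget_us(columns_per_scan: int, scan_rate_hz: float) -> float:
    """Average time between packets in microseconds, assuming even spacing."""
    return 1e6 / (packets_per_scan(columns_per_scan) * scan_rate_hz)


if __name__ == "__main__":
    for columns, rate in ((512, 10), (1024, 10), (2048, 10)):
        n = packets_per_scan(columns)
        print(f"{columns}x{rate}: {n} packets/scan, {n * rate} packets/s, "
              f"{per_packet_budget_us(columns, rate):.2f} us per packet")
```

At 1024x10 that works out to 64 packets per scan and roughly 1.5 ms between packets, so any per-packet work would have to fit well inside that window.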

Samahu commented 2 months ago

@outrider-jkoch

If I'm not mistaken the driver collects a full scan and then batch processes it correct?

Correct, that is how ouster-ros currently operates.

If so, while it wouldn't save processing time could the driver instead do processing on each piece as it receives it?

Yeah, it is possible to restructure the PointCloud composition/generation so that the ScanBatching and the cartesian step are merged into one iteration. This would be a good step to take, as it spreads the workload evenly across every LidarPacket rather than performing the cartesian in one go once a LidarScan has been completed (the cartesian is one of the hefty operations). However, the downside is that you can't run LidarScan-wide operations before you generate the PointCloud object from it. This is why I didn't implement this optimization in this PR: I do have examples and uses for invoking operations (such as filters) on the LidarScan before doing the cartesian.
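
A toy sketch of the two structures being compared, not the actual ouster-ros code: the packet layout (a dict of column index to ranges), the 2D projection, and all function names are assumptions made for illustration. Both variants produce the same cloud; they differ only in when the projection cost is paid and whether a scan-wide step can run first.

```python
import math


def cartesian(column_index, ranges, n_columns):
    """Toy projection: place each range on a circle using only the column azimuth."""
    theta = 2.0 * math.pi * column_index / n_columns
    return [(r * math.cos(theta), r * math.sin(theta)) for r in ranges]


def batch_mode(packets, n_columns, scan_filter=None):
    """Collect the whole scan first, then project everything in one go."""
    scan = {}
    for packet in packets:          # cheap per-packet work: just store the columns
        scan.update(packet)
    if scan_filter is not None:     # scan-wide operations are possible here
        scan = scan_filter(scan)
    return {c: cartesian(c, rs, n_columns) for c, rs in sorted(scan.items())}


def streaming_mode(packets, n_columns):
    """Project each packet as it arrives; no scan-wide step is possible beforehand."""
    cloud = {}
    for packet in packets:          # projection cost is spread across packets
        for c, rs in packet.items():
            cloud[c] = cartesian(c, rs, n_columns)
    return dict(sorted(cloud.items()))


if __name__ == "__main__":
    packets = [{0: [1.0, 2.0], 1: [3.0, 0.5]}, {2: [4.0, 1.0], 3: [2.0, 2.0]}]
    print(batch_mode(packets, 4) == streaming_mode(packets, 4))
```

The `scan_filter` hook in `batch_mode` is the part that has no natural place in `streaming_mode`, which is the downside described above.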

As I mentioned, the PR does include two improvements in this regard: it skips invalid range values, and if a non-organized point cloud is a viable option for the integrator, it should reduce the overhead further thanks to the lower bandwidth.

Samahu commented 2 months ago

Regarding the cartesian being a hefty operation, I do have some planned optimizations that should significantly reduce its overhead, but these will have to wait until the next release or a later one.

Samahu commented 1 month ago

@James-R-Han Could you please try out the latest release and let me know if it helps with your situation? Thanks.

James-R-Han commented 1 month ago

Hi @Samahu! Thanks for implementing those improvements! Shortly after my post, I swapped to a better computer (better CPU) and the problem went away. So I won't be able to recreate the conditions I had when I first raised this issue. Sorry about that!

Samahu commented 1 month ago

Thanks. I do have a low-end computer that exposed the same problem; I will see if I can still reproduce it before and after the change.

ouster-lidar / ouster-ros

"DROPPING PACKET" and "70% full, THROTTLING" #352