ouster-lidar / ouster-ros

Official ROS drivers for Ouster sensors (OS0, OS1, OS2, OSDome)
https://ouster.com

Point Cloud Recording Issue: Frame Skipping and Reduced Frequency During Rosbag Playback #280

Open Risbo6 opened 8 months ago

Risbo6 commented 8 months ago

Hello,

While recording a point cloud with a rosbag, I've noticed that some frames are being skipped. I'm uncertain if this issue is related to quality of service settings, as I'm still relatively new to ROS2.

When I launch the driver and check the frequency with ros2 topic hz, I observe a rate of around 10 Hz. However, during the playback of the rosbag, the frequency drops to approximately 5-6 Hz.
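To put a number on the skipped frames rather than relying on the averaged rate alone, the message timestamps can be compared against the nominal frame period. The sketch below is a minimal illustration, not part of the driver: the function name `frame_stats` is hypothetical, the timestamps are assumed to be already extracted (in seconds, sorted) from the bag or the live topic, and the 10 Hz nominal rate is taken from the live measurement above.

```python
def frame_stats(stamps, nominal_hz=10.0):
    """Return (mean_rate_hz, missing_frames) for sorted timestamps in seconds.

    nominal_hz is the rate the sensor is expected to publish at
    (assumed 10 Hz here, matching the live measurement).
    """
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    period = 1.0 / nominal_hz
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    mean_rate = len(gaps) / (stamps[-1] - stamps[0])
    # A gap of about k nominal periods means k - 1 frames never arrived.
    missing = sum(max(0, round(g / period) - 1) for g in gaps)
    return mean_rate, missing


if __name__ == "__main__":
    # Made-up timestamps for illustration: three frames are missing.
    rate, missing = frame_stats([0.0, 0.1, 0.3, 0.4, 0.6, 0.8])
    print(f"mean rate: {rate:.2f} Hz, missing frames: {missing}")
```

Counting gaps this way distinguishes evenly spaced but slower playback from playback that runs at the right pace with individual frames missing.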

I'm using an OS-0-32 on ROS2 Foxy, and I've tested it on both arm and x64 architectures.

Samahu commented 8 months ago

Are you using the provided record.launch.xml launch file to capture the raw packets, or are you recording the /ouster/points topic(s) through the rosbag functionality?

Julien2109 commented 8 months ago

Hi,

We tried both options, but the result is similar. You can find one example recorded with the provided record.launch.xml and another recorded through the rosbag functionality on this Google Drive: https://drive.google.com/drive/folders/1CDsDUO-_0VUns5zw2WFIlMCnPdpe9jUW?usp=sharing.

Samahu commented 8 months ago

Which ROS2 distro are you targeting or testing with?

Julien2109 commented 8 months ago

We are testing with Foxy, and we tried to replay the data on both Foxy and Humble.

Samahu commented 8 months ago

@Julien2109 I have confirmed the problem: I can see the point cloud blinking in RViz, and the point cloud does run at half the frame rate. If you used the provided record.launch.xml (for the raw recording), the launch script automatically sets the proper QoS for the record/replay scenario. This would rule out the theory that the issue is due to QoS.

Do you observe this problem during live mode? Or does it only happen when it involves rosbag?

I can't reproduce the problem on the computer I use for development. My intuition tells me this boils down to the underlying computer architecture and the DDS middleware used. Have you tried a different RMW?

Julien2109 commented 8 months ago

It does not seem to happen in live mode. I couldn't verify in RViz, as our target has no screen, but the topic rate was around 10 Hz in live mode as well as in recording mode. The rate drops only when replaying the data (I tried replaying on the same target and on another computer after exporting the files).

We also tried to record the data from another computer and it was working well (also with foxy). So yes, it is possible that the problem is coming from the target...

Risbo6 commented 8 months ago

We're using a Jetson AGX Xavier. The issue may indeed come from the ARM architecture. I will try to connect a laptop to the network, play the point cloud with the Jetson, and record it with the laptop to confirm this.

Most of our other vehicles are still running on ROS 1, and we never had any issue with a rosbag. So far, I've only had problems with ROS 2. I hope we can find a solution.

bexcite commented 8 months ago

Hi @Risbo6,

I think you can try the ouster-cli util benchmark-sensor tool (part of the Ouster SDK tools) to check how the machine (the AGX in your case) receives packets, what percentage of packets is dropped in the packet streams, and how much CPU load is caused by the core operations that the Ouster SDK performs downstream (ScanBatching and the XYZ transforms later).

NOTE: Even though the tool is Python-based, the non-C++ overhead is small, because all core parts of the process are C++ libraries with Python bindings.

Some steps to spin up things:

1) Install Ouster SDK with pip (Python 3.7-3.11):

pip install ouster-sdk

2) Make sure the sensor is configured properly and data is being sent to a known port on the machine.

3) Check the stream of lidar packets received and its completeness per frame:

ouster-cli util benchmark-sensor <SENSOR_HOSTNAME> --lidar-port <SENSOR_LIDAR_PORT_DEST>

4) Check the receiving + ScanBatching performance:

ouster-cli util benchmark-sensor <SENSOR_HOSTNAME> --lidar-port <SENSOR_LIDAR_PORT_DEST> --scan-batch

5) Check the receiving + ScanBatching + XYZ performance (--xyz):

ouster-cli util benchmark-sensor <SENSOR_HOSTNAME> --lidar-port <SENSOR_LIDAR_PORT_DEST> --xyz
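To run steps 3 to 5 back to back on a headless target, the three invocations can be generated from one place. This is only a convenience sketch: `benchmark_commands` is a hypothetical helper, the hostname and port in the example are placeholders, and the only flags used are the ones shown in the commands above.

```python
import shlex


def benchmark_commands(hostname, lidar_port):
    """Build the three benchmark-sensor invocations from steps 3 to 5."""
    base = ["ouster-cli", "util", "benchmark-sensor", hostname,
            "--lidar-port", str(lidar_port)]
    return [
        base,                      # step 3: packet reception and frame completeness
        base + ["--scan-batch"],   # step 4: reception + ScanBatching
        base + ["--xyz"],          # step 5: reception + ScanBatching + XYZ
    ]


if __name__ == "__main__":
    # Placeholder hostname and port; substitute the values for your sensor.
    for argv in benchmark_commands("os-sensor.local", 7502):
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run` one after another.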

The most demanding profiles are usually those with the higher horizontal resolution per scan and the higher frequency (e.g. 2048x10 DUAL, or 4096x5 DUAL if your sensor supports this mode).
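A rough way to see why these modes are the demanding ones is to work out the packet and point rates per mode. The sketch below is back-of-the-envelope arithmetic only: `stream_load` is a hypothetical helper, and the value of 16 columns per lidar packet is an assumption about the packet format that should be checked against the sensor documentation for your firmware and profile.

```python
def stream_load(columns, rate_hz, channels, returns=1, columns_per_packet=16):
    """Estimate packet and point rates for a lidar mode such as 2048x10.

    columns_per_packet=16 is an assumption about the lidar packet format.
    returns=2 models a DUAL return profile (twice the points per column).
    """
    # Ceiling division: a partially filled last packet still counts.
    packets_per_frame = -(-columns // columns_per_packet)
    return {
        "packets_per_frame": packets_per_frame,
        "packets_per_second": packets_per_frame * rate_hz,
        "points_per_second": columns * rate_hz * channels * returns,
    }


if __name__ == "__main__":
    # 32 channels matches the OS-0-32 mentioned in this thread.
    for columns, rate_hz in [(512, 10), (1024, 10), (2048, 10)]:
        print(f"{columns}x{rate_hz}:", stream_load(columns, rate_hz, channels=32))
```

Under these assumptions, losing a handful of packets per frame is enough to produce partial frames, which is why the benchmark reports completeness per frame.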

This does not correspond one-to-one to what happens in the ROS driver, but the core operations and their implementations are mostly the same (ScanBatching + XYZ LUTs). So if ouster-cli util benchmark-sensor performs well on the machine you are testing and the CPU load is not excessive, you can at least be sure that the machine's performance and network configuration are adequate for receiving and storing the sensor data with the current sensor setup. It is hard to give a good number for the CPU load, because it depends on what else is being done with the data, on all other loads, on the ROS core services overhead, and so on, but as a very rough guess something up to 30-50% should be fine.

This tool was used extensively to profile the Raspi 4, where I was able to get the full 2048x10 stream in DUAL mode without any packet loss, and it helped achieve the 3.5x speedup of the ScanBatching ops that Ouster released in the latest updates.

Hope it helps.

Cheers,

Samahu commented 8 months ago

Since you mentioned the Jetson AGX, I can confirm that other users of our driver have indeed experienced partial or complete frame drops (check out https://github.com/ouster-lidar/ouster-ros/issues/240). I think the issue applies to ARM processors running the ROS driver in general and is not specific to the NVIDIA Jetsons. The issue exists even with the latest updates to the driver and needs a closer look.