ros-drivers / ros2_ouster_drivers

ROS2 Drivers for the Ouster OS-0, OS-1, and OS-2 Lidars
https://ouster.com/
Apache License 2.0

No IMU data with OS0-64 at 1024x20 or 2048x10 #55

Closed mjseabright closed 4 years ago

mjseabright commented 4 years ago

When running the OS0-64 at either 1024x20 or 2048x10 modes, IMU data is not properly published. At lidar startup there is a burst of about 20 IMU messages published, then no more. Changing the os1_proc_mask parameter to only 'IMU' does not change the behavior.

When running in 1024x10, 512x20 and 512x10 modes, IMU data is published at the correct rate (100 Hz).

When using the Ouster provided ROS1 driver, IMU data is correctly published in 1024x20 and 2048x10 modes which suggests this is not an issue on the lidar side, but on the host side.

Additional information:

SteveMacenski commented 4 years ago

We do not technically support the OS0 sensor because neither Tom nor I have one to work with. We can, though, evaluate whether we see that behavior on our OS1s and move forward from there if there's a shared problem. My guess is it's a shared problem; I just wanted to be up front about that.

What is the commit hash of the ROS1 driver that you're using? They did some updates recently to their UDP client library that aren't yet reflected here (#42), but if you saw it with a commit hash before that, then we should be able to fix it, I imagine. If it's after, then I think it would be helpful for you to port that work here. If you're using the newer version, it would be great if you could check whether you see the same behavior with the old version. If so, then we have a clear answer for how to fix it; if not, then Tom and/or I (probably me) can look into it a bit with the hardware we have.

Let me know what you think.

mjseabright commented 4 years ago

Commit hash of the ROS1 version I am using is 68ad03c, which is the latest version. Moving back one commit to 1cf6faf, it still works just fine.

I have just tested with an OS1-64 and I see the same behavior as with the OS0 in both ROS1 and ROS2.

SteveMacenski commented 4 years ago

Moving back one commit to 1cf6faf, it still works just fine.

Works just fine meaning that you get IMU? So the issue is/was fixed by that update to the UDP client?

mjseabright commented 4 years ago

With ROS1, the IMU data works as expected on both 68ad03c and 1cf6faf, meaning that the update to the UDP client changed nothing with respect to this issue.

mjseabright commented 4 years ago

To make it clearer:

tpanzarella commented 4 years ago

I can check with an OS1-16 later today, but I am not surprised that the problematic modes are 1024x20 and 2048x10. Both deliver the same amount of LiDAR data over the same amount of time. We experience what I believe to be related problems with the LiDAR data when running in 2048x10 mode (we need this level of resolution on the azimuth). For example, this.

Changing the os1_proc_mask parameter to only 'IMU' does not change the behavior.

Not surprised by this either. Changing the os1_proc_mask parameter affects how much work is done inside of the driver at the data processors level. However, the higher frequency packet data are still delivered by the LiDAR to the driver and the packets are read from the UDP buffers. So, basically, even though you are only interested in seeing the IMU data, the LiDAR data are still being delivered, packetized in the driver, then "dropped on the floor" since there is no data processor related to LiDAR data active.

My hypothesis is that the way in which the incoming packets are handled, buffered, and processed by the driver is the root of the issue. When I saw how the 2048x10 LiDAR data behaved, as linked above, juxtaposed with how perfect it was in 1024x10 mode, I initially thought there was a flaw in the Ethernet hardware of the LiDAR itself. However, I believe that was flawed logic on my part. The LiDAR internally buffers data for 16 columns at a time, then delivers those data to the host. So, regardless of 1024x10 or 2048x10, the packet sizes leaving the LiDAR are the same; it is simply the rate at which they are sent out that changes. This leads me to believe the issue is in how the data are being handled on the driver side.
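To put numbers on that (taking the 16-column packets as given): 1024 columns × 10 Hz ÷ 16 columns per packet = 640 lidar packets/s, versus 2048 columns × 10 Hz ÷ 16 columns per packet = 1280 lidar packets/s. In 2048x10 (or 1024x20) the same-sized packets simply arrive twice as often.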

Inside of the driver, there is a timer callback instantiated when the lifecycle node activates. Every time this fires, it calls this function which packetizes the bytes, buffers packets (collects a bunch of column firings until it represents a scan), parses the packets, constructs payload data structures (which in some cases involves looping over the data again, allocating memory, copying data, etc.), and publishes. I believe there are a couple of problems with this approach and what is at root here:

First, using a timer callback is a form of polling (with extra overhead). What would be preferred is a dedicated thread that reacts to incoming data on the socket buffer via a rising edge trigger. This would allow the driver to react to data on-demand as it is flowing into the host computer. This is also the lowest latency approach.

Second, all the work done in processData() should be split apart. The first part is what I mentioned above: separating the handling of the incoming LiDAR packets into its own thread. Additionally, I think a double-buffering scheme needs to be implemented so that new incoming packets can be accumulated into a scan concurrently with the parsing, structuring, and publishing of data from the previous scan.
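To make that concrete, here is a minimal sketch of the double-buffering idea (all type and function names here are hypothetical, not the driver's actual API): the receive thread appends packets to a "back" scan buffer and, when the scan completes, swaps it with the "front" buffer and notifies a processing thread, which parses and publishes the previous scan while the next one fills.

// Minimal double-buffer sketch (hypothetical names, not the driver's API):
// the socket/receive thread calls addPacket(); the processing thread blocks
// in waitForScan() and parses/publishes while the next scan accumulates.
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

struct ScanBuffer
{
  std::vector<uint8_t> packets;  // raw lidar packets for one full scan
};

class DoubleBufferedScan
{
public:
  // Called from the receive thread for every incoming packet.
  void addPacket(const uint8_t * data, std::size_t len, bool scan_complete)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    back_.packets.insert(back_.packets.end(), data, data + len);
    if (scan_complete) {
      std::swap(front_, back_);  // effectively O(1), no copy of scan data
      back_.packets.clear();
      ready_ = true;
      cv_.notify_one();          // wake the processing thread
    }
  }

  // Called from the processing thread; blocks until a full scan is ready.
  ScanBuffer waitForScan()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] {return ready_;});
    ready_ = false;
    return std::move(front_);    // hand off the completed scan
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  ScanBuffer front_, back_;
  bool ready_{false};
};

The receive side never waits on the publisher, and the publisher only ever touches a completed scan, which is the property we're after.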

There are other optimizations that I would make as well, related to this, but I don't want to get too far off topic or come across as prescriptive (not my intention). It's just that I have worked extensively with the Ousters in ROS2 under real workloads using this driver, and I believe I have zeroed in on the fundamental challenges.

Bottom-line, I have seen similar instabilities with the LiDAR data as you are seeing with the IMU data at the higher data rate modes. I do not believe these to be unrelated topics. I do not believe that the issue is inside the LiDAR hardware. I think the fix is in the driver and it is not simple if it is to be done correctly.

tpanzarella commented 4 years ago

I can check with an OS1-16 later today

Simple test of listening for ~20s on /imu with an OS1-16. In 1024x10 mode:

$ timeout -s INT 20s ros2 topic hz /imu
1593108692.473203 [0]       ros2: using network interface enp0s31f6 (udp/192.168.0.92) selected arbitrarily from: enp0s31f6, wlp3s0
average rate: 99.922
    min: 0.010s max: 0.010s std dev: 0.00018s window: 101
average rate: 99.946
    min: 0.009s max: 0.011s std dev: 0.00021s window: 201
average rate: 99.958
    min: 0.009s max: 0.011s std dev: 0.00021s window: 301
average rate: 99.969
    min: 0.009s max: 0.011s std dev: 0.00020s window: 402
average rate: 99.976
    min: 0.009s max: 0.011s std dev: 0.00020s window: 503
average rate: 99.977
    min: 0.009s max: 0.011s std dev: 0.00020s window: 603
average rate: 99.980
    min: 0.009s max: 0.011s std dev: 0.00019s window: 703
average rate: 99.980
    min: 0.009s max: 0.011s std dev: 0.00018s window: 803
average rate: 99.980
    min: 0.001s max: 0.023s std dev: 0.00057s window: 903
average rate: 99.981
    min: 0.001s max: 0.023s std dev: 0.00054s window: 1003
average rate: 99.984
    min: 0.001s max: 0.023s std dev: 0.00052s window: 1104
average rate: 99.984
    min: 0.001s max: 0.023s std dev: 0.00050s window: 1204
average rate: 99.984
    min: 0.001s max: 0.023s std dev: 0.00048s window: 1304
average rate: 99.985
    min: 0.001s max: 0.023s std dev: 0.00047s window: 1404
average rate: 99.985
    min: 0.001s max: 0.023s std dev: 0.00045s window: 1504
average rate: 99.985
    min: 0.001s max: 0.023s std dev: 0.00044s window: 1604
average rate: 99.985
    min: 0.001s max: 0.023s std dev: 0.00043s window: 1704
average rate: 99.986
    min: 0.001s max: 0.029s std dev: 0.00067s window: 1804

Two consecutive runs, listening for ~20s, in 2048x10 mode:

$ timeout -s INT 20s ros2 topic hz /imu
1593108739.047167 [0]       ros2: using network interface enp0s31f6 (udp/192.168.0.92) selected arbitrarily from: enp0s31f6, wlp3s0
average rate: 10.638
    min: 0.009s max: 1.607s std dev: 0.35654s window: 19

$ timeout -s INT 20s ros2 topic hz /imu
1593108768.243083 [0]       ros2: using network interface enp0s31f6 (udp/192.168.0.92) selected arbitrarily from: enp0s31f6, wlp3s0
mjseabright commented 4 years ago

Thank you both very much for the replies.

The behavior you see with the /imu topic on the OS1-16 is exactly the behavior I am experiencing with the OS0-64 and OS1-64.

I have just received beta ROS1 drivers from Ouster for the OS0-128 and it appears they work perfectly. I can run at 1024x20 with both IMU and point cloud data published at the expected rates (100 Hz and 20 Hz respectively).

What are the differences between the implementation of the Ouster ROS1 drivers and your ROS2 drivers? Is there a fundamental design difference that causes this performance gap? It seems strange to me that the ROS1 system can handle an OSx-128 at 1024x20 while the ROS2 system cannot properly handle an OSx-64 at 1024x20; that's more than a 2x performance difference between the two. Don't get me wrong, I'm not trying to criticize what you guys have done, I'm just trying to understand the limitations and potential improvements that can be made to the ROS2 system.

SteveMacenski commented 4 years ago

@tpanzarella 100% agree. I wanted to do event-based rather than polling, but I couldn't find anywhere in their public docs a reference for finding out whether a new packet is available. Their ROS1 drivers polled as well, without a throttle, which is a big no-no in robotics since it will essentially consume a core as a while-true loop when not processing information. I went through their docs and found the transmission rate of the data to be 1280 Hz (see here, section 3.6), which corresponds to 2048x10 and 1024x20. That timer rate wasn't randomly set.
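(For reference: 1 / 1280 Hz = 781,250 ns, which is the wall-timer period you'll see in the snippet further down; doubling the rate halves that to 390,625 ns.)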

We could certainly buffer the data and have a separate thread process them; that would work, but I'm still wanting to avoid a while-true situation.

A trivial way to test this would be to increase the timer rate to something stupidly high and see if the IMU packets come through. If they do, we know where we need to fix things.

tpanzarella commented 4 years ago

I wanted to do event-based rather than polling, but I couldn't find anywhere in their public docs a reference for finding out whether a new packet is available.

The data from the LiDAR are straight UDP, so this is done with epoll on Linux (kqueue on Mac, IOCP on Windows). More typical in a C++ application where you want platform portability is to use asio, which abstracts away those three system-specific interfaces for you. There is a header-only version of asio, so while it introduces a build dependency, there is nothing to link against at run time. Until C++23, when we get std::net, this is the way to do it in C++.

We could certainly buffer the data and have a separate thread process them; that would work, but I'm still wanting to avoid a while-true situation.

Yeah, there is no need for a while (true) { ... }. asio will give you edge-triggered, event-based notifications when data are flowing into the socket, and inter-thread communication would also be event-based. Using a std::condition_variable would be the typical way to handle the latter.
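For illustration, a rough sketch of that receive path using standalone (header-only) asio. The class name is a placeholder, not the driver's code; 7502 is the Ouster default lidar UDP port (an assumption about the sensor config here), and the handler is where you would hand packets to the processing side (e.g., the double-buffer sketch above).

// Rough sketch of an event-driven UDP receive loop using standalone asio.
// Everything here is illustrative, not the driver's actual code.
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <system_error>

#include <asio.hpp>

class UdpReceiver
{
public:
  UdpReceiver(asio::io_context & io, uint16_t port)
  : socket_(io, asio::ip::udp::endpoint(asio::ip::udp::v4(), port))
  {
    startReceive();
  }

private:
  void startReceive()
  {
    socket_.async_receive_from(
      asio::buffer(buf_), sender_,
      [this](std::error_code ec, std::size_t bytes) {
        if (!ec) {
          // Hand the packet off to the processing side here
          // (e.g. DoubleBufferedScan::addPacket); no polling, no busy loop.
          std::cout << "got " << bytes << " bytes\n";
        }
        startReceive();  // re-arm for the next datagram
      });
  }

  asio::ip::udp::socket socket_;
  asio::ip::udp::endpoint sender_;
  std::array<uint8_t, 65536> buf_{};
};

int main()
{
  asio::io_context io;
  UdpReceiver rx(io, 7502);  // assumed default Ouster lidar data port
  io.run();                  // blocks, dispatching the handler per datagram
  return 0;
}

io_context::run() blocks and dispatches the handler whenever a datagram arrives, so the driver reacts to data on demand instead of on a timer.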

tpanzarella commented 4 years ago

A trivial way to test this would be to increase the timer rate to something stupidly high and see if the IMU packets come through. If they do, we know where we need to fix things.

So, I doubled the rate at which the timer fires. Specifically:

void OusterDriver::onActivate()
{
  DataProcessorMapIt it;
  for (it = _data_processors.begin(); it != _data_processors.end(); ++it) {
    it->second->onActivate();
  }

  // packet rate of the Ouster lidars (2048x10 / 1024x20) is 1280 Hz;
  // fire the timer at double that rate: 390625 ns period ~= 2560 Hz
  _process_timer = this->create_wall_timer(
    390625ns, std::bind(&OusterDriver::processData, this));
  // original 1280 Hz period:
  //781250ns, std::bind(&OusterDriver::processData, this));
}

Running in 2048x10, I get:

$ timeout -s INT 20s ros2 topic hz /points
average rate: 9.765
    min: 0.073s max: 0.133s std dev: 0.01783s window: 12
average rate: 9.863
    min: 0.073s max: 0.133s std dev: 0.01390s window: 23
average rate: 9.855
    min: 0.059s max: 0.133s std dev: 0.01752s window: 33
average rate: 9.929
    min: 0.059s max: 0.133s std dev: 0.01801s window: 44
average rate: 9.942
    min: 0.059s max: 0.133s std dev: 0.01750s window: 55
average rate: 9.950
    min: 0.059s max: 0.133s std dev: 0.01626s window: 66
average rate: 9.981
    min: 0.059s max: 0.133s std dev: 0.01584s window: 77
average rate: 9.982
    min: 0.059s max: 0.133s std dev: 0.01535s window: 87
average rate: 9.970
    min: 0.059s max: 0.133s std dev: 0.01516s window: 97
average rate: 9.970
    min: 0.059s max: 0.133s std dev: 0.01499s window: 108
average rate: 9.991
    min: 0.059s max: 0.133s std dev: 0.01536s window: 119
average rate: 9.978
    min: 0.059s max: 0.133s std dev: 0.01536s window: 129
average rate: 9.979
    min: 0.059s max: 0.133s std dev: 0.01507s window: 139
average rate: 9.981
    min: 0.059s max: 0.133s std dev: 0.01472s window: 150
average rate: 9.989
    min: 0.059s max: 0.133s std dev: 0.01472s window: 161
average rate: 9.986
    min: 0.059s max: 0.133s std dev: 0.01448s window: 171
average rate: 9.985
    min: 0.059s max: 0.133s std dev: 0.01446s window: 181

$ timeout -s INT 20s ros2 topic hz /imu
average rate: 95.722
    min: 0.000s max: 0.071s std dev: 0.01741s window: 100
average rate: 100.258
    min: 0.000s max: 0.071s std dev: 0.01658s window: 205
average rate: 99.497
    min: 0.000s max: 0.071s std dev: 0.01672s window: 303
average rate: 99.616
    min: 0.000s max: 0.071s std dev: 0.01713s window: 403
average rate: 100.167
    min: 0.000s max: 0.073s std dev: 0.01704s window: 506
average rate: 100.139
    min: 0.000s max: 0.073s std dev: 0.01723s window: 607
average rate: 100.104
    min: 0.000s max: 0.073s std dev: 0.01710s window: 707
average rate: 100.095
    min: 0.000s max: 0.075s std dev: 0.01704s window: 808
average rate: 100.085
    min: 0.000s max: 0.075s std dev: 0.01704s window: 909
average rate: 100.074
    min: 0.000s max: 0.075s std dev: 0.01706s window: 1009
average rate: 100.067
    min: 0.000s max: 0.075s std dev: 0.01712s window: 1109
average rate: 100.060
    min: 0.000s max: 0.075s std dev: 0.01714s window: 1209
average rate: 100.055
    min: 0.000s max: 0.075s std dev: 0.01717s window: 1309
average rate: 100.050
    min: 0.000s max: 0.075s std dev: 0.01713s window: 1409
average rate: 99.956
    min: 0.000s max: 0.075s std dev: 0.01717s window: 1510
average rate: 99.688
    min: 0.000s max: 0.075s std dev: 0.01720s window: 1610
average rate: 100.040
    min: 0.000s max: 0.075s std dev: 0.01712s window: 1716
average rate: 100.038
    min: 0.000s max: 0.075s std dev: 0.01717s window: 1817

Running my performance benchmarks in the same mode of operation yields the following:

Raw jitter:

[plot: raw jitter, 2048x10]

And a quantile plot of the same:

[plot: quantile, 2048x10]

SteveMacenski commented 4 years ago

@tpanzarella what's the CPU load difference, though? If it's not much higher, sure, let's do it. I'd much prefer to actually use the asio library (both for my personal education and also to avoid the latency from polling), but this is a good short-term measure. I was reading up last night on Boost.Asio and thinking about how to change the Ouster UDP client code to accommodate it.

tpanzarella commented 4 years ago

@SteveMacenski I agree. I think for now, we should jack this up. I did not do anything scientific here, I just doubled it. Let me do a little bit of analysis to look at the effect a bit more analytically. But, I think in the near term, we will want to ship something like this.

tpanzarella commented 4 years ago

Here is a look at running in 2048x10 mode with a single subscriber to /points. First, we look at both jitter and CPU usage running the timer at 1280 Hz, then at 2560 Hz.

1280 Hz

Raw jitter:

[plot: lidar-latency-03-2048x10_raw]

Quantile plot:

[plot: lidar-latency-03-2048x10_q]

CPU Utilization:

NOTE: The dashed orange trace here is a plot of rxKB/s on the network interface connected to the LiDAR (see right y-axis). I've plotted that time correlated to the CPU core utilization (8 cores, hyperthreaded quad core machine) so that it "frames" the relevant time window on the plot -- i.e., when the bytes were flowing in from the LiDAR to the host PC.

[plot: cpu_1280-00]

Summary stats on CPU idle time:

count    1032.000000
mean       92.690930
std         5.769102
min        63.640000
25%        89.900000
50%        93.940000
75%        96.940000
max       100.000000
Name: %idle, dtype: float64

2560 Hz

Raw jitter:

[plot: lidar-latency-04-2048x10_raw]

Quantile plot:

[plot: lidar-latency-04-2048x10_q]

CPU Utilization:

NOTE: The dashed orange trace here is a plot of rxKB/s on the network interface connected to the LiDAR (see right y-axis). I've plotted that time correlated to the CPU core utilization (8 cores, hyperthreaded quad core machine) so that it "frames" the relevant time window on the plot -- i.e., when the bytes were flowing in from the LiDAR to the host PC.

[plot: cpu_2560-00]

Summary stats on CPU idle time:

count    1056.000000
mean       92.479574
std         6.660468
min        58.760000
25%        90.620000
50%        93.940000
75%        96.970000
max       100.000000
Name: %idle, dtype: float64
SteveMacenski commented 4 years ago

Ok, that seems pretty open-and-shut to me; this is a great option for now.