Closed: mjseabright closed this issue 4 years ago
We do not technically support the OS0 sensor because neither Tom nor I has one to work with. We can, though, evaluate whether we see that behavior on our OS1s and move forward from there if there's a shared problem. My guess is it's a shared problem; I just wanted to be up front about that.
What is the commit hash of the ROS1 driver that you're using? They recently made some updates to their UDP client library that aren't yet reflected here (#42). If you saw this with a commit hash from before that change, then I imagine we should be able to fix it. If it's after, then I think it would be helpful to port that work here. If you're using the newer version, it would be great if you could check whether you see the same behavior with the old version. If so, then we have a clear answer for how to fix it; if not, then Tom and/or I (probably me) can look into it a bit with the hardware we have.
Let me know what you think.
The commit hash of the ROS1 version I am using is 68ad03c, which is the latest version. Moving back one commit to 1cf6faf, it still works just fine.
I have just tested with an OS1-64 and I see the same behavior as with the OS0 in both ROS1 and ROS2.
> Moving back one commit to 1cf6faf, it still works just fine.
"Works just fine" meaning that you get IMU data? So was the issue fixed by that update to the UDP client?
With ROS1, the IMU data works as expected on both 68ad03c and 1cf6faf, meaning the update to the UDP client changed nothing with respect to this issue.
To make it clearer:
I can check with an OS1-16 later today, but I am not surprised that the problematic modes are `1024x20` and `2048x10`. This is basically the same amount of LiDAR data being delivered over the same amount of time. We experience what I believe to be related problems with the LiDAR data when running in `2048x10` mode (we need this level of resolution on the azimuth). For example, this.
> Changing the os1_proc_mask parameter to only 'IMU' does not change the behavior.
Not surprised by this either. Changing the `os1_proc_mask` parameter affects how much work is done inside of the driver at the data-processors level. However, the higher-frequency packet data are still delivered by the LiDAR to the driver, and the packets are still read from the UDP buffers. So, basically, even though you are only interested in seeing the IMU data, the LiDAR data are still being delivered and packetized in the driver, then "dropped on the floor" since there is no active data processor for LiDAR data.
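To make the point concrete, here is a tiny sketch of that "read then drop" flow. All of the type and function names here are hypothetical, invented for illustration; this is not the driver's actual API. The key property is that the socket read has already happened before the mask is consulted:

```cpp
#include <cassert>

// Hypothetical types/names for illustration only; not the driver's real API.
struct Packet { bool is_imu; };

enum ProcMask : unsigned { IMU = 1u << 0, PCL = 1u << 1 };

struct Counters { int imu = 0; int lidar = 0; int dropped = 0; };

// By the time we get here, the packet has already been drained from the UDP
// buffer; the mask only decides whether it is processed or discarded.
void dispatch(const Packet & p, unsigned mask, Counters & c)
{
  if (p.is_imu && (mask & IMU)) { ++c.imu; }
  else if (!p.is_imu && (mask & PCL)) { ++c.lidar; }
  else { ++c.dropped; }  // delivered, read, then "dropped on the floor"
}
```

With the mask set to IMU only, every LiDAR packet still incurs the network and packetization cost before being discarded, which is why narrowing the mask doesn't change the behavior.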
My hypothesis is that the way in which the incoming packets are handled, buffered, and processed by the driver is the root of the issue. When I observed the behavior with the `2048x10` LiDAR data linked above, juxtaposed with how perfect it was in `1024x10` mode, I initially thought there was a flaw in the Ethernet hardware of the LiDAR itself. However, I believe that was flawed logic on my part. The LiDAR internally buffers data for 16 columns at a time, then delivers those data to the host. So, regardless of `1024x10` or `2048x10`, the packet sizes leaving the LiDAR are the same; it is simply the rate at which they are sent that changes. This leads me to believe the issue is in how the data are being handled on the driver side.
Inside of the driver, there is a timer callback instantiated when the lifecycle node activates. Every time this fires, it calls this function, which packetizes the bytes, buffers packets (collects a bunch of column firings until they represent a scan), parses the packets, constructs payload data structures (which in some cases involves looping over the data again, allocating memory, copying data, etc.), and publishes. I believe there are a couple of problems with this approach, and that this is what is at root here:
First, using a timer callback is a form of polling (with extra overhead). What would be preferred is a dedicated thread that reacts to incoming data on the socket buffer via an edge trigger. This would allow the driver to react to data on demand as it flows into the host computer. It is also the lowest-latency approach.
Second, all of the work done in `processData()` should be split apart. The first part is what I mentioned above: separating the handling of incoming LiDAR packets into its own thread. Additionally, I think a double-buffering scheme needs to be implemented so that new incoming packets can be accumulated into a scan concurrently with the parsing, structuring, and publishing of data from the previous scan.
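A minimal sketch of what that double-buffering scheme could look like (the class and member names are made up for illustration, not taken from the driver): the receive thread fills one buffer while the processing thread parses and publishes from the other, and a condition variable hands completed scans across the thread boundary.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical packet type; the real driver deals in fixed-size UDP payloads.
using Packet = std::vector<std::uint8_t>;

class ScanDoubleBuffer
{
public:
  // Receive-thread side: called for every incoming packet. When a packet
  // completes a scan, the buffers are swapped (an O(1) pointer swap, no
  // copying) and the processing thread is woken.
  void push(Packet p, bool completes_scan)
  {
    std::unique_lock<std::mutex> lk(mtx_);
    fill_->push_back(std::move(p));
    if (completes_scan) {
      std::swap(fill_, ready_);
      fill_->clear();  // reuse the just-processed buffer's storage
      scan_ready_ = true;
      lk.unlock();
      cv_.notify_one();
    }
  }

  // Processing-thread side: blocks until a complete scan is available, then
  // takes ownership of it so parsing/publishing runs concurrently with the
  // accumulation of the next scan.
  std::vector<Packet> take_scan()
  {
    std::unique_lock<std::mutex> lk(mtx_);
    cv_.wait(lk, [this] {return scan_ready_;});
    scan_ready_ = false;
    return std::move(*ready_);
  }

private:
  std::vector<Packet> a_, b_;
  std::vector<Packet> * fill_ = &a_;   // being filled by the receive thread
  std::vector<Packet> * ready_ = &b_;  // handed to the processing thread
  std::mutex mtx_;
  std::condition_variable cv_;
  bool scan_ready_ = false;
};
```

The receive thread would call `push()` as packets arrive; the processing thread would loop on `take_scan()`, so neither side ever spins and a slow parse of scan N does not stall the accumulation of scan N+1.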
There are other, related optimizations that I would make as well, but I don't want to get too far off topic or come off as prescriptive (not my intention). It is just that I have worked extensively with the Ousters in ROS2 under real workloads using this driver, and I believe I have homed in on the fundamental challenges.
Bottom line: I have seen instabilities with the LiDAR data similar to what you are seeing with the IMU data at the higher data-rate modes. I do not believe these are unrelated. I do not believe the issue is inside the LiDAR hardware. I think the fix is in the driver, and it is not simple if it is to be done correctly.
> I can check with an OS1-16 later today
A simple test of listening for ~20s on `/imu` with an OS1-16. In `1024x10` mode:
```
$ timeout -s INT 20s ros2 topic hz /imu
1593108692.473203 [0] ros2: using network interface enp0s31f6 (udp/192.168.0.92) selected arbitrarily from: enp0s31f6, wlp3s0
average rate: 99.922
min: 0.010s max: 0.010s std dev: 0.00018s window: 101
average rate: 99.946
min: 0.009s max: 0.011s std dev: 0.00021s window: 201
average rate: 99.958
min: 0.009s max: 0.011s std dev: 0.00021s window: 301
average rate: 99.969
min: 0.009s max: 0.011s std dev: 0.00020s window: 402
average rate: 99.976
min: 0.009s max: 0.011s std dev: 0.00020s window: 503
average rate: 99.977
min: 0.009s max: 0.011s std dev: 0.00020s window: 603
average rate: 99.980
min: 0.009s max: 0.011s std dev: 0.00019s window: 703
average rate: 99.980
min: 0.009s max: 0.011s std dev: 0.00018s window: 803
average rate: 99.980
min: 0.001s max: 0.023s std dev: 0.00057s window: 903
average rate: 99.981
min: 0.001s max: 0.023s std dev: 0.00054s window: 1003
average rate: 99.984
min: 0.001s max: 0.023s std dev: 0.00052s window: 1104
average rate: 99.984
min: 0.001s max: 0.023s std dev: 0.00050s window: 1204
average rate: 99.984
min: 0.001s max: 0.023s std dev: 0.00048s window: 1304
average rate: 99.985
min: 0.001s max: 0.023s std dev: 0.00047s window: 1404
average rate: 99.985
min: 0.001s max: 0.023s std dev: 0.00045s window: 1504
average rate: 99.985
min: 0.001s max: 0.023s std dev: 0.00044s window: 1604
average rate: 99.985
min: 0.001s max: 0.023s std dev: 0.00043s window: 1704
average rate: 99.986
min: 0.001s max: 0.029s std dev: 0.00067s window: 1804
```
Two consecutive runs, listening for ~20s, in `2048x10` mode:
```
$ timeout -s INT 20s ros2 topic hz /imu
1593108739.047167 [0] ros2: using network interface enp0s31f6 (udp/192.168.0.92) selected arbitrarily from: enp0s31f6, wlp3s0
average rate: 10.638
min: 0.009s max: 1.607s std dev: 0.35654s window: 19
$ timeout -s INT 20s ros2 topic hz /imu
1593108768.243083 [0] ros2: using network interface enp0s31f6 (udp/192.168.0.92) selected arbitrarily from: enp0s31f6, wlp3s0
```

(Note that the second run printed no rate lines at all, i.e., no IMU messages were received in the 20 s window.)
Thank you both very much for the replies.
The behavior you see with the /imu topic on the OS1-16 is exactly the behavior I am experiencing with the OS0-64 and OS1-64.
I have just received beta ROS1 drivers from Ouster for the OS0-128 and it appears they work perfectly. I can run at 1024x20 with both IMU and point cloud data published at the expected rates (100Hz and 20 Hz respectively).
What are the differences in implementation between Ouster's ROS1 drivers and your ROS2 driver? Is there a fundamental design difference that causes this performance gap? It seems strange to me that the ROS1 system can handle an OSx-128 at 1024x20 while the ROS2 system cannot properly handle an OSx-64 at 1024x20; that's more than a 2x performance difference between the two. Don't get me wrong, I'm not trying to criticize what you guys have done, just trying to understand the limitations of and potential improvements to your ROS2 system.
@tpanzarella 100% agree. I wanted to do event-based rather than polling, but I couldn't find anywhere in their public docs a reference for determining whether a new packet is available. Their ROS1 drivers polled as well, without a throttle, which is a big no-no in robotics since that will essentially consume a core like a `while true` when not processing information. I went through their docs and found the transmission rate of the data given as 1280 Hz (see section 3.6 here), which corresponds to `2048x10` and `1024x20`. That rate wasn't randomly set.
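As a quick check on that figure: the thread above notes that the LiDAR buffers 16 columns at a time per packet, so the packet rate falls straight out of the mode. A few lines of arithmetic (my own verification, not from the driver):

```cpp
// packets/s = (columns per revolution x revolutions per second) / 16,
// since each lidar packet carries 16 azimuth columns.
constexpr int columns_per_packet = 16;

constexpr int packet_rate_hz(int columns_per_rev, int rev_hz)
{
  return columns_per_rev * rev_hz / columns_per_packet;
}

static_assert(packet_rate_hz(2048, 10) == 1280, "2048x10 -> 1280 Hz");
static_assert(packet_rate_hz(1024, 20) == 1280, "1024x20 -> 1280 Hz");
static_assert(packet_rate_hz(1024, 10) == 640, "1024x10 -> half the rate");
```

This also shows why `1024x10` is comfortable while `2048x10` and `1024x20` both stress the driver identically: they produce exactly the same packet rate.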
We could certainly buffer the data and have a separate thread process them; that would work, but I still want to avoid a `while true` situation.
A trivial way to test this would be to increase the timer rate to something stupidly high and see if the IMU packets come through. If they do, we know where we need to fix things.
> I wanted to do event based rather than polling but I couldn't find anywhere in their public docs a reference to find if a new packet is available.
The data from the LiDAR are straight UDP, so this is done with `epoll` on Linux (`kqueue` on macOS, IOCP on Windows). More typical in a C++ application where you want platform portability is to use `asio`, which abstracts away those three system-specific interfaces for you. There is a header-only version of `asio`, so while it introduces a build dependency, there is nothing to link against at run time. Until C++23, when we get `std::net`, this is the way to do it in C++.
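To make the non-`asio` variant concrete, here is a minimal Linux-only sketch of the `epoll` approach. The port, payload, and function name are stand-ins for the demo (a driver would register the socket bound to the sensor's data port, and the wait loop would live on the dedicated receive thread); the point is that the thread blocks until a datagram actually arrives, with no polling timer and no busy loop:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <string>

// Binds a UDP socket on an ephemeral loopback port, registers it with epoll
// (edge-triggered), sends one datagram to it from a second socket to stand in
// for the sensor, then blocks in epoll_wait() until the data arrives.
std::string recv_one_datagram_via_epoll()
{
  int rx = socket(AF_INET, SOCK_DGRAM, 0);
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel pick a port
  bind(rx, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
  socklen_t len = sizeof(addr);
  getsockname(rx, reinterpret_cast<sockaddr *>(&addr), &len);

  int ep = epoll_create1(0);
  epoll_event ev{};
  ev.events = EPOLLIN | EPOLLET;  // edge-triggered read readiness
  ev.data.fd = rx;
  epoll_ctl(ep, EPOLL_CTL_ADD, rx, &ev);

  // Stand-in for the sensor: one datagram sent to ourselves.
  int tx = socket(AF_INET, SOCK_DGRAM, 0);
  const std::string payload = "lidar-packet";
  sendto(tx, payload.data(), payload.size(), 0,
    reinterpret_cast<sockaddr *>(&addr), sizeof(addr));

  // Blocks until data is available -- no timer, no spinning.
  epoll_event out{};
  std::string result;
  if (epoll_wait(ep, &out, 1, 1000 /* ms, demo timeout */) == 1) {
    char buf[1500];
    ssize_t got = recv(rx, buf, sizeof(buf), 0);
    if (got > 0) {result.assign(buf, static_cast<size_t>(got));}
  }
  close(tx);
  close(ep);
  close(rx);
  return result;
}
```

`asio` wraps exactly this mechanism (and its kqueue/IOCP equivalents) behind `async_receive_from()`, which is why it is the portable choice.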
> We could certainly buffer the data and have a separate thread process them, that would work, but I'm still wanting to avoiding a while true situation.
Yeah, there is no need for a `while (true) { ... }`. `asio` will give you edge-triggered, event-based notifications when data are flowing into the socket, and inter-thread communication would also be event-based. Using a `std::condition_variable` would be the typical way to handle the latter.
> A trivial way to test this would be to increase the timer rate by something stupid high and see if the IMU packets come through. If they do, we know were we need to fix things.
So, I doubled the rate at which the timer fires. Specifically:
```cpp
void OusterDriver::onActivate()
{
  DataProcessorMapIt it;
  for (it = _data_processors.begin(); it != _data_processors.end(); ++it) {
    it->second->onActivate();
  }

  // packet rate of the Ouster lidars is 1280 Hz; fire the timer at 2x that
  _process_timer = this->create_wall_timer(
    390625ns, std::bind(&OusterDriver::processData, this));
  // 781250ns, std::bind(&OusterDriver::processData, this));
}
```
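As a sanity check on the period arithmetic (my own verification, not part of the driver), the commented-out period is the original 1280 Hz rate and the new one is exactly double:

```cpp
#include <chrono>

using namespace std::chrono_literals;

// timer frequency in Hz = 1 s / timer period
static_assert(1s / 781250ns == 1280, "original period -> 1280 Hz");
static_assert(1s / 390625ns == 2560, "doubled rate -> 2560 Hz");
```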
Running in `2048x10`, I get:
```
$ timeout -s INT 20s ros2 topic hz /points
average rate: 9.765
min: 0.073s max: 0.133s std dev: 0.01783s window: 12
average rate: 9.863
min: 0.073s max: 0.133s std dev: 0.01390s window: 23
average rate: 9.855
min: 0.059s max: 0.133s std dev: 0.01752s window: 33
average rate: 9.929
min: 0.059s max: 0.133s std dev: 0.01801s window: 44
average rate: 9.942
min: 0.059s max: 0.133s std dev: 0.01750s window: 55
average rate: 9.950
min: 0.059s max: 0.133s std dev: 0.01626s window: 66
average rate: 9.981
min: 0.059s max: 0.133s std dev: 0.01584s window: 77
average rate: 9.982
min: 0.059s max: 0.133s std dev: 0.01535s window: 87
average rate: 9.970
min: 0.059s max: 0.133s std dev: 0.01516s window: 97
average rate: 9.970
min: 0.059s max: 0.133s std dev: 0.01499s window: 108
average rate: 9.991
min: 0.059s max: 0.133s std dev: 0.01536s window: 119
average rate: 9.978
min: 0.059s max: 0.133s std dev: 0.01536s window: 129
average rate: 9.979
min: 0.059s max: 0.133s std dev: 0.01507s window: 139
average rate: 9.981
min: 0.059s max: 0.133s std dev: 0.01472s window: 150
average rate: 9.989
min: 0.059s max: 0.133s std dev: 0.01472s window: 161
average rate: 9.986
min: 0.059s max: 0.133s std dev: 0.01448s window: 171
average rate: 9.985
min: 0.059s max: 0.133s std dev: 0.01446s window: 181
```
```
$ timeout -s INT 20s ros2 topic hz /imu
average rate: 95.722
min: 0.000s max: 0.071s std dev: 0.01741s window: 100
average rate: 100.258
min: 0.000s max: 0.071s std dev: 0.01658s window: 205
average rate: 99.497
min: 0.000s max: 0.071s std dev: 0.01672s window: 303
average rate: 99.616
min: 0.000s max: 0.071s std dev: 0.01713s window: 403
average rate: 100.167
min: 0.000s max: 0.073s std dev: 0.01704s window: 506
average rate: 100.139
min: 0.000s max: 0.073s std dev: 0.01723s window: 607
average rate: 100.104
min: 0.000s max: 0.073s std dev: 0.01710s window: 707
average rate: 100.095
min: 0.000s max: 0.075s std dev: 0.01704s window: 808
average rate: 100.085
min: 0.000s max: 0.075s std dev: 0.01704s window: 909
average rate: 100.074
min: 0.000s max: 0.075s std dev: 0.01706s window: 1009
average rate: 100.067
min: 0.000s max: 0.075s std dev: 0.01712s window: 1109
average rate: 100.060
min: 0.000s max: 0.075s std dev: 0.01714s window: 1209
average rate: 100.055
min: 0.000s max: 0.075s std dev: 0.01717s window: 1309
average rate: 100.050
min: 0.000s max: 0.075s std dev: 0.01713s window: 1409
average rate: 99.956
min: 0.000s max: 0.075s std dev: 0.01717s window: 1510
average rate: 99.688
min: 0.000s max: 0.075s std dev: 0.01720s window: 1610
average rate: 100.040
min: 0.000s max: 0.075s std dev: 0.01712s window: 1716
average rate: 100.038
min: 0.000s max: 0.075s std dev: 0.01717s window: 1817
```
Running my performance benchmarks in the same mode of operation yields the following:
Raw jitter:
And a quantile plot of the same:
@tpanzarella what's the CPU load difference, though? If that's not much higher, sure, let's do it. I'd much prefer to actually use the `asio` library (both for my personal education and also to have less latency than polling), but this is a good short-term measure. I was reading up last night on Boost.Asio and thinking about how to change the Ouster UDP client code to accommodate it.
@SteveMacenski I agree. I think for now we should jack this up. I did not do anything scientific here; I just doubled it. Let me do a bit of analysis to look at the effect more analytically. But I think in the near term we will want to ship something like this.
Here is a look at running in `2048x10` mode with a single subscriber to `/points`. First, we look at both jitter and CPU usage running the timer at 1280 Hz, then we look at it at 2560 Hz.
NOTE: The dashed orange trace here is a plot of rxKB/s on the network interface connected to the LiDAR (see right y-axis). I've plotted that time correlated to the CPU core utilization (8 cores, hyperthreaded quad core machine) so that it "frames" the relevant time window on the plot -- i.e., when the bytes were flowing in from the LiDAR to the host PC.
```
count    1032.000000
mean       92.690930
std         5.769102
min        63.640000
25%        89.900000
50%        93.940000
75%        96.940000
max       100.000000
Name: %idle, dtype: float64
```
NOTE: The dashed orange trace here is a plot of rxKB/s on the network interface connected to the LiDAR (see right y-axis). I've plotted that time correlated to the CPU core utilization (8 cores, hyperthreaded quad core machine) so that it "frames" the relevant time window on the plot -- i.e., when the bytes were flowing in from the LiDAR to the host PC.
```
count    1056.000000
mean       92.479574
std         6.660468
min        58.760000
25%        90.620000
50%        93.940000
75%        96.970000
max       100.000000
Name: %idle, dtype: float64
```
OK, that seems pretty open-and-shut to me; this is a great option for now.
When running the OS0-64 at either 1024x20 or 2048x10 modes, IMU data is not properly published. At lidar startup there is a burst of about 20 IMU messages published, then no more. Changing the os1_proc_mask parameter to only 'IMU' does not change the behavior.
When running in 1024x10, 512x20 and 512x10 modes, IMU data is published at the correct rate (100 Hz).
When using the Ouster provided ROS1 driver, IMU data is correctly published in 1024x20 and 2048x10 modes which suggests this is not an issue on the lidar side, but on the host side.
Additional information: