ros-drivers / ros2_ouster_drivers

ROS2 Drivers for the Ouster OS-0, OS-1, and OS-2 Lidars
https://ouster.com/
Apache License 2.0
139 stars 79 forks source link

OS1 lidar outputs a single points packet instead of full PointCloud2 #63

Closed GuillaumeHauss closed 3 years ago

GuillaumeHauss commented 3 years ago

Hello, My team and I have been using 3 OS1 lidars without much issues in ROS1, until we recently decided to migrate to ROS2. We are using ROS2 Eloquent, on Ubuntu 18.04, with the default DDS implementation (Fast RTPS). When we launch the ouster node, it publishes the full point cloud for a few seconds, but afterwards it publishes what seems to be only a part of the full point cloud at a time. We suspect it to be publishing only a single packet (about 300 points), but we are not sure about this. We tried using CycloneDDS, but we have the same results. We used ros2 topic hz to get the publish rate, it was between 180 Hz and 400 Hz, when it should be 10 Hz.

Here is a gif with what we see on Rviz : VID_20201203_113527

SteveMacenski commented 3 years ago

Have you tried first with a single sensor?

GuillaumeHauss commented 3 years ago

Hi Steve, First, thank you for your quick response! Yes, we did. We tried all of them one by one and the result is the same. The first few seconds seem fine (full point cloud) and then it snaps back to this flickering point cloud. We also tested both QoS profiles (system default and sensor) and they both give the same result. Finally, we see this issue on your repo, and tried the same fix, without much more success. Do you have other tests in mind that we can perform to narrow down the plausible cause?

SteveMacenski commented 3 years ago

Not off hand, while I'm a user and supporter of the sensors, by no means am I an expert in their firmware / failure modes. I just tested on my sensor a few minutes ago without any issues. Mhm.

Any other information I should know? Non-default DDS configs, firmware versions on the sensors, or playing with source code?

I agree that since you see a lot of messages with little information, that the buffering doesn't seem to be working quite right.

GuillaumeHauss commented 3 years ago

Here are the news:

What we plan to do is:

As per your questions, we didn't mess with DDS-configs as we kept the default settings. And we didn't change the source code either. We'll keep you posted Cheers

SteveMacenski commented 3 years ago

This means that the issue lies with the configuration of the ethernet card on the prototype computer.

Exactly what I like to hear: not Steve's fault :wink:

GuillaumeHauss commented 3 years ago

Hello Steve,

We did some tests to try to figure out what's happening, here are the results :

OS type ROS version Firmware version Result
OS1-32 1 2.0 Works as intended
OS1-32 1 1.13 Works as intended
OS1-32 2 2.0 Driver crashes
OS1-32 2 1.13 Works as intended when following the ouster tutorial for the dhcp server setup.
OS1-64 1 2.0 Works as intended
OS1-64 1 1.13 Works as intended
OS1-64 2 2.0 Flashing bug (the same as we've described it in the past repliesm
OS1-64 2 1.13 Cannot test without rolling back the firmware version, but was not working last time with our network configuration

As you can see we tried multiple things:

At any rate, our flashing bug was happening already before we upgraded to the new firmware, so we're not entirely sure if it happened due to our network configuration, the firmware, or the driver (or all 3 of them). For the time being, we're going to keep using them with ROS1 and use ros1_bridge to relay the topics to ROS2. We'll try contacting ouster to downgrade the firmware and test on the OS-64 if it works as intended if we follow the tutorial for the network config.

SteveMacenski commented 3 years ago

My setup is a 64 layer on ROS2, so I think that last entry is "works". I'm not sure what my exact version is at the moment, but 1.3 sounds about right from memory (definitely not 2.0).

So it seems like there are 3 issues you're running into:

The new firmware also was only released just days ago, sorry that this hasn't yet been updated to support (even the ROS 1 drivers were only updated to enable this on Monday). I have not had the opportunity to address these changes and to be honest with year-end closing in, I'm not sure when I will have that time either. I'd appreciate the help if you're able to help fix this issue.

The os1.hpp OS1_packet.hpp and OS1_util.hpp files are near-exact carbon copies of those in the ouster_client repo that we pulled in. We made extremely surface level changes in order to enable the multiple time source features in the readme.

https://github.com/ros-drivers/ros2_ouster_drivers/pull/36/files This PR is the only PR I can find that made changes from the default client. That is supported by looking at the blames for https://github.com/ros-drivers/ros2_ouster_drivers/blame/foxy-devel/ros2_ouster/include/ros2_ouster/OS1/OS1.hpp and https://github.com/ros-drivers/ros2_ouster_drivers/blame/foxy-devel/ros2_ouster/include/ros2_ouster/OS1/OS1_util.hpp which are the only 2 that have changed since my first commit client merge (minus some reformatting from linters, which weren't functional code changes).

So it leads me to believe that as long as the general API of the client object is the same or similar, you may be able to just remove those 3 files, copy over the new ones, and check if there's any changes in the client (inside os1.hpp) functions we use in os1_sensor.hpp/.cpp to make those small updates. That should get you something that works. After, then its just a matter of translating those updates from that 1 PR to the new files (if necessary, maybe they implement that for us in 2.0) and presto! Even if you just get the copied files over and got it working, that would be a big step towards V2 support. The only hiccup I think you might run into is the lidar / imu packet sizes. In the os1 sensor we set a packet of size size to capture incoming data. If that size has changed or is dynamic, we may need to adjust that variable's size.

My initial plan when I heard about the v2 was to carefully cherry pick changes, but they lumped all the changes into a single commit that's impossible to merge in & changes so much in the ouster_client that its not just a couple of tweaks.

anderwm commented 3 years ago

The files mentioned above (os1.hpp, OS1_packet.hpp, OS1_util.hpp) don't exist in their latest code. I think the code from OS1_packet.hpp winds up in parsing.h in their new code.

SteveMacenski commented 3 years ago

Yes, you'd remove those files and replace them with the new ones

anderwm commented 3 years ago

OK, so I think that entails:

  1. Removing OS1.hpp, OS1_packet.hpp, OS1_util.hpp from includes here
  2. Adding all of the new header files from ouster
  3. Finding all the places where those things were used in your code and changing the #include or the interface to the new style

3 being the long pole in that tent...sound about right?

SteveMacenski commented 3 years ago

Yes, that would be correct. There's a small number of other edits that would have to be made to the new headers to enable the different time modes unique to this driver (that improves performance for a number of common uses) that I link to in a response above with git blame.

That's kind of the reason I made the os1_sensor abstract class, so as these APIs change or new Ouster sensors are made, the driver doesn't need to change, just the implementations of configure and such.

RFRIEDM-Trimble commented 3 years ago

I see this issue too. OS0-128 at a few different settings of lidar_mode, ROS2, and FW2.0.0. I'm going to dig into it a bit. With proc_mask of PCL|IMU, I see 7.5Hz.

RFRIEDM-Trimble commented 3 years ago

It goes away running in release mode.

SteveMacenski commented 3 years ago

@RFRIEDM-Trimble A PR to just make this always build in release mode would be appreciated. That way no one runs into this in the future.

grischi commented 3 years ago

Side-note: the crucial thing seems to be compiler optimization.

I observed the same problem like @GuillaumeHauss and was able to solve it on linux/gcc using the -O3 compiler optimization level, which is selected implicitly when choosing CMake build type Release. Still I agree that setting the default CMake build type is preferable as it is compiler-agnostic.