ouster-lidar / ouster-ros

Official ROS drivers for Ouster sensors (OS0, OS1, OS2, OSDome)
https://ouster.com

Segmentation fault in os_cloud_nodelet.cpp ? #42

Closed clegenti closed 1 year ago

clegenti commented 1 year ago

Hi,

I am trying to run the driver with an OS1-16 (FW 2.4) lidar (part number 840-101855-02): roslaunch ouster_ros sensor.launch sensor_hostname:=os1-99XXXX.local

Unfortunately the driver crashes: /opt/ros/noetic/lib/nodelet/nodelet: line 1: 44277 Segmentation fault (core dumped) $0 $@

I tried with another OS1-16 (FW 2.0) and got the same problem. With Ouster Studio, I can see the data and everything seems to work well.

If I remove the cloud nodelet from common.launch (removing lines 9 to 17), I can see the topics /ouster/lidar_packets and /ouster/imu_packets at 640 Hz and 100 Hz, respectively.

After putting the nodelet back in the launch file and inserting prints in the code, it seems that the error comes from the call to scan_to_cloud in os_cloud_nodelet.cpp (see below for the code modifications and the full output).

Do you know what could cause this and how to solve it?

Thanks Cedric

Code modification for prints

os_cloud_nodelet.cpp from line 107

void convert_scan_to_pointcloud_publish(std::chrono::nanoseconds scan_ts, const ros::Time& msg_ts) {
      for (int i = 0; i < n_returns; ++i) {
          NODELET_INFO("BEFORE SCAN TO CLOUD");
          scan_to_cloud(xyz_lut, scan_ts, ls, cloud, i); 
          NODELET_INFO("AFTER SCAN TO CLOUD");
          sensor_msgs::PointCloud2 pc =
              ouster_ros::cloud_to_cloud_msg(cloud, msg_ts, sensor_frame);

os_ros.cpp from line 127

  void scan_to_cloud(const ouster::XYZLut& xyz_lut,
                     ouster::LidarScan::ts_t scan_ts, const ouster::LidarScan& ls,
                     ouster_ros::Cloud& cloud, int return_index) {
      std::cout << "START SCAN TO CLOUD" << std::endl;
      bool second = (return_index == 1);
      cloud.resize(ls.w * ls.h);

      ouster::img_t<uint16_t> near_ir = get_or_fill_zero<uint16_t>(
          suitable_return(sensor::ChanField::NEAR_IR, second), ls);

      ouster::img_t<uint32_t> range = get_or_fill_zero<uint32_t>(
          suitable_return(sensor::ChanField::RANGE, second), ls);

      ouster::img_t<uint32_t> signal = get_or_fill_zero<uint32_t>(
          suitable_return(sensor::ChanField::SIGNAL, second), ls);

      ouster::img_t<uint16_t> reflectivity = get_or_fill_zero<uint16_t>(
          suitable_return(sensor::ChanField::REFLECTIVITY, second), ls);

      auto points = ouster::cartesian(range, xyz_lut);
      auto timestamp = ls.timestamp();

      for (auto u = 0; u < ls.h; u++) {
          for (auto v = 0; v < ls.w; v++) {
              const auto xyz = points.row(u * ls.w + v);
              const auto ts =
                  (std::chrono::nanoseconds(timestamp[v]) - scan_ts).count();
              cloud(v, u) = ouster_ros::Point{
                  {{static_cast<float>(xyz(0)), static_cast<float>(xyz(1)),
                    static_cast<float>(xyz(2)), 1.0f}},
                  static_cast<float>(signal(u, v)),
                  static_cast<uint32_t>(ts),
                  static_cast<uint16_t>(reflectivity(u, v)),
                  static_cast<uint8_t>(u),
                  static_cast<uint16_t>(near_ir(u, v)),
                  static_cast<uint32_t>(range(u, v))};
          }
      }
      std::cout << "END SCAN TO CLOUD" << std::endl;
  }

Print

ced@ced-notebook:~/catkin_ws$ roslaunch ouster_ros sensor.launch sensor_hostname:=os1-991924000182.local
... logging to /home/ced/.ros/log/da7c9372-7b47-11ed-81e5-af45f6f59e16/roslaunch-ced-notebook-46887.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://ced-notebook:36061/

SUMMARY
========

PARAMETERS
 * /ouster/os_cloud_node/tf_prefix: 
 * /ouster/os_cloud_node/timestamp_mode: 
 * /ouster/os_node/imu_port: 0
 * /ouster/os_node/lidar_mode: 
 * /ouster/os_node/lidar_port: 0
 * /ouster/os_node/metadata: 
 * /ouster/os_node/sensor_hostname: os1-991924000182....
 * /ouster/os_node/timestamp_mode: 
 * /ouster/os_node/udp_dest: 
 * /ouster/os_node/udp_profile_lidar: 
 * /rosdistro: noetic
 * /rosversion: 1.15.15

NODES
  /
    relay_imu (topic_tools/relay)
    relay_lidar (topic_tools/relay)
    rviz (rviz/rviz)
  /ouster/
    img_node (nodelet/nodelet)
    os_cloud_node (nodelet/nodelet)
    os_node (nodelet/nodelet)
    os_nodelet_mgr (nodelet/nodelet)

ROS_MASTER_URI=http://localhost:11311

process[ouster/os_nodelet_mgr-1]: started with pid [46901]
process[ouster/os_node-2]: started with pid [46903]
process[ouster/os_cloud_node-3]: started with pid [46905]
process[ouster/img_node-4]: started with pid [46907]
process[rviz-5]: started with pid [46909]
process[relay_lidar-6]: started with pid [46911]
process[relay_imu-7]: started with pid [46912]
[ INFO] [1671004161.300918715]: Initializing nodelet with 16 worker threads.
[ INFO] [1671004162.292009813]: Loading nodelet /ouster/os_node of type nodelets_os/OusterSensor to manager os_nodelet_mgr with the following remappings:
[ WARN] [1671004162.322284296]: lidar port set to zero, the client will assign a random port number!
[ WARN] [1671004162.322318282]: imu port set to zero, the client will assign a random port number!
[ INFO] [1671004162.322330235]: Will use automatic UDP destination
[ WARN] [1671004162.523428897]: Sensor os1-991924000182.local configured successfully
[ INFO] [1671004162.523496398]: Starting sensor os1-991924000182.local initialization...
[2022-12-14 18:49:22.523] [ouster::sensor] [info] initializing sensor: os1-991924000182.local with ports: 0/0
[ INFO] [1671004163.304245202]: Loading nodelet /ouster/os_cloud_node of type nodelets_os/OusterCloud to manager os_nodelet_mgr with the following remappings:
[ INFO] [1671004163.315182665]: Loading nodelet /ouster/img_node of type nodelets_os/OusterImage to manager os_nodelet_mgr with the following remappings:
[ INFO] [1671004164.462140984]: rviz version 1.14.19
[ INFO] [1671004164.462257601]: compiled against Qt version 5.12.8
[ INFO] [1671004164.462290375]: compiled against OGRE version 1.9.0 (Ghadamon)
[ INFO] [1671004164.487389621]: Forcing OpenGl version 0.
[ INFO] [1671004164.834589265]: Stereo is NOT SUPPORTED
[ INFO] [1671004164.834710320]: OpenGL device: AMD RENOIR (DRM 3.42.0, 5.15.0-56-generic, LLVM 12.0.0)
[ INFO] [1671004164.834747552]: OpenGl version: 4.6 (GLSL 4.6) limited to GLSL 1.4 on Mesa system.
[ WARN] [1671004174.663111320]: Client version: 0.7.1b1+279c9fb-release
[ WARN] [1671004174.663188780]: Using lidar_mode: 1024x10
[ WARN] [1671004174.663212827]: OS-1-16-B13 sn: 991924000182 firmware rev: v2.4.0
[ WARN] [1671004174.664660523]: No metadata file was specified, using: os1-991924000182-metadata.json
[ INFO] [1671004174.671251261]: Wrote metadata to os1-991924000182-metadata.json
[ INFO] [1671004174.672399154]: get_metadata service created
[ INFO] [1671004174.673270749]: get_config service created
[ INFO] [1671004174.674070484]: set_config service created
[ INFO] [1671004174.686210956]: OusterCloud: retrieved sensor metadata!
[ INFO] [1671004174.686743341]: Profile has 1 return(s)
[ INFO] [1671004174.703401891]: OusterImage: retrieved sensor metadata!
[ INFO] [1671004174.779948376]: BEFORE SCAN TO CLOUD
START SCAN TO CLOUD
END SCAN TO CLOUD
/opt/ros/noetic/lib/nodelet/nodelet: line 1: 46927 Segmentation fault      (core dumped) $0 $@
================================================================================REQUIRED process [ouster/os_nodelet_mgr-1] has died!
process has died [pid 46901, exit code 139, cmd bash -c sleep 2; $0 $@ /opt/ros/noetic/lib/nodelet/nodelet manager __name:=os_nodelet_mgr __log:=/home/ced/.ros/log/da7c9372-7b47-11ed-81e5-af45f6f59e16/ouster-os_nodelet_mgr-1.log].
log file: /home/ced/.ros/log/da7c9372-7b47-11ed-81e5-af45f6f59e16/ouster-os_nodelet_mgr-1*.log
Initiating shutdown!
================================================================================
[relay_imu-7] killing on exit
[relay_lidar-6] killing on exit
[rviz-5] killing on exit
[ouster/img_node-4] killing on exit
[ouster/os_cloud_node-3] killing on exit
[ouster/os_node-2] killing on exit
[ouster/os_nodelet_mgr-1] killing on exit
[ INFO] [1671004176.145572739]: Unloading nodelet /ouster/img_node from manager os_nodelet_mgr
[ INFO] [1671004176.145744433]: Unloading nodelet /ouster/os_cloud_node from manager os_nodelet_mgr
[ INFO] [1671004176.145951796]: Unloading nodelet /ouster/os_node from manager os_nodelet_mgr
[ INFO] [1671004176.146997671]: waitForService: Service [/ouster/os_nodelet_mgr/unload_nodelet] could not connect to host [ced-notebook:43405], waiting...
[ WARN] [1671004176.147052026]: Couldn't find service os_nodelet_mgr/unload_nodelet, perhaps the manager is already shut down
[ INFO] [1671004176.147513684]: waitForService: Service [/ouster/os_nodelet_mgr/unload_nodelet] could not connect to host [ced-notebook:43405], waiting...
[ WARN] [1671004176.147569322]: Couldn't find service os_nodelet_mgr/unload_nodelet, perhaps the manager is already shut down
[ INFO] [1671004176.147890167]: waitForService: Service [/ouster/os_nodelet_mgr/unload_nodelet] could not connect to host [ced-notebook:43405], waiting...
[ WARN] [1671004176.147934222]: Couldn't find service os_nodelet_mgr/unload_nodelet, perhaps the manager is already shut down
shutting down processing monitor...
... shutting down processing monitor complete
done

Samahu commented 1 year ago

Hi @clegenti, thanks for posting this. Unfortunately, this is the first time a user reported this and I need to understand the problem better before we can assist you so I have the following asks:

Ussama

clegenti commented 1 year ago

Hi Ussama, Thanks for the quick answer.

If I comment out the declaration of the lidar subscriber lidar_packet_sub (lines 102 and 103 of os_cloud_nodelet.cpp) the rest of the nodelet works fine (rostopic echo /ouster/imu shows good data).

I am running Noetic on Ubuntu 20.04.

Thanks Cedric

Samahu commented 1 year ago

That is interesting but we need more information to be able to assist you.

What type of machine are you using? How much RAM does this machine have? Where exactly does the method fail: while acquiring the images, during the cartesian method, or when the results are copied to the point cloud? Can you provide us with that?

We have also pushed changes today which replace this method with a different one, would you consider upgrading and see if the problem still persists?
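
To narrow down where the method fails, a scoped tracer along these lines may help. This is only a sketch: ScopeTrace and traced_work are hypothetical names, not part of ouster-ros, and traced_work merely stands in for scan_to_cloud. Because C++ destroys locals in reverse order of declaration, a tracer declared first prints its EXIT line only after every other local has been destroyed, which separates a crash in the body from a crash while the function is cleaning up.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // convenient for capturing the trace in a test
#include <string>
#include <utility>

// Hypothetical helper (not part of ouster-ros): logs when a scope is entered,
// when named stages are reached, and when the scope's locals are gone.
class ScopeTrace {
public:
    explicit ScopeTrace(std::string name, std::ostream& out = std::cerr)
        : name_(std::move(name)), out_(out) {
        assert(!name_.empty());
        out_ << "ENTER " << name_ << std::endl;  // std::endl flushes the stream
    }
    ~ScopeTrace() { out_ << "EXIT " << name_ << std::endl; }

    // Mark an intermediate checkpoint inside the traced scope.
    void stage(const char* label) {
        out_ << "  " << name_ << ": " << label << std::endl;
    }

    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    std::string name_;
    std::ostream& out_;
};

// Stand-in for scan_to_cloud: the real work would sit between the stages.
void traced_work(std::ostream& out) {
    ScopeTrace trace("scan_to_cloud", out);  // declared first, destroyed last
    std::string buffer(16, 'x');             // stand-in for the local images
    trace.stage("fields extracted");
    buffer += "y";                           // stand-in for cartesian() and the copy loop
    trace.stage("end of body");
}   // buffer is destroyed here, then trace prints EXIT
```

If "end of body" appears but EXIT does not, the crash happened while destroying the other locals. If EXIT appears but the caller's next print does not, the crash happened while returning to the caller.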

clegenti commented 1 year ago

I work on an HP EliteBook laptop (AMD Ryzen 7 PRO 4750U, 32 GB RAM). As weird as it sounds, the method scan_to_cloud seems to fail on exit: the print at the end of the method is shown, but the one right after the call is not.

I pulled and compiled the latest version, and now I get an error loading the nodelets:

ced@ced-notebook:~/catkin_ws$ roslaunch ouster_ros sensor.launch sensor_hostname:=os1-991924000182.local
... logging to /home/ced/.ros/log/37db9ca8-7cc8-11ed-b01c-555216779564/roslaunch-ced-notebook-39617.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://ced-notebook:46501/

SUMMARY
========

PARAMETERS
 * /ouster/os_cloud_node/tf_prefix: 
 * /ouster/os_cloud_node/timestamp_mode: 
 * /ouster/os_node/imu_port: 0
 * /ouster/os_node/lidar_mode: 
 * /ouster/os_node/lidar_port: 0
 * /ouster/os_node/metadata: 
 * /ouster/os_node/sensor_hostname: os1-991924000182....
 * /ouster/os_node/timestamp_mode: 
 * /ouster/os_node/udp_dest: 
 * /ouster/os_node/udp_profile_lidar: 
 * /rosdistro: noetic
 * /rosversion: 1.15.15

NODES
  /
    relay_imu (topic_tools/relay)
    relay_lidar (topic_tools/relay)
    rviz (rviz/rviz)
  /ouster/
    img_node (nodelet/nodelet)
    os_cloud_node (nodelet/nodelet)
    os_node (nodelet/nodelet)
    os_nodelet_mgr (nodelet/nodelet)

auto-starting new master
process[master]: started with pid [39625]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 37db9ca8-7cc8-11ed-b01c-555216779564
process[rosout-1]: started with pid [39635]
started core service [/rosout]
process[ouster/os_nodelet_mgr-2]: started with pid [39642]
process[ouster/os_node-3]: started with pid [39644]
process[ouster/os_cloud_node-4]: started with pid [39646]
process[ouster/img_node-5]: started with pid [39648]
process[rviz-6]: started with pid [39650]
process[relay_lidar-7]: started with pid [39652]
process[relay_imu-8]: started with pid [39653]
[ INFO] [1671143491.620263014]: Initializing nodelet with 16 worker threads.
[ INFO] [1671143492.616234616]: Loading nodelet /ouster/os_node of type nodelets_os/OusterSensor to manager os_nodelet_mgr with the following remappings:
[ WARN] [1671143492.645748418]: lidar port set to zero, the client will assign a random port number!
[ WARN] [1671143492.645776010]: imu port set to zero, the client will assign a random port number!
[ INFO] [1671143492.645783905]: Will use automatic UDP destination
[ INFO] [1671143492.729254450]: Sensor os1-991924000182.local configured successfully
[ INFO] [1671143492.729316516]: Starting sensor os1-991924000182.local initialization...
[2022-12-16 09:31:32.729] [ouster::sensor] [info] initializing sensor: os1-991924000182.local with ports: 0/0
[ INFO] [1671143493.626949239]: Loading nodelet /ouster/os_cloud_node of type nodelets_os/OusterCloud to manager os_nodelet_mgr with the following remappings:
[ INFO] [1671143493.634639589]: Loading nodelet /ouster/img_node of type nodelets_os/OusterImage to manager os_nodelet_mgr with the following remappings:
[ INFO] [1671143494.736689937]: rviz version 1.14.19
[ INFO] [1671143494.736764326]: compiled against Qt version 5.12.8
[ INFO] [1671143494.736781989]: compiled against OGRE version 1.9.0 (Ghadamon)
[ INFO] [1671143494.749240264]: Forcing OpenGl version 0.
[ INFO] [1671143495.020554745]: Stereo is NOT SUPPORTED
[ INFO] [1671143495.020687194]: OpenGL device: AMD RENOIR (DRM 3.42.0, 5.15.0-56-generic, LLVM 12.0.0)
[ INFO] [1671143495.020723161]: OpenGl version: 4.6 (GLSL 4.6) limited to GLSL 1.4 on Mesa system.
[ INFO] [1671143509.740858291]: Client version: 0.7.1b1+615e787-release
[ INFO] [1671143509.740913545]: Using lidar_mode: 1024x10
[ INFO] [1671143509.740939143]: OS-1-16-B13 sn: 991924000182 firmware rev: v2.4.0
[ INFO] [1671143509.742252705]: No metadata file was specified, using: os1-991924000182-metadata.json
[ INFO] [1671143509.742641934]: Wrote metadata to os1-991924000182-metadata.json
[ INFO] [1671143509.743868954]: get_metadata service created
[ INFO] [1671143509.744818414]: get_config service created
[ INFO] [1671143509.745597935]: set_config service created
[ INFO] [1671143509.756950143]: OusterCloud: retrieved sensor metadata!
[ INFO] [1671143509.757429632]: Profile has 1 return(s)
[FATAL] [1671143510.944264594]: Failed to load nodelet '/ouster/os_cloud_node` of type `nodelets_os/OusterCloud` to manager `os_nodelet_mgr'
[FATAL] [1671143510.944279231]: Failed to load nodelet '/ouster/img_node` of type `nodelets_os/OusterImage` to manager `os_nodelet_mgr'
/opt/ros/noetic/lib/nodelet/nodelet: line 1: 39664 Segmentation fault      (core dumped) $0 $@
================================================================================REQUIRED process [ouster/os_nodelet_mgr-2] has died!
process has died [pid 39642, exit code 139, cmd bash -c sleep 2; $0 $@ /opt/ros/noetic/lib/nodelet/nodelet manager __name:=os_nodelet_mgr __log:=/home/ced/.ros/log/37db9ca8-7cc8-11ed-b01c-555216779564/ouster-os_nodelet_mgr-2.log].
log file: /home/ced/.ros/log/37db9ca8-7cc8-11ed-b01c-555216779564/ouster-os_nodelet_mgr-2*.log
Initiating shutdown!
================================================================================
[relay_imu-8] killing on exit
[relay_lidar-7] killing on exit
[rviz-6] killing on exit
[ouster/img_node-5] killing on exit
[ouster/os_cloud_node-4] killing on exit
[ouster/os_node-3] killing on exit
[ouster/os_nodelet_mgr-2] killing on exit
[ INFO] [1671143511.061218452]: Unloading nodelet /ouster/os_node from manager os_nodelet_mgr
[ INFO] [1671143511.062543656]: waitForService: Service [/ouster/os_nodelet_mgr/unload_nodelet] could not connect to host [ced-notebook:50737], waiting...
[ WARN] [1671143511.062591356]: Couldn't find service os_nodelet_mgr/unload_nodelet, perhaps the manager is already shut down
[rosout-1] killing on exit
[master] killing on exit
shutting down processing monitor...
... shutting down processing monitor complete
done

Samahu commented 1 year ago

Are you using a Debug or Release build? Do you mind reproducing the issue on a different machine with a fresh ROS installation? I suspect that something is wrong with your ROS setup. Since I don't have a clue about what's going on, you'd probably need to build with Debug enabled and add GDB to the launch command to dig into the problem. Just add gdb -ex run --args to the launch-prefix of the nodelet manager.
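
If running under GDB is inconvenient, a rough alternative is to print a backtrace from a SIGSEGV handler. The sketch below assumes Linux with glibc, which provides backtrace() and backtrace_symbols_fd() in <execinfo.h>. The names dump_backtrace, on_segv and install_segv_backtrace are hypothetical and not part of the driver. Note that backtrace() is not guaranteed to be async-signal-safe, so treat this as a debugging aid only.

```cpp
#include <cassert>
#include <csignal>
#include <execinfo.h>  // glibc-specific: backtrace(), backtrace_symbols_fd()
#include <unistd.h>    // STDERR_FILENO

// Hypothetical helper: write the current call stack to stderr and return
// the number of frames that were captured.
int dump_backtrace() {
    void* frames[64];
    const int n = backtrace(frames, 64);
    assert(n >= 0);
    // Writes straight to the file descriptor, so it does not call malloc.
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}

// Signal handler: print the stack, then restore the default action and
// re-raise, so the process still terminates and can still dump core.
void on_segv(int sig) {
    dump_backtrace();
    std::signal(sig, SIG_DFL);
    std::raise(sig);
}

// Install the handler; returns false if registration failed.
bool install_segv_backtrace() {
    return std::signal(SIGSEGV, on_segv) != SIG_ERR;
}
```

Calling install_segv_backtrace() once at startup should be enough. Function names in the output may require linking with -rdynamic; otherwise the raw addresses can be resolved afterwards with a tool such as addr2line.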

clegenti commented 1 year ago

Hi, I am working with other sensors (including a lidar from another brand) and everything works fine. I believe my ROS setup is (mostly ? ^^) good. Unfortunately, I don't have another fresh-installed machine to test with (will try to test with a labmate's laptop in the coming days and keep you posted).

So far I was compiling in Release (I tried both catkin_make and catkin build). With the Debug flag set, the driver works (with both catkin_make and catkin build). I guess I will stick with that for the moment.

Thanks

Samahu commented 1 year ago

That's weird. In any case, please let us know if you happen to find out why this was failing. I did verify the driver (latest code) on three different machines with a Release build and didn't observe any issues.

clegenti commented 1 year ago

I managed to try on two other laptops today. It worked without any issue. One was on 18.04 and the other on 20.04. Both were Intel Core i7 (two different generations) with 16 GB of RAM. Could it be due to the AMD processor? (I don't think I can get my hands on another AMD laptop to try though).

(I am happy with compiling with the Debug build, I am not running after performance on my setup ^^)

Samahu commented 1 year ago

Hi @clegenti, I cannot rule out the possibility that the problem is specific to AMD processors, since I have personally only tested on Intel and ARM based processors. I definitely want to track this issue down and test the driver on AMD processors. Unfortunately, I can't verify this until January next year, but in the meantime, if you learn anything, please let us know. I will keep the issue open till then.

Samahu commented 1 year ago

@clegenti today I tested the ouster-ros (ROS 1) driver on an Ubuntu 20.04 machine equipped with an AMD Ryzen 9 processor and 32 GB of RAM. I didn't observe any issues, no segfaults. So we can eliminate the possibility that the issue has anything to do with the CPU family. However, the logs you provided show that you have an AMD GPU, while my setup has an NVIDIA GPU. I haven't tested the driver on a machine with an AMD or Intel card, so it may or may not be related (our code doesn't currently rely on the GPU). Did any of the other systems on which the driver worked have an AMD GPU?

Since the segfault happens during the point_to_cloud method, I strongly suspect that it has to do with your current PCL or Eigen installation. Could you get the current version info of these two packages, check whether an upgrade is available, and see whether that resolves the issue?
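
One way to collect that information from inside a build is a small report function like the sketch below. It is an illustration under assumptions: build_report is a hypothetical helper, and the headers are only included if the compiler finds them on the include path. The AVX and NDEBUG lines are there because mismatched vectorization flags between libraries are a commonly cited cause of Release-only crashes in Eigen-based code; nothing in this thread confirms that this is the cause here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Both includes are guarded, so this still compiles when the headers are not
// on the include path (the report then says they were not found).
#if defined(__has_include)
#  if __has_include(<Eigen/Core>)
#    include <Eigen/Core>
#  endif
#  if __has_include(<pcl/pcl_config.h>)
#    include <pcl/pcl_config.h>
#  endif
#endif

// Hypothetical helper: summarize the library versions and build settings
// that this translation unit was compiled with.
std::string build_report() {
    std::ostringstream os;
#ifdef EIGEN_WORLD_VERSION
    os << "Eigen " << EIGEN_WORLD_VERSION << '.' << EIGEN_MAJOR_VERSION << '.'
       << EIGEN_MINOR_VERSION << '\n';
#else
    os << "Eigen: headers not found\n";
#endif
#ifdef PCL_VERSION_PRETTY
    os << "PCL " << PCL_VERSION_PRETTY << '\n';
#else
    os << "PCL: headers not found\n";
#endif
#ifdef __AVX__
    os << "AVX: enabled\n";
#else
    os << "AVX: disabled\n";
#endif
#ifdef NDEBUG
    os << "NDEBUG: defined\n";
#else
    os << "NDEBUG: not defined\n";
#endif
    const std::string report = os.str();
    assert(!report.empty());
    return report;
}
```

Printing build_report() from both the Debug and the Release build, with the same compiler flags the driver uses, would show whether the two builds see the same Eigen and PCL versions and the same SIMD settings.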

Samahu commented 1 year ago

@clegenti Since I couldn't reproduce the issue using the same processor family that you are using, I think the problem is specific to your setup. I gave a few hints on where the issue might be. At this point I am closing the issue, but please let us know if you still think it is an actual issue in the driver code.