tier4 / nebula

A universal LiDAR and radar driver for ROS 2, supporting Hesai, Velodyne, Robosense and Continental sensors.
https://tier4.github.io/nebula/
Apache License 2.0
51 stars 51 forks source link

fix(hesai_hw_interface): add error handling #131

Closed mojomex closed 6 months ago

mojomex commented 7 months ago

PR Type

Related Links

This PR will not fix this issue but at least throw an exception with a detailed error message instead of a segfault, making future similar bugs much easier to spot:

Description

This PR introduces error checking and handling in hesai_hw_interface.cpp. The following features have been added:

The following other things have changed as a result of the fix:

Behavior for TCP commands is now as follows:

Review Procedure

Test TCP with real sensors (at least AT128, OT128 and an older sensor).

Remarks

We should refactor TCP communication in a similar way that was done with mixins in #104. This would make the sensor definition much more readable: one could read the sensor definition alongside the TCP documentation, without boilerplate code. This would provide much better maintainability, and, for new sensors, forces the developer to check every TCP command for changes when implementing.

Pre-Review Checklist for the PR Author

PR Author should check the checkboxes below when creating the PR.

Checklist for the PR Reviewer

Reviewers should check the checkboxes below before approval.

Post-Review Checklist for the PR Author

PR Author should check the checkboxes below before merging.

CI Checks

mojomex commented 7 months ago

(I still have to test this on real sensors)

amc-nu commented 7 months ago

Test Result Pandar128E4X

@mojomex @drwnz

The driver works with an actual sensor. Point cloud looks normal

Issues:

[hesai_hw_interface_ros_wrapper_node-2] Starting UDP server on: SensorModel: Pandar128_E4X_OT, FrameID: hesai, HostIP: 255.255.255.255, SensorIP: 192.168.1.201, DataPort: 940, ReturnMode: Dual, Frequency: 0, MTU: 5dc, Use sensor time: 0, GnssPort: 941, ScanPhase:0, RotationSpeed:258, FOV(Start):0, FOV(End):168, DualReturnDistanceThreshold:0.1, PtpProfile:IEEE_1588v2, PtpDomain:0, PtpTransportType:UDP/IP, PtpSwitchType:TSN
[hesai_hw_monitor_ros_wrapper_node-3] 192.168.1.201:9347
[hesai_hw_monitor_ros_wrapper_node-3] [INFO 1712038423.862968919] [TcpSocket::open]: connected (open() at /home/ne0/nebula_ws/src/transport_drivers/tcp_driver/src/tcp_socket.cpp:531)
[hesai_hw_monitor_ros_wrapper_node-3] [ERROR 1712038423.863656262] [hesai_hw_monitor_hesai]: HesaiInventory from Sensor has unknown format. Will parse anyway. (PrintError() at /home/ne0/nebula_ws/src/nebula/nebula_hw_interfaces/src/nebula_hesai_hw_interfaces/hesai_hw_interface.cpp:1331)
[hesai_hw_monitor_ros_wrapper_node-3] HesaiInventory
Parameter Screenshot
Voltage image
Temperature image
RPM image
No PTP image
PTP image
amc-nu commented 7 months ago

Test PandarXT32

The driver works with an actual sensor. Pointcloud looks normal.

[hesai_hw_interface_ros_wrapper_node-2] Starting UDP server on: SensorModel: PandarXT32, FrameID: hesai, HostIP: 255.255.255.255, SensorIP: 192.168.1.201, DataPort: 940, ReturnMode: Dual, Frequency: 0, MTU: 5dc, Use sensor time: 0, GnssPort: 941, ScanPhase:0, RotationSpeed:258, FOV(Start):0, FOV(End):168, DualReturnDistanceThreshold:0.1, PtpProfile:IEEE_1588v2, PtpDomain:0, PtpTransportType:UDP/IP, PtpSwitchType:TSN
[hesai_hw_monitor_ros_wrapper_node-3] 192.168.1.201:9347
[hesai_hw_monitor_ros_wrapper_node-3] [INFO 1712039923.485050914] [TcpSocket::open]: connected (open() at /home/ne0/nebula_ws/src/transport_drivers/tcp_driver/src/tcp_socket.cpp:531)
[hesai_hw_monitor_ros_wrapper_node-3] HesaiInventory
[hesai_hw_monitor_ros_wrapper_node-3] sn: XT3XXXXXXXX, date_of_manufacture: XXXX-XX-XX, mac: ec:9f:XX:XX:XX:XX, sw_ver: 0.1.25, hw_ver: 1.0.0, control_fw_ver: 1.1.13, sensor_fw_ver:   1.2.25, angle_offset: 1048, model: , motor_type: , num_of_lines:  , 00 00 00 00 00 00 00 00 00 00 00
Parameter Screenshot
No PTP image
PTP image
RPM image
Temperature image
Voltage image
mojomex commented 7 months ago

@amc-nu

I see some output marked as error: [hesai_hw_monitor_hesai]: HesaiInventory from Sensor has unknown format. Will parse anyway. sn: XXXX, date_of_manufacture: XXXX-XX-XX, mac: ec:9f:XX:XX:XX:XX, sw_ver: 1.3.19, hw_ver: , control_fw_ver: 1.3.10, sensor_fw_ver: 1.3.13, angle_offset: 8169, model: *, motor_type: , num_of_lines: �, 4f 54 31 32 38 2d 42 30 31 00 00 (SN and MAC hidden on purpose)

The actual parsing is the same in this PR as it was before. However, OT128 (and to some extent AT128, possibly others) have slightly different struct definitions from e.g. XT32. As it was not an issue until now, I opted to print an error that we are parsing the wrong data for some fields, rather than doing a proper refactoring at this point.

What has changed however, is that payload size is now checked before parsing into the structs is attempted.

mojomex commented 7 months ago

@amc-nu

The diagnostics page reports 1800V input, might need some check.

OT128's TCP API doc lists the input_voltage and input_current fields in a different order than the HesaiLidarMonitor struct at hesai_cmd_response.hpp:480. I think this issue already existed, as I did not change that struct. I do not have access to all TCP API docs of Hesai sensors so I cannot confirm at this point if it is in fact only different for OT128.

I have added FIXMEs to all structs where I found the API docs I do have (OT128 and AT128) to differ from the implementation.

Do you think it is best to add an exception if (sensor_model == OT128) to the parser?

mojomex commented 7 months ago

sn: XXXX, date_of_manufacture: XXXX-XX-XX, mac: ec:9f:XX:XX:XX:XX, sw_ver: 1.3.19, hw_ver: , control_fw_ver: 1.3.10, sensor_fw_ver: 1.3.13, angle_offset: 8169, model: *, motor_type: , num_of_lines: �, 4f 54 31 32 38 2d 42 30 31 00 00

Indeed, the output had some other errors as well: I forgot to reset the output modifiers (std::hex etc.) after printing MAC addresses etc. and printed some uint8 numbers as characters instead of numbers. Fixed in e5d0a9c.

Output now:

sn: XXXX, date_of_manufacture: 2023-11-23, mac: ec:xx:xx:xx:xx, sw_ver: 1.4.01, hw_ver: , control_fw_ver: 1.4.1, sensor_fw_ver: 1.4.1, angle_offset: 19817, model: 42, motor_type: 1, num_of_lines: 128, 4f 54 31 32 38 2d 42 30 31 00 00