msgs: allow servers to report multiple error codes

gavanderhoorn commented 1 year ago

As per subject.

The main motivation for this change is a development in MotoROS2, where we'd like to be able to report multiple active alarms and errors. With error_code being a scalar, that was not possible, so changing it to a list was necessary.

As the ability to report multiple active alarms at the same time is something that can be generalised across different vendors, I'd like to merge this change here, upstream. However, I don't intend to merge this into melodic-devel, as this is a breaking change and we can't introduce that in Melodic nor Noetic any more.

I wanted to open a PR to have a venue for any potential discussion about this, and eventually would merge this change (but not this PR) into the https://github.com/ros-industrial/industrial_core/compare/melodic-devel...ros2_msgs_only branch, which might eventually evolve into a proper ROS 2 branch.

gavanderhoorn commented 1 year ago

Pinging some people who might have comments: @marip8 @JeremyZoss @simonschmeisser

JeremyZoss commented 1 year ago

I don't have a problem with this. Robot-side drivers that are "smart enough" to be direct ROS2 nodes should be able to populate this dynamic array easily enough. If we still use "dumb" robot-side drivers that communicate with ROS2 nodes to publish ROS2 messages (in the style of ROS1 industrial_robot_client), they supply this data as a dynamic array, fixed array, or scalar (as currently done), and the ROS1 node can convert into this message format. Other ROS2 nodes that consume this message can just pop off the first error code if they're not capable of supporting >1 error code.

While you're considering changing the message, is it worth expanding the definition to allow publishing of string error messages? Strings are potentially harder for robots to publish, can be more user-friendly if displayed in logs or HMI, and are probably harder for integration into application state-machine logic. If the push is for robots to provide native ROS2 drivers, maybe the better solution for strings is to have robots publish directly to the ROS logging system.

gavanderhoorn commented 1 year ago

I don't have a problem with this. Robot-side drivers that are "smart enough" to be direct ROS2 nodes should be able to populate this dynamic array easily enough. If we still use "dumb" robot-side drivers that communicate with ROS2 nodes to publish ROS2 messages (in the style of ROS1 industrial_robot_client), they supply this data as a dynamic array, fixed array, or scalar (as currently done), and the ROS1 node can convert into this message format. Other ROS2 nodes that consume this message can just pop off the first error code if they're not capable of supporting >1 error code.

Exactly. That was also what I had in mind (although I didn't phrase it using dumb and smart ;) ).

While you're considering changing the message, is it worth expanding the definition to allow publishing of string error messages? Strings are potentially harder for robots to publish, can be more user-friendly if displayed in logs or HMI, and are probably harder for integration into application state-machine logic. If the push is for robots to provide native ROS2 drivers, maybe the better solution for strings is to have robots publish directly to the ROS logging system.

Strings could be nice, but support for them in things like micro-ROS (and other truly embedded environments) is somewhat patchy. They either require dynamic memory allocation, or setting up some fixed-size "always big enough" buffer. The former is not always possible, the latter just leads to wasted space -- and what would be "big enough" exactly?

An alternative I had in mind was to keep RobotStatus lean and offer an additional service, which lets clients retrieve additional information about any active alarms listed in error_codes.

This would reduce bw used by RobotStatus (no arbitrary length strings published @ at least 10 Hz), would allow additional information to be returned by the server (not just error strings, but perhaps "cause codes" or whatever else a specific OEM supports).

The service would most likely not be generic, but I believe that'd be OK: error codes are not generic anyway, and it's likely any remedial action would be vendor-specific too.

Finally: I also wanted to keep this change small. So we can definitely discuss changing more than just error_code(s), but I'd like to keep that discussion for a different PR/issue/time.

Edit: off-topic, but:

If the push is for robots to provide native ROS2 drivers, maybe the better solution for strings is to have robots publish directly to the ROS logging system.

which is actually what we can also do in MotoROS2.

marip8 commented 1 year ago

Looks good to me

gavanderhoorn commented 1 year ago

Thanks for your comments @JeremyZoss @marip8.

I'm going to merge the change into ros2_msgs_only.

simonschmeisser commented 1 year ago

We have situations where we want to communicate "runtime errors" as well as system errors. Example would be if only the status ethernet connection is enabled but not the trajectory one. It's not a system error but still the system as a whole is not operational. But likely this should be a different field?

gavanderhoorn commented 1 year ago

It's not a system error but still the system as a whole is not operational. But likely this should be a different field?

to me that sounds like something different from what error_code(s) is for, yes.

error_code is really for vendor-defined errors. Specifically, the errors defined by the industrial robot OEM.

What you describe sounds more like a status (ie: system operational, or something else), not an error situation.

Example would be if only the status ethernet connection is enabled but not the trajectory one.

shouldn't this be reflected by the motion_possible field? If there is no connection to the motion server, motion would not seem to be possible. Or at least not motion commanded by the ROS client.

simonschmeisser commented 1 year ago

shouldn't this be reflected by the motion_possible field? If there is no connection to the motion server, motion would not seem to be possible. Or at least not motion commanded by the ROS client.

True, but that does not allow to indicate any hints on how to fix the situation. Maybe your suggestion to use an additional separate detailed status would indeed be right here.

ros-industrial / industrial_core

msgs: allow servers to report multiple error codes #289