Open Rayman opened 1 year ago
Hi @Rayman and thank you for your comment. I agree with you that this makes more sense but, as you stated, this is a breaking change so for Noetic I do not believe it can be merged as is. I am not sure how the ros2 maintainers would see this change for the ros2 version...
Hi @Rayman. Thanks for your suggestion. Just to be clear: The stale state is set by the generic analyzer if no message was received within a given timeout: https://github.com/ros/diagnostics/blob/a80bd1c33786e5e5642f91b2b6016048f32fbf0e/diagnostic_aggregator/include/diagnostic_aggregator/generic_analyzer_base.hpp#L196 Which can be useful information on the actuality of a state.
If I get it correctly, your suggestion is to treat it in aggregation like the other levels and aggregate it in the group. There, I would honestly have a hard time to rate it in severity between the other levels. Currently, it is level 3 which reads as the highest priority. This means you would only see stale on the highest level, even if another item in that group is in the error state. This can not be the intended behavior. Changing these levels would be a SERIOUS breaking change.
What is your take on that?
The toplevel state is not just the maximum of all the levels. Its calculated with the following algorithm
if maximum_level > ERROR and minimum_level <= ERROR
# one or more STALE, but not all of them
level = ERROR
else:
level = maximum_level
I would propose to change this to the following, because I think it's more logical.
if maximum_level == STALE and maximum_level_without_stale < ERROR
# one or more STALE, but no errors
level = STALE
else:
level = maximum_level_without_stale
This will be the difference between the two algorithms: | diagnostic1 | diagnostic2 | current | proposed |
---|---|---|---|---|
stale | ok | error | stale | |
stale | warn | error | stale | |
stale | error | error | error | |
stale | stale | stale | stale |
What I find counter-intuitive about the current behavior is that if you have three leaf diagnostics rolled up into a group, the discard_stale
doesn't seem to have an impact on the parent status. For example, if bar
and baz
in the example below go stale, but foo
is OK, I would intuitively think that the part
group should also be OK. However, what I'm seeing is that foo
currently gets marked as ERROR
.
diagnostics_aggregator:
ros__parameters:
pub_rate: 1.0
path: 'robot'
analyzers:
part:
type: 'diagnostic_aggregator/AnalyzerGroup'
path: 'part'
foo:
type: 'diagnostic_aggregator/GenericAnalyzer'
path: 'foo'
find_and_remove_prefix: ['/foo:']
num_items: 1
bar:
type: 'diagnostic_aggregator/GenericAnalyzer'
path: 'bar'
find_and_remove_prefix: ['/foo:']
discard_stale: true
baz:
type: 'diagnostic_aggregator/GenericAnalyzer'
path: 'baz'
find_and_remove_prefix: ['/baz:']
discard_stale: true
I did not want to propose to merge a breaking change in noetic, so I've implemented my proposed change in our fork: https://github.com/nobleo/diagnostics. Feel free to use it
I've implemented the change for the toplevel diagnostics and for the AnalyserGroup
.
I also added my implementation here: https://github.com/ros/diagnostics/pull/315
In the current code the toplevel state is only STALE when ALL the diagnostics are STALE. Example:
What I think is more logical is that the state is STALE when one of the diagnostics is STALE and none of them ERROR:
What do you think about this logic? I've implemented this in my fork, but it is a breaking change