Unconfigured lifecycle state management

jginesclavero commented 4 years ago

Hi again @norro!

Yesterday I had a meeting with @chcorbato, and we talked about the case where a lifecycle node transitions to ErrorProcessing. According to the documentation and the lifecycle node diagrams, a node that hits an error transitions to ErrorProcessing. Then, depending on the result of that processing, it goes to either the Finalized or the Unconfigured state. Do you think system_modes should manage the Unconfigured state of lifecycle nodes? Managing it would cover both this situation and the start-up situation, where the nodes are in the Unconfigured state.
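To make the two outcomes concrete, here is a minimal sketch in plain Python (no rclpy). The `State` enum and both helper functions are hypothetical names for illustration only and are not part of system_modes or the ROS 2 lifecycle API.

```python
from enum import Enum


class State(Enum):
    """Subset of the lifecycle states relevant to this question."""
    UNCONFIGURED = "unconfigured"
    INACTIVE = "inactive"
    ACTIVE = "active"
    FINALIZED = "finalized"
    ERROR_PROCESSING = "errorprocessing"


def after_error_processing(recovered: bool) -> State:
    """State reached once ErrorProcessing ends.

    Successful error handling leads back to Unconfigured,
    failed error handling leads to Finalized.
    """
    return State.UNCONFIGURED if recovered else State.FINALIZED


def needs_configuration(state: State) -> bool:
    """True for nodes that sit in Unconfigured, whether at start-up or after an error."""
    return state is State.UNCONFIGURED
```

Both the start-up case and the post-error case end up in the same `needs_configuration` branch, which is why a single mechanism could cover them.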

Thank you!

chcorbato commented 4 years ago

> Hi again @norro!
>
> Yesterday I had a meeting with @chcorbato, and we talked about the case where a lifecycle node transitions to ErrorProcessing. According to the documentation and the lifecycle node diagrams, a node that hits an error transitions to ErrorProcessing. Then, depending on the result of that processing, it goes to either the Finalized or the Unconfigured state. Do you think system_modes should manage the Unconfigured state of lifecycle nodes? Managing it would cover both this situation and the start-up situation, where the nodes are in the Unconfigured state.
>
> Thank you!

This is in the context of our example case of the laser_driver error. We want to elaborate on the layered approach we discussed in the last MROS meeting. This is how I interpret our desired design (please comment if something is not correct or clear):

1. First, the laser_driver code for handling errors tries to recover from the error in the ErrorProcessing transition state.

   (from here it is a related but different issue #48)

2. If it does not succeed (I guess that means the node does not transition to Active), the ModeManager tries to recover from the error using the feature/rules. For this, @jginesclavero is adding a rule in the SystemModes file of our system.
3. If there is no rule, or there is one but after applying it the alternative MODE(s) of the laser_driver are not reached either, the ModeManager reports to the Metacontroller that the corresponding (sub)system(s) MODE(s) are not reachable. (see issue for the continuation of the handling of errors at the higher layers)
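The layering in steps 1 to 3 can be sketched roughly as follows. This is only an illustration of the proposed escalation order: `handle_error` and its two callbacks are hypothetical names, not system_modes API, and the sketch ignores timing entirely.

```python
from typing import Callable, Optional


def handle_error(
    node_recovers: Callable[[], bool],
    apply_rule: Optional[Callable[[], bool]],
) -> str:
    """Return the name of the layer that ends up dealing with the error.

    node_recovers: hypothetical callback, True if the node's own
        ErrorProcessing handling brought it back.
    apply_rule: hypothetical callback for a mode manager rule, True if an
        alternative mode was reached; None if no rule is defined.
    """
    # Layer 1: the node (e.g. laser_driver) tries to recover by itself.
    if node_recovers():
        return "node"
    # Layer 2: the mode manager applies a rule, if there is one.
    if apply_rule is not None and apply_rule():
        return "mode_manager"
    # Layer 3: nothing worked, the metacontroller has to take over.
    return "metacontroller"
```

For example, `handle_error(lambda: False, None)` returns `"metacontroller"`, which is the "no rule" branch of step 3.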

norro commented 4 years ago

I agree with 1. and 2. However, the mode manager will not actively report that a certain mode is not available. With https://github.com/micro-ROS/system_modes/issues/43, however, it will be possible for the meta control to get the information about which modes are available.

This is also a question of timing, for the following reason: any state/mode transition takes some time (milliseconds to seconds, maybe), even in the normal, non-failure case. So it is not entirely clear when someone (the mode manager? metacontrol?) should decide that a transition or rule didn't work out and that other actions have to be taken. I think this kind of decision (how long to wait for a node to recover or a rule to take effect) is best placed in the metacontrol, since it is probably task-specific.
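As a rough sketch of what such a decision could look like on the metacontrol side: poll the observed mode until a task-specific deadline expires. Everything here is an assumption for illustration, `wait_for_mode` and `get_mode` are hypothetical and not part of system_modes; the clock and sleep functions are injectable only so the logic can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_mode(
    get_mode: Callable[[], str],
    target: str,
    timeout_s: float,
    poll_s: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if `target` is observed before `timeout_s` elapses.

    get_mode: hypothetical callback returning the currently observed mode.
    timeout_s: task-specific budget chosen by the caller (metacontrol).
    """
    deadline = clock() + timeout_s
    while True:
        if get_mode() == target:
            return True
        if clock() >= deadline:
            # Budget used up: the caller treats the transition as failed.
            return False
        sleep(poll_s)
```

The mode manager stays unaware of the deadline in this sketch; only the caller knows how long is acceptable for the task at hand.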

chcorbato commented 4 years ago

> This is also a question of timing, for the following reason: any state/mode transition takes some time (milliseconds to seconds, maybe), even in the normal, non-failure case. So it is not entirely clear when someone (the mode manager? metacontrol?) should decide that a transition or rule didn't work out and that other actions have to be taken. I think this kind of decision (how long to wait for a node to recover or a rule to take effect) is best placed in the metacontrol, since it is probably task-specific.

Very good point indeed, so far we are not accounting for timing issues. How do we include timing constraints for node management? These could be considered metacontrol requirements for the robotic application:

- How should these requirements be defined? Language, relation to MROS metamodel @darkobozhinoski and ontology @rsanz @estherag
- Where should they be defined? I think we should have a discussion about this on the next meeting @darkobozhinoski, ideally with the input of all ROS developers/architects in MROS @gavanderhoorn @marioney @wasowski @fmrico @jginesclavero @lbajo @ralph-lange

rsanz commented 4 years ago

This implies the incorporation of some timestamping and temporal [interval] reasoning. We can incorporate some concepts from e.g. UML2 or UML MARTE.

chcorbato commented 4 years ago

> However, the mode manager will not actively report that a certain mode is not available. With #43, however, it will be possible for the meta control to get the information about which modes are available.

I agree. So the current design proposal is that the Mode Manager just informs about available and reachable modes, and Metacontrol is responsible for inferring from that whether reconfiguration actions succeeded. See below for how to model that reasoning.

> This is also a question of timing, for the following reason: any state/mode transition takes some time (milliseconds to seconds, maybe), even in the normal, non-failure case. So it is not entirely clear when someone (the mode manager? metacontrol?) should decide that a transition or rule didn't work out and that other actions have to be taken. I think this kind of decision (how long to wait for a node to recover or a rule to take effect) is best placed in the metacontrol, since it is probably task-specific.

Very good point indeed, so far we are not accounting for timing issues.

How do we include timing constraints for node management? These could be considered metacontrol requirements for the robotic application:

How should these requirements be defined? Language, relation to MROS metamodel @darkobozhinoski and ontology @rsanz @estherag

> This implies the incorporation of some timestamping and temporal [interval] reasoning. We can incorporate some concepts from e.g. UML2 or UML MARTE.

@rsanz can you point to the specific concepts? I think we need to specify some modelling requirements (see below) to evaluate which concepts we need.

Where should they be defined? I think we should have a discussion about this on the next meeting @darkobozhinoski, ideally with the input of all ROS developers/architects in MROS @gavanderhoorn @marioney @wasowski @fmrico @jginesclavero @lbajo @ralph-lange

Modelling reqs for reconfiguration actions and timing

Metacontroller needs info on how long a reconfiguration action can take, to decide its success or failure. This depends on:
- type of reconfiguration action: mode change, re-mapping, ~~deploy node~~ (we decided all nodes would be deployed, req for mode manager)
- the node/subsystem reconfigured

We could provide this information in the MROS model of the system (Darko's metamodel) as we are doing with the QAs, but I think it is more related to the specific software components than to the application logic.

We could define default values in the MROS metacontroller to assume when no info is provided, e.g. assume a node mode change takes up to 2 s and a subsystem mode change up to 5 s. @jginesclavero @lbajo @marioney @fmrico what numbers are reasonable for navigation2 nodes?
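A possible shape for this, as a sketch only: defaults keyed by action type, with optional per-component overrides. The 2 s and 5 s values are the placeholders suggested above, not measured numbers; the key names and `reconfiguration_timeout` are hypothetical. No default is given for re-mapping because none has been proposed yet.

```python
from typing import Dict, Optional, Tuple

# Placeholder defaults from the discussion above, in seconds.
DEFAULT_TIMEOUTS_S: Dict[str, float] = {
    "node_mode_change": 2.0,
    "subsystem_mode_change": 5.0,
}


def reconfiguration_timeout(
    action: str,
    component: str,
    overrides: Optional[Dict[Tuple[str, str], float]] = None,
) -> float:
    """Timeout for `action` on `component`, in seconds.

    A component-specific override wins over the per-action default.
    Raises KeyError if neither an override nor a default exists.
    """
    if overrides and (action, component) in overrides:
        return overrides[(action, component)]
    if action not in DEFAULT_TIMEOUTS_S:
        raise KeyError(f"no timeout known for action {action!r}")
    return DEFAULT_TIMEOUTS_S[action]
```

Keeping the overrides keyed by `(action, component)` reflects the point that the numbers belong to the specific software components rather than to the application logic.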

micro-ROS / system_modes

Unconfigured lifecycle state management #47
