rock-core / tools-roby

The roby plan manager

rework how quarantined tasks are handled #198

Closed doudou closed 3 years ago

doudou commented 3 years ago

The quarantine code is a very old concept that never really got overhauled. It tries to handle one of the dark alleys of state management: what to do when tasks don't want to be stopped (or, even worse, fail to stop).

When mapping an external process to a Roby task, it was also meant to cover the "I don't know what the state of the remote process is anymore" situation, i.e. what to do when things go really wrong.

The (really old) reaction was to isolate the task and let it be. It turns out that by removing its relationships, we also remove Roby's ability to figure out that something is wrong, since the dependencies are gone. This is fine as long as the quarantine is a byproduct of the GC process, i.e. the "I can't stop" / "I don't want to stop" cases detected by the garbage collection.

But the "I got an exception but I failed to stop" or "I can't read the state of this external process" situations also need to clean up whatever depends on the task at fault, which is usually handled by an error.
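
To illustrate why isolating the task hides the problem from its dependents, here is a toy model in plain Ruby. It is only a sketch: `ToyTask`, `isolate`, `affected_by_failure` and the task names are hypothetical and are not Roby's API. It just shows that once the relations are dropped, nothing can be reached from the faulty task anymore.

```ruby
# Toy model only: none of these names are part of Roby's API.
ToyTask = Struct.new(:name, :parents, :quarantined) do
  def initialize(name)
    super(name, [], false)
  end
end

# Old behaviour as described above: mark the task and drop its relations.
def isolate(task)
  task.quarantined = true
  task.parents.clear
end

# Names of all tasks that transitively depend on `task`, i.e. the tasks
# that an error raised on `task` could reach and clean up.
def affected_by_failure(task)
  seen = []
  queue = task.parents.dup
  until queue.empty?
    parent = queue.shift
    next if seen.include?(parent)

    seen << parent
    queue.concat(parent.parents)
  end
  seen.map(&:name)
end

# mission depends on driver, which depends on process (hypothetical names).
def build_chain
  mission = ToyTask.new("mission")
  driver = ToyTask.new("driver")
  process = ToyTask.new("process")
  driver.parents << mission
  process.parents << driver
  process
end

# Relations kept: the failure can still reach the dependents.
def dependents_when_relations_kept
  process = build_chain
  process.quarantined = true
  affected_by_failure(process)
end

# Relations removed: the dependents never learn that something is wrong.
def dependents_when_isolated
  process = build_chain
  isolate(process)
  affected_by_failure(process)
end
```

In this sketch, `dependents_when_relations_kept` still finds the driver and the mission, while `dependents_when_isolated` finds nothing, which is the case where the tasks depending on the faulty one are left running unaware.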

This commit changes the quarantine process to:

Overall, this should play a lot better with recovery mechanisms, which can figure out the best way to handle the rest (Syskit may, for instance, try to restart the component's deployment).