The quarantine code is a very old concept that never really got
overhauled. It tries to handle one of the dark alleys of state
management: what to do when tasks don't want to get stopped (or,
even worse, fail to).

When mapping an external process to a Roby task, it was also meant
to map the "I don't know what the state of the remote process is
anymore" situation, i.e. what to do when things go really wrong.

The (really old) reaction was to isolate the task and let it be.
It turns out that by removing its relationships, we also remove
Roby's ability to figure out that something is wrong (since the
dependencies are gone). This is fine as long as the quarantine is a
byproduct of the GC process, that is, when the garbage collection
itself runs into the "I can't stop" / "I don't want to stop" case.

But the "I got an exception but I failed to stop" or "I can't read
the state of this external process" situations also need to clean up
whatever depends on the task at fault, which is usually handled by
an error.

This commit changes the quarantine process to:

- generate an error (QuarantinedTaskError) to clean up whatever
  is using the task
- keep relations until they are cleaned up by the GC
- still have the GC leave quarantined tasks alone (it does not try
  to automatically stop them)

Overall, this should play a lot better with recovery mechanisms, which
can figure out the best way to handle the rest (Syskit may for instance
try to restart the component's deployment).
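
To make the new behaviour concrete, here is a minimal toy model in plain
Ruby. It is only a sketch: ToyTask, ToyPlan and their methods are
hypothetical names made up for illustration and are not Roby's API. Only
the QuarantinedTaskError name comes from this commit, and the class
defined here is a stand-in, not the real one.

```ruby
# Toy model only. None of these classes are Roby's real API.
class QuarantinedTaskError < StandardError
  attr_reader :task

  def initialize(task)
    @task = task
    super("#{task.name} is quarantined")
  end
end

class ToyTask
  attr_reader :name, :parents
  attr_accessor :quarantined

  def initialize(name)
    @name = name
    @parents = []
    @quarantined = false
  end

  # Record that self depends on child (child gains self as a parent)
  def depends_on(child)
    child.parents << self
  end
end

class ToyPlan
  attr_reader :tasks, :errors

  def initialize
    @tasks = []
    @errors = []
  end

  def add(*new_tasks)
    @tasks.concat(new_tasks)
  end

  # Flag the task and record an error, but leave its relations in place
  def quarantine(task)
    task.quarantined = true
    @errors << QuarantinedTaskError.new(task)
  end

  # Everything that (transitively) depends on the task at fault.
  # This only works because the relations were kept.
  def affected_by(error)
    result = []
    queue = error.task.parents.dup
    until queue.empty?
      candidate = queue.shift
      next if result.include?(candidate)

      result << candidate
      queue.concat(candidate.parents)
    end
    result
  end

  # The toy GC never tries to stop a quarantined task
  def gc_stop_candidates
    @tasks.reject(&:quarantined)
  end
end

plan = ToyPlan.new
driver = ToyTask.new("driver")
mission = ToyTask.new("mission")
mission.depends_on(driver)
plan.add(mission, driver)

plan.quarantine(driver)
affected = plan.affected_by(plan.errors.first).map(&:name)
stoppable = plan.gc_stop_candidates.map(&:name)
```

In this toy, quarantining "driver" leaves "mission" reachable through the
kept relation, so the error can be propagated to it, while the GC pass
skips "driver" itself.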