Instances no longer transition to failed state when propolis zone has crashed or is gone

Agreed that this is an unforeseen consequence of #4682. Previously, Nexus::instance_reboot could assume that instance_request_state would call handle_instance_put_result, which would handle these sorts of errors by transitioning instances to Failed. After #4682, callers of instance_request_state have to decide for themselves how to handle state change failures. instance_reboot simply uses the ? operator to propagate errors to its caller, so nothing moves the instance to Failed.

A pattern similar to the one in instance_clear_migration_ids should restore the old behavior: check whether the state change request returned an error and, if it is a SledAgentInstancePutError for which instance_unhealthy() returns true, call mark_instance_failed before returning the error.
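As a rough illustration of that pattern (a sketch, not the actual Nexus code): the function and error names below come from this thread, but every type, field, and signature is a simplified, hypothetical stand-in. The real functions are async and take different arguments, and the real error types carry more information.

```rust
/// Hypothetical, simplified set of instance states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstanceState {
    Running,
    Rebooting,
    Failed,
}

/// Hypothetical stand-in for the sled agent error type named above.
#[allow(dead_code)]
#[derive(Debug)]
struct SledAgentInstancePutError {
    message: String,
    /// Assumption: set when the error implies the VMM is gone or broken.
    unhealthy: bool,
}

impl SledAgentInstancePutError {
    fn instance_unhealthy(&self) -> bool {
        self.unhealthy
    }
}

/// Hypothetical error returned by a state change request.
#[allow(dead_code)]
#[derive(Debug)]
enum InstanceStateChangeError {
    SledAgent(SledAgentInstancePutError),
    Other(String),
}

/// Hypothetical instance record; `zone_alive` simulates the propolis zone.
struct Instance {
    state: InstanceState,
    zone_alive: bool,
}

struct Nexus;

impl Nexus {
    /// Simulated state change request: fails if the propolis zone is gone.
    fn instance_request_state(
        &self,
        instance: &mut Instance,
        requested: InstanceState,
    ) -> Result<(), InstanceStateChangeError> {
        if !instance.zone_alive {
            return Err(InstanceStateChangeError::SledAgent(
                SledAgentInstancePutError {
                    message: "propolis zone not found".to_string(),
                    unhealthy: true,
                },
            ));
        }
        instance.state = requested;
        Ok(())
    }

    fn mark_instance_failed(&self, instance: &mut Instance) {
        instance.state = InstanceState::Failed;
    }

    /// Instead of a bare `?`, inspect the error before passing it on.
    fn instance_reboot(
        &self,
        instance: &mut Instance,
    ) -> Result<(), InstanceStateChangeError> {
        if let Err(e) =
            self.instance_request_state(instance, InstanceState::Rebooting)
        {
            // Only sled agent errors that indicate an unhealthy instance
            // move it to Failed; other errors are returned unchanged.
            if let InstanceStateChangeError::SledAgent(inner) = &e {
                if inner.instance_unhealthy() {
                    self.mark_instance_failed(instance);
                }
            }
            return Err(e);
        }
        Ok(())
    }
}
```

In this sketch, rebooting an instance whose zone_alive is false still returns the error to the caller, but the instance ends up in Failed rather than staying in Running.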

The ability to handle a missing propolis zone may not be by design (though we now rely on it quite a bit during planned and unplanned sled reboots). The ability to handle a propolis zone crashing due to an unhandled hypervisor or OS exception is probably something the system should have.

This is tracked by #3206.

oxidecomputer / omicron

Instances no longer transition to failed state when propolis zone has crashed or is gone #4709