oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

[nexus] Allow stopping `Failed` instances #6652

Closed hawkw closed 1 month ago

hawkw commented 1 month ago

PR #6503 changed Nexus to attempt to automatically restart instances which are in the `Failed` state. Now that we do this, we should probably change the allowable instance state transitions to permit a user to stop an instance that is `Failed`, as a way to say "stop trying to restart this instance" (as `Stopped` instances are not restarted). This branch changes `Nexus::instance_request_state` and `select_instance_change_action` to permit stopping a `Failed` instance.

Fixes #6640
Fixes #2825, along with #6455 (which allowed restarting `Failed` instances).
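
To illustrate the transition change described above, here is a minimal, self-contained sketch. It is not the code in this branch: the `InstanceState` and `StopAction` enums and the `stop_action` and `auto_restart_eligible` functions are hypothetical names invented for this example, and the set of states is deliberately simplified. The only behavior taken from the description is that a stop request on a `Failed` instance is now accepted, and that `Stopped` instances are not automatically restarted.

```rust
// Hypothetical, simplified instance states (not Nexus's actual types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstanceState {
    Creating,
    Running,
    Stopped,
    Failed,
}

// Hypothetical outcome of handling a "stop" request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StopAction {
    // The instance is already stopped; nothing to do.
    AlreadyStopped,
    // The instance is running; ask it to shut down.
    RequestShutdown,
    // The instance has failed; record it as stopped so that
    // it is no longer a candidate for automatic restart.
    MarkStopped,
}

// Decide what a stop request should do in each state.
// Assumption for this sketch: stopping is rejected while the
// instance is still being created.
fn stop_action(current: InstanceState) -> Result<StopAction, String> {
    match current {
        InstanceState::Stopped => Ok(StopAction::AlreadyStopped),
        InstanceState::Running => Ok(StopAction::RequestShutdown),
        // The transition this sketch is about: `Failed` may be stopped.
        InstanceState::Failed => Ok(StopAction::MarkStopped),
        InstanceState::Creating => {
            Err(format!("cannot stop an instance in state {:?}", current))
        }
    }
}

// In this sketch, only `Failed` instances are considered for
// automatic restart; `Stopped` instances are left alone.
fn auto_restart_eligible(state: InstanceState) -> bool {
    state == InstanceState::Failed
}
```

In this model, a user who stops a `Failed` instance moves it to `Stopped`, at which point `auto_restart_eligible` returns `false` and the restart attempts cease.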

hawkw commented 1 month ago

@david-crespo I was just about to leave a comment letting you know about this change, but it looks like you were way ahead of me with oxidecomputer/console#2468. Nice :)

david-crespo commented 1 month ago

My email inbox is a nightmare but I am up to date

hawkw commented 1 month ago

The CI failure is due to an unexpected Tokio task cancellation in `test_omdb_success_cases`, which is a known test flake; see #6505. I'm going to restart that run.