Open smklein opened 1 year ago
Nexus can use an RPW to look for instances that are marked as "failed + auto_boot_on_fault", and re-provision them in the background.
If we use the existing `Failed` state for this, we'll need to make sure that
We might decide to have different failure reasons to help us distinguish these cases.
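As a rough illustration of the background scan described above, here is a minimal sketch of the selection step one activation of such an RPW might perform. The `Instance` struct, the `InstanceState` enum, and the `instances_to_reprovision` function are hypothetical names invented for this example, not Nexus's actual types or queries. A real implementation would presumably read from the database rather than from an in-memory slice.

```rust
// Hypothetical, simplified model for illustration only; these are not
// the real Nexus/Omicron types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstanceState {
    Running,
    Stopped,
    Failed,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Instance {
    id: u32,
    state: InstanceState,
    // Assumed per-instance flag matching the "auto_boot_on_fault" marker.
    auto_boot_on_fault: bool,
}

/// Returns the IDs of instances that one pass of the (hypothetical)
/// background task would try to re-provision: those that are `Failed`
/// and have opted in to automatic restart.
fn instances_to_reprovision(instances: &[Instance]) -> Vec<u32> {
    instances
        .iter()
        .filter(|i| i.state == InstanceState::Failed && i.auto_boot_on_fault)
        .map(|i| i.id)
        .collect()
}
```

Keeping the selection as a pure function of the observed state makes the pass safe to re-run on every activation: an instance that has already left `Failed` simply stops matching.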
Most of the stuff described in "Updating Instance State Within Nexus" was implemented in a combination of #5611, #5759, and #6503. The proactive registration of sled-agents with Nexus isn't something we've done yet.
`oximeter` collector recorded in the `omicron.public.metric_producer` table. When instances are stopped, that assignment needs to be removed by the cleanup portion of that RPW.