At present, if one attempts to set a SP to a power state that it's already in, one receives a 503 Service Unavailable. For instance, attempting to update Sled 14 on madrid failed when wicketd attempted to put it in A2, because the system was already in A2:
[ mupdate] [sled 14 00:00:00] Running 7/19) Setting host power state to A2
[ mupdate] [sled 14 00:00:00] Failed 7/19) Setting host power state to A2: after 2.38ms: updating power state failed
[ mupdate] [sled 14 00:00:00] Caused by:
[ mupdate] [sled 14 00:00:00] - Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "f64e6790-ccfa-43e0-a629-f1ee71d32c5b", "content-length": "233", "date": "Sun, 28 Dec 1986 00:05:40 GMT"}; value: Error { error_code: Some("SpCommunicationFailed"), message: "error communicating with SP SpIdentifier { typ: Sled, slot: 14 }: Error response from SP: power state error (code 1))", request_id: "f64e6790-ccfa-43e0-a629-f1ee71d32c5b" }
[ mupdate] [sled 14 00:00:00] Terminal Execution failed
gateway-sp-comms' SingleSp::set_power_state method just tries to perform the RPC, bubbling up the error from the SP without checking whether it's already in the desired state:
We should probably make this method succeed when the host is already in the desired state, instead. Although we could make the sequencer task in Hubris responsible for this, it seems to me that it might make more sense to do it here in MGS?
At present, if one attempts to set a SP to a power state that it's already in, one receives a 503 Service Unavailable. For instance, attempting to update Sled 14 on
madrid
failed whenwicketd
attempted to put it in A2, because the system was already in A2:This appears to be because the
gimlet-seq
task in Hubris returns an error forA2
->A2
transitions. See this match: https://github.com/oxidecomputer/hubris/blob/2f24dd4419634ca95742de901d18e52ff14b912b/drv/gimlet-seq-server/src/main.rs#L696-L918gateway-sp-comms
'SingleSp::set_power_state
method just tries to perform the RPC, bubbling up the error from the SP without checking whether it's already in the desired state:https://github.com/oxidecomputer/management-gateway-service/blob/7518095686028d80b8125de62ba7d179cbc2a683/gateway-sp-comms/src/single_sp.rs#L652-L657
aaaaaaand finally, over in Omicron, the MGS HTTP API just bubbles up whatever error
SingleSp::set_power_state
returns: https://github.com/oxidecomputer/omicron/blob/d96ea7ca8945b8ad78a53fd083850ea39789e5f0/gateway/src/http_entrypoints.rs#L623-L637We should probably make this method succeed when the host is already in the desired state, instead. Although we could make the sequencer task in Hubris responsible for this, it seems to me that it might make more sense to do it here in MGS?