afritzler opened 4 months ago
To kick off some discussion, here is a short sketch, which is more or less a trimmed-down version of the NodeMaintenance proposal.
We introduce the `ServerMaintenance` custom resource to signal maintenances on `Server`s.
```go
// +enum
type ServerMaintenanceStage string

const (
	// Idle announces a maintenance.
	Idle ServerMaintenanceStage = "Idle"
	// InMaintenance shuts down servers, potentially releasing server claims.
	InMaintenance ServerMaintenanceStage = "InMaintenance"
	// Complete: as long as no other maintenance is applied to a server,
	// it is made ready again.
	Complete ServerMaintenanceStage = "Complete"
)

type ServerMaintenance struct {
	...
	Spec   ServerMaintenanceSpec
	Status ServerMaintenanceStatus
}

type ServerMaintenanceSpec struct {
	// ServerSelector selects servers for this maintenance.
	ServerSelector *v1.ServerSelector
	// The order of the stages is Idle -> InMaintenance -> Complete.
	// The default value is Idle.
	Stage ServerMaintenanceStage
	// Reason for the maintenance.
	Reason string
}
```
As soon as the `Stage` is set to `InMaintenance`, all selected `Server`s will have their `.spec.power` set to `Off`.
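A minimal sketch of that behaviour, using local stand-in types instead of the real metal-operator API (all type, field, and label names here are illustrative, only `.spec.power` follows the discussion):

```go
package main

import "fmt"

type Power string

const (
	PowerOn  Power = "On"
	PowerOff Power = "Off"
)

// Server is a stand-in for the metal Server type; Power mirrors .spec.power.
type Server struct {
	Name   string
	Labels map[string]string
	Power  Power
}

// matches is a stand-in for evaluating the maintenance's ServerSelector
// as a simple label match.
func matches(selector map[string]string, s *Server) bool {
	for k, v := range selector {
		if s.Labels[k] != v {
			return false
		}
	}
	return true
}

// applyInMaintenance powers off every server selected by the maintenance.
func applyInMaintenance(selector map[string]string, servers []*Server) {
	for _, s := range servers {
		if matches(selector, s) {
			s.Power = PowerOff
			fmt.Printf("server %s: .spec.power set to Off\n", s.Name)
		}
	}
}

func main() {
	servers := []*Server{
		{Name: "server-0", Labels: map[string]string{"pool": "a"}, Power: PowerOn},
		{Name: "server-1", Labels: map[string]string{"pool": "b"}, Power: PowerOn},
	}
	applyInMaintenance(map[string]string{"pool": "a"}, servers)
}
```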
The respective `ServerClaim`s, if any, can either be re-bound or kept. Likely, such a setting needs to be added to a `ServerClaim`.
On completion, all selected `Server`s will have `.spec.power` set to `On`, when no other `ServerMaintenance` with stage `InMaintenance` selects a `Server`.
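The completion rule could then be a check like the following sketch, reusing the stand-in `Server` type and `matches` helper from above:

```go
// ServerMaintenance is a stand-in carrying only the fields needed here.
type ServerMaintenance struct {
	Stage          string
	ServerSelector map[string]string
}

// mayPowerOn reports whether a server may be made ready again after the
// maintenance `done` completed: only if no other maintenance in stage
// InMaintenance still selects it.
func mayPowerOn(s *Server, done *ServerMaintenance, all []*ServerMaintenance) bool {
	for _, m := range all {
		if m == done || m.Stage != "InMaintenance" {
			continue
		}
		if matches(m.ServerSelector, s) {
			return false
		}
	}
	return true
}
```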
Completed `ServerMaintenance`s need to be garbage collected eventually.
The cloud-provider-manager-metal can have a watch on `ServerMaintenance`s and forward information about them into its Kubernetes cluster.
Do we need some mechanism to block/deny maintenances on `Server`s?
@Nuckal777 thanks for the input!
I'm pretty sure that we must not power off servers in such an implicit manner. For instance, firmware updates, especially those which require a server reboot, should also be considered maintenance. Hence, it might not be reasonable to force a server power-off.
Apart from that, any controller which works with the `Server` CR would have to loop over `ServerMaintenance` objects to find out whether a server is in maintenance or not. This could be solved by an additional field or condition in the server's status; however, it might cause a data race between controllers as soon as both the spec (to switch power) and the status (to set the maintenance state) have to be updated.
From my perspective, the maintenance state should be represented in `.spec` of the `Server` CR.
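A hypothetical sketch of what that could look like; the field names are invented for illustration and are not part of the actual metal API:

```go
// ServerSpec with the maintenance state inlined; all names are illustrative.
type ServerSpec struct {
	// Power is the desired power state ("On"/"Off").
	Power string
	// Maintenance, when non-nil, marks the server as in maintenance.
	Maintenance *MaintenanceSettings
}

type MaintenanceSettings struct {
	// Reason for the maintenance.
	Reason string
	// Owner names the controller that declared the maintenance and is
	// allowed to manage the power state while it is set.
	Owner string
}
```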
> From my perspective, the maintenance state should be represented in `.spec` of the `Server` CR
There is an issue with that design: when two controllers powered off a `Server` and one controller is finished, the finished controller will set the power state to on and the second controller to off. The controllers start fighting over that field. metal3 uses a similar design for that reason. Further reasoning is in the declarative node API proposal.
> For instance, firmware updates, especially those which require a server reboot, should also be considered maintenance.
Why couldn't the firmware upgrade handling create a `ServerMaintenance` and delete it after finishing?
> Apart from that, any controller which works with the `Server` CR would have to loop over `ServerMaintenance` objects to find out whether a server is in maintenance or not.
I agree. This would be a disadvantage.
I like the idea of having a dedicated resource initiating the `Maintenance` of a `Server`. Regarding the `ServerClaim` binding: power management and other updates will only happen when the `Server` is in a `Reserved` state - meaning once a `Server` is in `Maintenance`, only the `MaintenanceReconciler` should perform operations on the `Server`, e.g. creating a `BootConfiguration` to boot into the BIOS/firmware update mode etc.
> There is an issue with that design: when two controllers powered off a `Server` and one controller is finished, the finished controller will set the power state to on and the second controller to off. The controllers start fighting over that field. metal3 uses a similar design for that reason. Further reasoning is in the declarative node API proposal.
As @afritzler mentioned, it should be done in such a way that a particular controller may manage a server's power state only if the server is in a particular state. For example, if a server is in the `Reserved` state, then only the `ServerClaim` controller could power it on/off; if a server is in the `Maintenance` state, then only the `Maintenance` controller could power it on/off, etc.
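A sketch of such a guard; the state and controller names are illustrative, following the discussion:

```go
// mayManagePower reports whether the given controller currently owns the
// server's power state, based on the server's state.
func mayManagePower(serverState, controller string) bool {
	switch serverState {
	case "Reserved":
		return controller == "ServerClaim"
	case "Maintenance":
		return controller == "Maintenance"
	default:
		// In all other states, no controller may toggle the power.
		return false
	}
}
```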
> Why couldn't the firmware upgrade handling create a `ServerMaintenance` and delete it after finishing?
It definitely could, but I was pointing out that it is not necessarily needed to power off a server in the maintenance state.
After reading Gardener's node in-place update discussion, I had some time to look into the topic again.
It was never explicitly mentioned in this ticket, but I think there are two main use cases for the maintenance of a `Server`:

1. BIOS/firmware updates, which may require a reboot
2. taking a `Server` with a hardware failure out of operation

Am I missing something from the metal-operator perspective?
As with the BIOS updates, I think we should focus on managing maintenances on a single `Server` first. Declaring a maintenance for multiple `Server`s via a label selector can be added with an additional CRD. That way we also work around the issue that the set of `Server`s selected by a maintenance can be larger than the subset of `Server`s which can be maintained at once due to higher-level concerns.
So, if I understand both of you correctly, you'd like the declaration of a maintenance to be a means of moving the permission for power-cycling a `Server` from the `ServerClaim` controller to a different controller.
An alternative `ServerMaintenance` object, which targets a single `Server`, could look like:
```go
// +enum
type ServerMaintenanceStage string

const (
	// Idle announces a maintenance.
	Idle ServerMaintenanceStage = "Idle"
	// InMaintenance shuts down servers, potentially releasing server claims.
	InMaintenance ServerMaintenanceStage = "InMaintenance"
	// Complete: as long as no other maintenance is applied to a server,
	// it is made ready again.
	Complete ServerMaintenanceStage = "Complete"
)

type ServerMaintenance struct {
	...
	Spec   ServerMaintenanceSpec
	Status ServerMaintenanceStatus
}

type ServerMaintenanceSpec struct {
	// ServerRef selects a server for this maintenance.
	ServerRef *v1.ObjectReference
	// The order of the stages is Idle -> InMaintenance -> Complete.
	// The default value is Idle.
	Stage ServerMaintenanceStage
	// TargetState specifies the state the server should be moved to
	// when the stage changes to InMaintenance.
	TargetState ServerState
	// Reason for the maintenance.
	Reason string
}
```
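For illustration, declaring such a maintenance could look like the following; the server name and reason are made up, `corev1` is `k8s.io/api/core/v1`, and `ServerState` is assumed to be a string-based type:

```go
import corev1 "k8s.io/api/core/v1"

// newBIOSMaintenance declares a maintenance for a single server, targeting
// the Updating state as in the flow sketched below.
func newBIOSMaintenance() ServerMaintenance {
	return ServerMaintenance{
		Spec: ServerMaintenanceSpec{
			ServerRef:   &corev1.ObjectReference{Name: "server-0"},
			Stage:       Idle,
			TargetState: ServerState("Updating"),
			Reason:      "BIOS upgrade",
		},
	}
}
```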
Potentially, a list of maintenances could also be inlined into the `Server` object.
An exemplary upgrade flow for a `Server` could then look like this:

1. The `Server` is in the `Reserved` status in the beginning.
2. The `ServerBIOS`/`ServerFirmware` spec is changed.
3. A `ServerMaintenance` object is created in the `Idle` maintenance stage.
4. The `ServerMaintenance` stage is set to `InMaintenance`.
5. The `Server` status is changed to `Updating` (based on the target state in the `ServerMaintenance` object).
6. The `Server` is thereby detached from the `ServerClaim` controller, and the responsibility for it is handed over to the controller which handles the upgrade.
7. The `ServerMaintenance` object is set to `Complete`.
8. The `Server` status is changed back to `Reserved`, because a `ServerClaim` exists, and it is handled again by the `ServerClaim` controller.

The hardware failure case would be similar, but would require a different controller, which just turns the power off, and another `HardwareFailure` state on the `Server` object.
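A rough sketch of how a maintenance reconciler might derive the `Server` state along this flow; the `Available` fallback and the function shape are assumptions, the state names follow the discussion:

```go
// nextServerState returns the Server state for a maintenance stage change.
func nextServerState(stage ServerMaintenanceStage, target ServerState, hasClaim bool) ServerState {
	switch stage {
	case InMaintenance:
		// Hand the server over to the upgrade/failure controller,
		// e.g. "Updating" or "HardwareFailure".
		return target
	case Complete:
		if hasClaim {
			return ServerState("Reserved")
		}
		return ServerState("Available")
	default:
		// Idle: the maintenance is only announced, the current state
		// stays untouched.
		return ServerState("")
	}
}
```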
`Server`s in the `Available` state can be moved to the `InMaintenance` stage upon creation of the `ServerMaintenance` object.
How about adding a maintenance window as well, so one can decide when a BIOS or firmware upgrade should take place? Or should that be done by another service?
Maybe adding another stage:

```go
Planned ServerMaintenanceStage = "Planned"
```
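One possible shape for that; the window type and its semantics are speculative, and `metav1` is `k8s.io/apimachinery/pkg/apis/meta/v1`:

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

const (
	// Planned announces a maintenance that may only start within its window.
	Planned ServerMaintenanceStage = "Planned"
)

// MaintenanceWindow could be added to the ServerMaintenanceSpec to control
// when the stage may move from Planned to InMaintenance.
type MaintenanceWindow struct {
	// NotBefore is the earliest point at which the maintenance may start.
	NotBefore metav1.Time
	// MaxDuration bounds how long the maintenance may take.
	MaxDuration metav1.Duration
}
```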
Summary

Describe a concept on how we can put a `Server` into `Maintenance` mode. Also evaluate what it would mean to move it back into an operational state.

The Kubernetes community has a KEP extending the `Node` API by a declarative approach to handling `Node` maintenance: https://github.com/kubernetes/enhancements/issues/4212

We should see how those concepts can be applied to our `Server` objects.

Implications to consider:

- How the maintenance state influences the behaviour of controllers sitting on top of the `metal` API

Expected outcome:

- Proposed changes to the `metal` types