ironcore-dev / metal-operator

Kubernetes operator for automating bare metal server discovery and provisioning
Apache License 2.0

Declarative `Server` Maintenance #76

Open afritzler opened 2 months ago

afritzler commented 2 months ago

Summary

Describe a concept for how we can put a Server into Maintenance mode. Also evaluate what it would mean to move it back into an operational state.

The Kubernetes community has a KEP that extends the Node API with a declarative approach to handling Node maintenance: https://github.com/kubernetes/enhancements/issues/4212

We should see how those concepts can be applied to our Server objects.

Implications to consider:

How the maintenance state influences the behaviour of controllers sitting on top of the metal API, e.g.:

Expected outcome:

Nuckal777 commented 2 weeks ago

To kick off some discussion, here is a short sketch, which is more or less a trimmed-down version of the NodeMaintenance proposal. We introduce the ServerMaintenance custom resource to signal maintenances on Servers.

// +enum
type ServerMaintenanceStage string

const (
    // Idle announces a maintenance.
    Idle ServerMaintenanceStage = "Idle"
    // InMaintenance shuts down servers, potentially releasing server claims.
    InMaintenance ServerMaintenanceStage = "InMaintenance"
    // Complete marks the maintenance as finished. As long as no other
    // maintenance is applied to a server, it is made ready again.
    Complete ServerMaintenanceStage = "Complete"
)

type ServerMaintenance struct {
    ...
    Spec ServerMaintenanceSpec
    Status ServerMaintenanceStatus
}

type ServerMaintenanceSpec struct {
    // ServerSelector selects servers for this maintenance.
    ServerSelector *v1.ServerSelector

    // The order of the stages is Idle -> InMaintenance -> Complete.
    // The default value is Idle.
    Stage ServerMaintenanceStage

    // Reason for the maintenance.
    Reason string
}
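
The sketch leaves ServerMaintenanceStatus open; purely as an assumption for illustration, it could track the Servers currently selected and the stage last acted on:

type ServerMaintenanceStatus struct {
    // ServerNames lists the Servers currently selected by this
    // maintenance. Field name and shape are illustrative only.
    ServerNames []string

    // ObservedStage is the stage the controller last acted on.
    ObservedStage ServerMaintenanceStage
}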

As soon as the Stage is set to InMaintenance, all selected Servers will have their .spec.power set to Off. The respective ServerClaims, if any, can either be re-bound or kept; likely, such a setting needs to be added to ServerClaim. On completion, all selected Servers will have .spec.power set back to On, provided no other ServerMaintenance with stage InMaintenance selects them. Completed ServerMaintenances need to be garbage collected eventually.
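
To make the completion rule concrete, a controller could use a helper like the following; serverHeld and its parameters are hypothetical names, not existing metal-operator code:

// serverHeld reports whether another ServerMaintenance in stage
// InMaintenance still selects the given Server, in which case the
// Server must stay powered off despite this maintenance completing.
// The selects predicate stands in for evaluating Spec.ServerSelector.
func serverHeld(
    serverName, completing string,
    maintenances []ServerMaintenance,
    selects func(ServerMaintenance, string) bool,
) bool {
    for _, m := range maintenances {
        // Skip the maintenance that is completing right now;
        // m.Name comes from the object metadata elided above.
        if m.Name == completing {
            continue
        }
        if m.Spec.Stage == InMaintenance && selects(m, serverName) {
            return true
        }
    }
    return false
}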

The cloud-provider-manager-metal can have a watch on ServerMaintenances and forward information about them into its Kubernetes cluster. Do we need some mechanism to block/deny maintenances on Servers?
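
Sketched with controller-runtime (the MaintenanceMirror reconciler and the API import path are assumptions on my side, not existing cloud-provider-manager-metal code):

import (
    "context"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    metalv1alpha1 "github.com/ironcore-dev/metal-operator/api/v1alpha1"
)

// MaintenanceMirror forwards ServerMaintenance state into the
// workload cluster, e.g. as Node conditions or events.
type MaintenanceMirror struct {
    client.Client
}

func (r *MaintenanceMirror) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Fetch the ServerMaintenance and mirror its stage/reason into
    // the workload cluster (omitted here).
    return ctrl.Result{}, nil
}

func (r *MaintenanceMirror) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&metalv1alpha1.ServerMaintenance{}).
        Complete(r)
}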

aobort commented 1 week ago

@Nuckal777 thanks for the input!

I'm pretty sure that we must not power off servers in such an implicit manner. For instance, firmware updates, especially those which require a server reboot, should also be considered maintenance. Hence, it might not be reasonable to force a server power-off.

Apart from that, any of the controllers which work with the Server CR would have to loop over ServerMaintenance objects to find out whether a server is in maintenance or not. This could be solved by an additional field or condition in the server's status; however, it might cause a data race between controllers as soon as both the spec (to switch power) and the status (to set the maintenance state) have to be updated.

From my perspective, the maintenance state should be represented in the .spec of the Server CR.
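
Purely as an illustration of that alternative (the Maintenance field and ServerMaintenanceSettings type are hypothetical names, not existing API):

type ServerSpec struct {
    // Power is the desired power state of the Server.
    Power Power

    // Maintenance, when set, marks the Server as being in
    // maintenance and records why. Hypothetical field.
    Maintenance *ServerMaintenanceSettings

    ...
}

type ServerMaintenanceSettings struct {
    // Reason for the maintenance.
    Reason string
}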

Nuckal777 commented 1 week ago

From my perspective, the maintenance state should be represented in the .spec of the Server CR

There is an issue with that design: when two controllers have powered off a Server and one of them finishes, the finished controller will set the power state to on while the second still wants it off. The controllers start fighting over that field. metal3 uses a similar design for that reason. Further reasoning is in the declarative node API proposal.

For instance, firmware updates, especially which require server reboot, should be also considered as maintenance.

Why couldn't the firmware upgrade handling create a ServerMaintenance and delete it after finishing?

Apart from that, any of controllers which work with Server CR would have to loop over ServerMaintenance objects to find whether server is in maintenance or not.

I agree. This would be a disadvantage.

afritzler commented 1 week ago

I like the idea of having a dedicated resource initiating the Maintenance of a Server. Regarding the ServerClaim binding: power management and other updates will only happen when the Server is in a Reserved state, meaning once a Server is in Maintenance, only the MaintenanceReconciler should perform operations on the Server, e.g. creating a BootConfiguration to boot into the BIOS/firmware update mode.

aobort commented 1 week ago

There is an issue with that design: when two controllers have powered off a Server and one of them finishes, the finished controller will set the power state to on while the second still wants it off. The controllers start fighting over that field. metal3 uses a similar design for that reason. Further reasoning is in the declarative node API proposal.

As @afritzler mentioned, it should be done in such a way that a particular controller may manage a server's power state only if the server is in a particular state. For example, if a server is in the Reserved state, then only the ServerClaim controller could power it on/off; if a server is in the Maintenance state, then only the Maintenance controller could power it on/off, and so on.
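
Sketched as a simple guard (state names and controller identifiers here are hypothetical):

// mayManagePower reports whether a controller is allowed to change a
// Server's power state, based on the Server's current state.
func mayManagePower(serverState, controller string) bool {
    switch serverState {
    case "Reserved":
        // Only the ServerClaim controller may toggle power.
        return controller == "serverclaim-controller"
    case "Maintenance":
        // Only the maintenance controller may toggle power.
        return controller == "maintenance-controller"
    default:
        return false
    }
}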

Why couldn't the firmware upgrade handling create a ServerMaintenance and delete it after finishing?

It definitely could, but I was pointing out that it is not necessarily needed to power off a server that is in the maintenance state.