Closed pokearu closed 2 years ago
Upon consideration of the initial ideas, PBNJ as a k8s controller is the approach I wish to elaborate and push forward.
In this approach we convert PBNJ into a k8s controller that reconciles BMC objects and performs the desired power/boot management actions.
We require an initial update to the tinkerbell hardware CRD. The idea here is that the Hardware CR would have a reference to its corresponding BMC object.
type HardwareSpec struct {
...
BmcRef BmcReference `json:"bmcRef,omitempty"`
}
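As a rough illustration of the reference, here is a minimal sketch. The fields of BmcReference (Name, Namespace) are assumptions, since the comment above does not define the type, and HardwareSpec is reduced to the new field only. The toJSON helper is hypothetical and only shows how the bmcRef tag would serialize.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BmcReference is a hypothetical shape for the reference type.
// Name and Namespace are assumed fields, not taken from the proposal.
type BmcReference struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// HardwareSpec shows only the proposed field; the existing Hardware
// fields are left out of this sketch.
type HardwareSpec struct {
	BmcRef BmcReference `json:"bmcRef,omitempty"`
}

// toJSON is a hypothetical helper that serializes the spec so the
// effect of the json tags can be inspected.
func toJSON(spec HardwareSpec) (string, error) {
	out, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	spec := HardwareSpec{BmcRef: BmcReference{Name: "bmc-node-1", Namespace: "tink-system"}}
	s, err := toJSON(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Referencing the BMC object by name, rather than embedding its fields in the Hardware CR, would keep credentials and desired power state in a separate object that the controller owns.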
The PBNJ controller would be responsible for reconciling and maintaining the desired state of the BMC object on the cluster. The BMC object contains the required BMC information like host IP, vendor, etc., along with the desired state of the BMC like power, boot preference, NTP, etc.
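To make the reconcile idea concrete, here is a minimal sketch of comparing desired and observed BMC state. Every type, field and action string below (BMCSpec, BMCStatus, pendingActions, "set-power:on") is hypothetical, and a real controller would talk to the BMC rather than return strings.

```go
package main

import "fmt"

// PowerState is a hypothetical enum for the desired or observed power state.
type PowerState string

const (
	PowerOn  PowerState = "on"
	PowerOff PowerState = "off"
)

// BMCSpec is an assumed shape for the desired state of a BMC object.
type BMCSpec struct {
	Host       string     // BMC host IP
	Vendor     string     // hardware vendor
	Power      PowerState // desired power state
	BootDevice string     // desired boot preference, e.g. "pxe" or "disk"
}

// BMCStatus is an assumed shape for the state last observed on the BMC.
type BMCStatus struct {
	Power      PowerState
	BootDevice string
}

// pendingActions returns the actions needed to move the observed state
// to the desired state. An empty result means nothing needs to change.
func pendingActions(spec BMCSpec, status BMCStatus) []string {
	var actions []string
	// The boot device is handled first so a subsequent power-on uses it.
	if spec.BootDevice != "" && spec.BootDevice != status.BootDevice {
		actions = append(actions, "set-boot:"+spec.BootDevice)
	}
	if spec.Power != "" && spec.Power != status.Power {
		actions = append(actions, "set-power:"+string(spec.Power))
	}
	return actions
}

func main() {
	spec := BMCSpec{Host: "10.0.0.5", Vendor: "example", Power: PowerOn, BootDevice: "pxe"}
	status := BMCStatus{Power: PowerOff, BootDevice: "disk"}
	fmt.Println(pendingActions(spec, status))
}
```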
In addition to maintaining the desired state of the BMC, the PBNJ controller can perform a desired set of actions as a one-off job. The job may include tasks like Power Off -> Set one-time Net boot -> Power On -> Set persistent Disk boot. Once the job is complete, the controller brings the machines back to their desired state. This gives clients the flexibility to power cycle or reset nodes for updates/maintenance.
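The one-off job described above can be sketched as an ordered task list that stops at the first failure. The Task constants, the Executor type, runJob and errUnreachable are all hypothetical names for illustration. Returning to the desired state after the job is not modelled here.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical name for a single step in a one-off job.
type Task string

const (
	TaskPowerOff       Task = "power-off"
	TaskOneTimeNetBoot Task = "one-time-net-boot"
	TaskPowerOn        Task = "power-on"
	TaskPersistentDisk Task = "persistent-disk-boot"
)

// provisionJob mirrors the example sequence from the comment above.
var provisionJob = []Task{TaskPowerOff, TaskOneTimeNetBoot, TaskPowerOn, TaskPersistentDisk}

// errUnreachable is a sample error used to simulate a failing BMC call.
var errUnreachable = errors.New("bmc unreachable")

// Executor performs one task against a BMC. In a real controller this
// would wrap the BMC client; here it is an assumed function type.
type Executor func(Task) error

// runJob executes tasks in order and stops at the first error. It returns
// the number of tasks that completed successfully.
func runJob(tasks []Task, exec Executor) (int, error) {
	for i, t := range tasks {
		if err := exec(t); err != nil {
			return i, fmt.Errorf("task %d (%s): %w", i, t, err)
		}
	}
	return len(tasks), nil
}

func main() {
	done, err := runJob(provisionJob, func(t Task) error {
		fmt.Println("running", t)
		return nil
	})
	fmt.Println("completed:", done, "error:", err)
}
```

Running the tasks strictly in order matters in this sketch because the one-time net boot setting is only useful if it is applied before the power-on.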
In this approach, the client of the PBNJ controller can be an end user who runs kubectl apply on the BMC objects to set the desired state for all BMCs in a data center. Alternatively, automation like CAPT can create the necessary objects to get nodes to the desired power state for provisioning.
@jacobweinstock This probably deserves some labeling given we're pushing ahead.
Note the implementation isn't landing in pbnj; it'll be in its own repository. Currently that's the rufio repository, but it may get renamed. This issue is probably worth leaving open until that work is complete, just for tracking and linking purposes.
@chrisdoherty4 would you consider closing this now that https://github.com/tinkerbell/rufio has come along a bit further? Either way, perhaps offer a diff of the points raised in @pokearu's two comments that define the goals.
When running with a Kube back-end I'm not sure PBnJ makes sense, because all the interactions it's required for are handled by Rufio.
If we want to talk about changing to use the Kubernetes back-end as the primary/only back-end, then I suspect Rufio would only need integrating if users want to talk to BMCs with a request-response type API. This feels like a bigger discussion than this ticket, and other issues in the Tinkerbell space have similar commentary. Let's chat at a community meeting.
Closing this as github.com/tinkerbell/rufio provides a Kubernetes-based BMC service.
Currently PBNJ is a standalone service that performs power management operations. It would benefit from a formal integration with the Tinkerbell stack, in line with the changes for the k8s resource model.
Expected Behaviour
When provisioning baremetal nodes using Tinkerbell, the pbnj component would be responsible for the power/boot management of the nodes. The hardware CRD can be extended to contain the necessary BMC information, which pbnj may leverage to perform actions. This would help with powering on nodes, creating BMC users and setting boot options. It also opens up scope to deprovision nodes, perform reboots/resets, etc.
Current Behaviour
Manual intervention is required for powering up baremetal nodes and setting the boot order to net boot for Tinkerbell provisioning.
Initial Ideas
These are some rough ideas that can be discussed and expanded to a more formal proposal.
PBNJ as k8s Service
Currently PBNJ is a gRPC service. It can be run on the k8s cluster along with all the other Tinkerbell components (Boots, Hegel). The PBNJ service would have read access to the Hardware CRDs to fetch the BMC information and perform actions.
PBNJ as a k8s Controller
PBNJ can be redesigned to be a k8s controller. The controller could watch Workflow CRDs and pick up tasks tagged to it and perform power management actions.
PBNJ as a Hub action
This idea is based on tink-worker. We could possibly have a long-running pbnj-worker on the same cluster as the Tinkerbell stack. The pbnj-worker could run hub actions, which use the PBNJ binary to perform power management tasks.