Auto enrollment of nodes

chrisdoherty4 commented 1 year ago

Overview

There have been various requests to auto enroll devices with some sort of MAC filtering. Auto enrollment could mean bringing a device online ready to process workflows, or it could mean defining a default workflow to be run on all devices that auto enroll.

It may be useful to think of running a default workflow as an independently configurable feature from auto enrolling a device. This would help define auto enrollment as simply bringing a Tink Worker online on said device and subsequently allow operators to manually define workflows as well as define an automated approach.
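To make the split concrete, here is a minimal sketch of MAC filtering with the default workflow kept as a separate, optional setting. Everything in it (`EnrollmentPolicy`, `enroll`, the field names) is hypothetical and for illustration only; none of it is an existing Tinkerbell API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address (or prefix) and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


@dataclass
class EnrollmentPolicy:
    # Hypothetical setting: MAC prefixes (for example vendor OUIs) allowed to auto enroll.
    allowed_prefixes: List[str] = field(default_factory=list)
    # Independent, optional setting: a workflow to run on every enrolled device.
    default_workflow: Optional[str] = None

    def allows(self, mac: str) -> bool:
        mac = normalize_mac(mac)
        return any(mac.startswith(normalize_mac(p)) for p in self.allowed_prefixes)


def enroll(mac: str, policy: EnrollmentPolicy) -> Optional[dict]:
    """Return an enrollment record, or None if the MAC is filtered out."""
    if not policy.allows(mac):
        return None
    return {
        "mac": normalize_mac(mac),
        "enrolled": True,
        # None means "worker online, workflows assigned manually later".
        "workflow": policy.default_workflow,
    }
```

With `default_workflow` left unset, enrollment only brings the device online; setting it layers the automated approach on top without changing the filtering logic.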

jacobweinstock commented 1 year ago

Linked issue: https://github.com/tinkerbell/boots/issues/178

jacobweinstock commented 1 year ago

Linked discussion: https://github.com/tinkerbell/tink/discussions/522

jacobweinstock commented 1 year ago

Leaving this in discussion until it is broken down a bit more. (for example: auto network booted vs auto provisioned)

chrisdoherty4 commented 1 year ago

In the last discussion we concluded it would be useful to break this feature into 2: (1) Auto enrollment of hardware in the Tinkerbell stack and (2) Running default workflows. We'll tweak the summary of this roadmap item for (1) and have a separate roadmap item for (2).

pedroalvesbatista commented 1 year ago

Very nice. I would like to participate in those discussions and even contribute some PRs. This is definitely something great to have: when working with a larger number of machines, it helps to have auto-enrollment and workers that can become "discoverable" (maybe by adding this to some config files, like discoverable:true).

pedroalvesbatista commented 1 year ago

Based on today's meeting, I suggest we take two routes and start to "design" something more tangible.

1. Node auto-enrollment and node "sniffing": challenges and opportunities

1.1 - Identify which services can play a part in the band:
- Rufio could use some entries from BMCs after Hook boots up and fetches a bunch of node HW information
- Hegel could retain the previous HW information
- Boots can read the entries from Hegel and provide them all to the Provisioner
- Perhaps design and implement a new service (and suggest a name) to collect HW metadata, generate a hardware.yml for default workflow execution, and even serve as a skeleton for customized deployments based on information about specific nodes (MAC addresses, GPU/CPU profiling, memory characteristics such as size or access type like NUMA, etc.)

2. Design a Request for Enhancement (RFE) proposal and map the impacts on the actual code base and the project as a whole

This would be the last part, following on from the previous one. Based on that, we need to:
2.1 - Design the features and implement small pieces along with quick experiments
2.2 - Collect data and thoughts on how components are interacting, and on side effects and trade-offs
2.3 - Decide to go for an alpha and a beta version of the whole stack with everything in place
2.4 - Ask for feedback from the community and look to use cases to demonstrate how Tinkerbell behaves in real-world deployments
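As a rough illustration of the metadata-collection idea in 1.1, the sketch below turns collected hardware facts into a skeleton record that could later be written out as a hardware.yml. The input keys (`macs`, `cpu_model`, `gpu_count`, `memory_gib`, `numa_nodes`) and the output layout are assumptions made for this example; they are not the Tinkerbell hardware schema.

```python
import json


def hardware_skeleton(facts: dict) -> dict:
    """Build a skeleton hardware record from collected facts (hypothetical layout)."""
    macs = sorted(m.lower() for m in facts["macs"])
    return {
        # Name derived from the first MAC so repeated runs give the same name.
        "name": "node-" + macs[0].replace(":", ""),
        "interfaces": [{"mac": m} for m in macs],
        # Labels that a default or customized workflow could select on.
        "labels": {
            "cpu": facts.get("cpu_model", "unknown"),
            "gpu_count": str(facts.get("gpu_count", 0)),
            "memory_gib": str(facts.get("memory_gib", 0)),
            "numa": str(facts.get("numa_nodes", 1) > 1).lower(),
        },
    }


if __name__ == "__main__":
    example = {"macs": ["3C:EC:EF:00:11:22"], "memory_gib": 64, "numa_nodes": 2}
    # JSON is used here only to avoid a third-party YAML dependency.
    print(json.dumps(hardware_skeleton(example), indent=2))
```

Which service would own this step (Hegel, Boots, or a new one) is exactly the open question above; the sketch only shows the shape of the transformation.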

Anything else to add here?

mddeff commented 11 months ago

Unsure if this is the forum for providing community feedback/use-cases, or if that should be saved for the RFE discussion, but we have a few distinct use cases where hardware auto-discovery/having a default workflow would be super helpful.

One of the things that might be difficult to reconcile is whether you've already "discovered" a piece of hardware before. I.e., does one need a 100% match between the existing hardware profile and the 'new' hardware profile for them to be 'the same' (so that another hardware profile/object is not created)? What happens if it's a match except for one PCI-E card being removed (update the old hardware object or create a new one)? Just some things to think about.
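One possible way to frame this, purely as a sketch: separate identity (is this the same machine?) from inventory (did its components change?). The policy below treats a shared serial number or any shared MAC address as the same machine, and any other difference as an update to the existing object. That policy, and the `serial`, `macs`, and `pci_devices` keys, are assumptions for illustration, not something decided in this thread.

```python
def classify(known: dict, seen: dict) -> str:
    """Compare a stored hardware profile with a freshly discovered one.

    Returns "new" (create an object), "same" (nothing to do),
    or "changed" (update the existing object).
    """
    known_macs = set(known.get("macs", []))
    seen_macs = set(seen.get("macs", []))

    # Identity: matching non-empty serial, or at least one MAC in common.
    serial_match = bool(known.get("serial")) and known.get("serial") == seen.get("serial")
    same_identity = serial_match or bool(known_macs & seen_macs)
    if not same_identity:
        return "new"

    # Inventory: same machine, but did any component list change?
    same_inventory = (
        known_macs == seen_macs
        and set(known.get("pci_devices", [])) == set(seen.get("pci_devices", []))
    )
    return "same" if same_inventory else "changed"
```

Under this policy, removing one PCI-E card yields "changed" rather than "new", so the old hardware object would be updated instead of duplicated.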

chrisdoherty4 commented 11 months ago

Hi @mddeff. This is definitely the right place to provide feedback, so thank you!

tinkerbell / roadmap

Auto enrollment of nodes #23
