Open chrisdoherty4 opened 1 year ago
Linked issue: https://github.com/tinkerbell/boots/issues/178
Linked discussion: https://github.com/tinkerbell/tink/discussions/522
Leaving this in discussion until it is broken down a bit more (for example: auto network booted vs. auto provisioned).
In the last discussion we concluded it would be useful to break this feature into 2: (1) Auto enrollment of hardware in the Tinkerbell stack and (2) Running default workflows. We'll tweak the summary of this roadmap item for (1) and have a separate roadmap item for (2).
Very nice, I would like to participate in those discussions and even submit some PRs. This is definitely something great to have: when working with a larger fleet of machines, auto-enrollment and workers being able to become "discoverable" would help a lot (maybe by adding this to some config files, like discoverable:true).
Based on today's meeting, I suggest we take two routes and start to "design" something more tangible.
1. Node auto-enrollment and node "sniffing": challenges and opportunities
1.1 - Identify which services can play a part in the band: hardware.yml for default workflow execution, and even as a skeleton for customized deployments based on information about specific nodes (MAC addresses, GPU/CPU profiling, memory characteristics such as size or access type like NUMA, etc.)
2. Design a Request for Enhancement (RFE) proposal and map the impacts on the actual code base and the project as a whole
This would be the last part, following the previous one. Based on that, we need to:
2.1 - Design the features and implement small pieces along with quick experiments
2.2 - Collect data and thoughts on how components are interacting and side-effects along with trade-offs
2.3 - Decide to go for an alpha and beta version of the whole stack with everything in place
2.4 - Ask for feedback from the community and look to use cases that demonstrate how Tinkerbell behaves in real-world deployments
Anything else to add here?
Unsure if this is the forum for providing community feedback/use cases, or if that should be saved for the RFE discussion, but we have a few distinct use cases where hardware auto-discovery/having a default workflow would be super helpful.
One of the things that might be difficult to reconcile is whether you've already "discovered" a piece of hardware before. That is, does one need a 100% match between the existing hardware profile and the 'new' hardware profile for them to be 'the same' (so that another hardware profile/object is not created)? What happens if it's a match except for one PCI-E card being removed (update the old hardware object or create a new one)? Just some things to think about.
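To make the question concrete, here is a minimal sketch of one possible answer: treat any shared MAC address as "same machine" and update the existing record when the rest of the profile differs. Everything here is an assumption for illustration only. `HardwareProfile` and `reconcile` are hypothetical names, not Tinkerbell types, and matching on MAC overlap is just one policy among several that could be chosen.

```python
from dataclasses import dataclass


@dataclass
class HardwareProfile:
    """Hypothetical discovered-hardware record (not a Tinkerbell type)."""
    macs: frozenset
    pci_devices: frozenset = frozenset()

    def __post_init__(self):
        # Normalize MACs so 'AA-BB-..' and 'aa:bb:..' compare equal.
        self.macs = frozenset(m.strip().lower().replace("-", ":") for m in self.macs)
        self.pci_devices = frozenset(self.pci_devices)


def reconcile(known, seen):
    """Return (action, record), where action is 'unchanged', 'updated' or 'created'.

    Assumed policy: any shared MAC means the same machine; other
    differences (e.g. a removed PCI-E card) update the existing record.
    """
    for record in known:
        if record.macs & seen.macs:
            if record == seen:
                return "unchanged", record
            record.macs, record.pci_devices = seen.macs, seen.pci_devices
            return "updated", record
    known.append(seen)
    return "created", seen
```

Under this policy, a machine that comes back with one PCI-E card removed updates the old hardware object rather than creating a new one. A stricter policy (exact profile match) would create a second object instead.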
Hi @mddeff. This is definitely the right place to provide feedback, so thank you!
Overview
There have been various requests to auto enroll devices with some sort of MAC filtering. Auto enrollment could mean bringing a device online ready to process workflows, or it could mean defining a default workflow to be run on all devices that auto enroll.
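As a rough illustration of what "some sort of MAC filtering" could look like, the sketch below allows auto enrollment only for devices whose MAC address starts with an allowed prefix (for example a vendor OUI). The function names and the prefix-based rule are assumptions made for this example; nothing here reflects an existing Tinkerbell API.

```python
def normalize_mac(mac: str) -> str:
    """Return the lowercase, colon-separated form of a MAC address."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def may_auto_enroll(mac: str, allowed_prefixes: list) -> bool:
    """Hypothetical filter: True if the MAC starts with any allowed prefix."""
    normalized = normalize_mac(mac)
    return any(
        normalized.startswith(prefix.lower().replace("-", ":"))
        for prefix in allowed_prefixes
    )
```

An empty prefix list rejects every device, so auto enrollment stays off until an operator opts in.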
It may be useful to think of running a default workflow as an independently configurable feature from auto enrolling a device. This would help define auto enrollment as simply bringing a Tink Worker online on said device and subsequently allow operators to manually define workflows as well as define an automated approach.
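A minimal sketch of that separation, assuming two independent settings: one that controls whether unknown devices are enrolled at all, and one that optionally names a default workflow. `EnrollmentPolicy`, `on_unknown_device`, and the action strings are hypothetical and exist only to show the two knobs being configured independently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnrollmentPolicy:
    """Hypothetical operator configuration (not an existing Tinkerbell type)."""
    auto_enroll: bool = False
    default_workflow: Optional[str] = None  # assumed: name of a workflow template


def on_unknown_device(mac: str, policy: EnrollmentPolicy) -> list:
    """Return the actions to take for a device that has not been seen before."""
    if not policy.auto_enroll:
        return []  # device is ignored; nothing is enrolled or run
    actions = [f"enroll {mac}"]
    if policy.default_workflow:
        actions.append(f"run {policy.default_workflow} on {mac}")
    return actions
```

With `auto_enroll=True` and no default workflow, the device simply comes online and waits for an operator to define a workflow manually. Setting `default_workflow` adds the automated path on top.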