Closed yevgeny-shnaidman closed 2 months ago
/cc @marquiz @ArangoGutierrez
/cc @chr15p
Thanks @yevgeny-shnaidman for this. The plan sounds good to me 👍 I would've probably resorted to e.g. ansible-based operator had I been doing this overhaul myself 😅 But you know operators and e.g. the OpenShift requirements so much better than me (still an operator n00b).
Regarding option 1 vs option 2: are you still talking about one NFD CRD (not one CRD per daemon)? I don't think I have much opinion on which would be preferred. Whatever makes sense from the implementation pov, I'd choose...
Thoughts @ArangoGutierrez?
@marquiz we are not changing/adding any APIs (CRDs) , just refactoring the code. a controller or controllers will watch for NFD CR and will extract from it the parts that are relevant for their specific components
/lgtm
@marquiz @ArangoGutierrez we are gonna starting coding next week, we are going to work on the master branch. Do you have any reservations regarding it? It means that for a number of week the master branch will not be deployable
Cool @yevgeny-shnaidman. I don't think there are any reservations on that.
But what do you mean by not deployable? The master branch should remain buildable (and staging images available) all the time.
@marquiz i thought of working by submitting small PRs each time: controller/reconciler loop, then worker handling, deployment handling, topologyupdate etc'. Then taking care of deployment and bundling. So, there is going to be a period where the operator is not operational
I think that's fine. I don 't think anyone's really using the staging/devel version of the operator
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@yevgeny-shnaidman any news on this issue, new PRs to expect in the near future?
@marquiz , me and Chris were on PTO, we will be back next week. Most of the PRs are in, we will start testing in, and the subsequent PRs will include the necessary fixes
@marquiz, @ArangoGutierrez this task can be considered completed. But in order to keep the operator in sync with the operand development, i think we need to start talking about unifying the operator and operand repos
/lgmt /close
@ArangoGutierrez: Closing this issue.
Currently the NFD operator is implemented using the outdated operator format, relaying on the hard-coded yamls files, configured in the build/assets directory. Those directories include, besides Deployments and Daemonsets definitions also service-accounts/roles/roles/bindings definitions.Reconciliation loop is reading all those files, updates the data with input from NFD CR and creates/updates the resources in the k8s cluster.
We want to move to a more compliant operator, and those are the changes that we may apply: 1.Remove build/assets directory. All the k8s objects will be created on-the-fly by the appropriate reconciller/s with the default values of the structures updated with values taken from NFD CR
We have 2 option how to reconcile those objects:
Option 1
We define a controller per component (worker, master, topology). Each controller will watch for NFD CR and the appropriate k8s object (daemonset, deployment, configmap) and will run a reconciliation loop whenever it detects a change.
Option 2
We define one controller that will watch for NFD CR, daemonsets/deployments and configmaps. It will run a reconciliation on any change. It will call a dedicated package to reconcile each one of the components.
CR Status
We can remove removing reporting conditions on any of the RBAC features/SA, since those will be created during operator deployment. We will only report on Daemonsets/Deployment creation/patching. No need to report on configmaps