A K8s admin needing to create a tmpfs on every node just to make sure that host-mounted content eventually gets cleaned up is quite ugly.
One way to improve this would be kubelet supporting some kind of node-local / named "emptyDir" volume which could be shared between pods running on the same node:
```yaml
volumes:
- name: nfd-features
  storageClassName: nfd-features-dir
  nodeDir: {}
containers:
- name: nfd-worker
  volumeMounts:
  - name: nfd-features
    mountPath: /etc/kubernetes/node-feature-discovery/features.d/
```
Kubelet would create a directory somewhere when the first pod/container requests it, and share it with every pod on the same node declaring a "nodeDir" volume with the same "storageClassName". Kubelet would ref-count the directory and remove it when the last pod referring to it goes away.
The main advantage over what needs to be done now, i.e. sharing explicit host paths between pods, would be kubelet managing the directory access and life-cycle instead of each pod needing to do that.
This sharing is similar to what PV / PVC do cluster-wide, except that the content would be neither persistent nor shared cluster-wide.
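For illustration, a second pod on the same node could then declare the same volume and see the same directory contents. The nodeDir / storageClassName syntax below is just the proposal above, not an existing kubelet API, and the names are made up:

```yaml
# Hypothetical consumer pod sharing the proposed node-local directory.
apiVersion: v1
kind: Pod
metadata:
  name: feature-consumer
spec:
  volumes:
  - name: nfd-features
    storageClassName: nfd-features-dir   # same name => same node-local directory
    nodeDir: {}
  containers:
  - name: consumer
    image: registry.example.com/feature-consumer:latest   # illustrative image
    volumeMounts:
    - name: nfd-features
      mountPath: /etc/kubernetes/node-feature-discovery/features.d/
```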
Does that sound like a reasonable KEP subject?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
@marquiz Any comments on this proposal?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
I think the "solution" is to use NodeFeature and NodeFeatureRule objects. Hooks are going away and we might want to do that for feature files, too. Patches are of course welcome if you want to fix this.
Documentation is something that would be good to improve, so /help on that
@marquiz: This request has been marked as needing help from a contributor.
Please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
I think the "solution" is to use NodeFeature and NodeFeatureRule objects. Hooks are going away and we might want to do that for feature files, too.
Are you saying that because:
- Those will be real k8s objects i.e. visible on API server
- NFD re-creates all node labels from scratch whenever it re-evaluates NodeFeatureRules

NFD itself does not need additional life-cycle management for them nor "best-before" timestamps?
I think label conflict detection would still be needed.
And is there an option for specifying a rules re-evaluation period?
PS. NodeFeature being limited to a single node seems too constrained: https://kubernetes-sigs.github.io/node-feature-discovery/v0.12/usage/customization-guide#nodefeature-custom-resource
IMHO it should be at least a list, so that one does not need to specify a separate NodeFeature file for each node.
Are you saying that because:
- Those will be real k8s objects i.e. visible on API server
- NFD re-creates all node labels from scratch whenever it re-evaluates NodeFeatureRules
NFD itself does not need additional life-cycle management for them nor "best-before" timestamps?
Yes, "label/feature lifecycle" is basically managed by kubernetes. Also, with NodeFeature objects features are gone when you delete the namespace where the object is located.
I think label conflict detection would still be needed.
I'm not so sure about this. There are cases where people want to override existing labels.
And is there an option for specifying a rules re-evaluation period?
There is no need for such a thing, as NodeFeatureRule and NodeFeature objects are immediately evaluated when they are changed.
PS. NodeFeature being limited to a single node seems too constrained: https://kubernetes-sigs.github.io/node-feature-discovery/v0.12/usage/customization-guide#nodefeature-custom-resource
Yeah, I've thought about that, supporting some sort of node groups for example, but haven't come up with any particular design yet. And there hasn't been much feedback on this experimental, still-disabled-by-default API.
IMHO it should be at least a list, so that one does not need to specify a separate NodeFeature file for each node.
This needs to be carefully thought out. Possibly some node-group label. But then something needs to create those node group labels before NodeFeature rules are evaluated...
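For reference, a NodeFeature object is currently bound to exactly one node, roughly along these lines (a sketch based on the linked v0.12 customization guide; the names and label value are illustrative):

```yaml
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeature
metadata:
  name: node-1-vendor-features
  labels:
    # ties this object to a single node
    nfd.node.kubernetes.io/node-name: node-1
spec:
  # labels to be created on that one node
  labels:
    "vendor.example.com/my-feature": "true"
```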
I think label conflict detection would still be needed.
I'm not so sure about this. There are cases where people want to override existing labels.
Shouldn't that be done by updating the existing NodeFeature/NodeFeatureRule object, rather than creating a new one with a different name but specifying the same labels? How do you then define which one takes precedence? [1]
And is there an option for specifying a rules re-evaluation period?
There is no need for such a thing, as NodeFeatureRule and NodeFeature objects are immediately evaluated when they are changed.
[1] I did not mean parsing the objects, but checking whether matches for their rules have changed due to system run-time changes. E.g. kernel modules and PCI devices can both be added and removed at run-time (if they are not in use).
Shouldn't that be done by updating the existing NodeFeature/NodeFeatureRule object, rather than creating a new one with a different name but specifying the same labels? How do you then define which one takes precedence? [1]
To put it differently, I think it's just about ordering. We have the init.d-style ordering where the last one prevails. Changing the ordering doesn't resolve any problems in my opinion.
There is no need for such a thing, as NodeFeatureRule and NodeFeature objects are immediately evaluated when they are changed.
[1] I did not mean parsing the objects, but checking whether matches for their rules have changed due to system run-time changes. E.g. kernel modules and PCI devices can both be added and removed at run-time (if they are not in use).
That's handled by nfd-worker just like before. (or whatever mechanism you have in case of 3rd party extensions doing NodeFeature objects)
Ok, so NodeFeatureRules are evaluated by nfd-workers on every sleepInterval, same as the other things. The NFD Customization guide could mention this (as a general rule, not a NodeFeature*-specific one).
Overriding label with a different object still sounds more like an error to me though.
Having multiple rules producing a more generic label or taint could be valid though.
I.e. while an error about overlaps is definitely too strict, a (single) warning might still be warranted:
Rule 'B' provides label 'L' already provided by rule 'A', is this overlap intentional?
Ok, so NodeFeatureRules are evaluated by nfd-workers on every sleepInterval, same as the other things. The NFD Customization guide could mention this (as a general rule, not a NodeFeature*-specific one).
NFD-Master is the one processing NodeFeatureRules. In practice, currently, it is like you described: nfd-worker sends the features to nfd-master over gRPC every sleep-interval. BUT, with the NodeFeature API enabled (-enable-nodefeature-api) there is no gRPC, and nfd-master only does anything if NodeFeature OR NodeFeatureRule objects change. NFD-Worker does re-discovery every sleep-interval, but if there's no change the corresponding NodeFeature object is not updated and there is no action on the master side.
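For context, the re-discovery period mentioned above is the worker's sleep interval, which can be set in the nfd-worker configuration roughly like this (exact file location and defaults depend on the deployment):

```yaml
# nfd-worker configuration snippet (e.g. via the worker ConfigMap)
core:
  sleepInterval: 60s
```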
Overriding label with a different object still sounds more like an error to me though.
Agree, generally it really should not happen (unless you really know what you're doing and what you want and why)
Having multiple rules producing a more generic label or taint could be valid though.
I.e. while an error about overlaps is definitely too strict, a (single) warning might still be warranted:
Rule 'B' provides label 'L' already provided by rule 'A', is this overlap intentional?
Agree, a warning or at least some log message would probably be a good idea
@marquiz what about adding an attribute in NodeFeatureRule that will define the behavior of the master when we have overlapping labels? E.g.:
overlap: override | error
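A hypothetical sketch of how that could look on a rule; the overlap field does not exist, and all names here are illustrative:

```yaml
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: my-rule-b
spec:
  rules:
  - name: "rule B"
    # hypothetical attribute: what to do if another rule already produced this label
    overlap: error        # or: override
    labels:
      "vendor.example.com/label-l": "true"
    matchFeatures:
    - feature: kernel.loadedmodule
      matchExpressions:
        e1000: {op: Exists}
```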
/remove-lifecycle stale
Maybe we should act on this one. The documentation can be improved anytime by anyone (@eero-t? 😉)
Support for an expiry-date field in the feature files (as suggested in the issue description) makes sense to me. We could try to get it in v0.15
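For illustration only, such an expiry stamp could be embedded in the feature file itself, e.g. along these lines (hypothetical syntax, not an implemented format):

```
# hypothetical expiry directive after which labels from this file would be removed
# +expiry-time=2024-01-31T00:00:00Z
vendor.example.com/feature-x=true
vendor.example.com/feature-y=false
```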
/milestone v0.15
/assign
What would you like to be added:
Implementation and documentation for life-cycle management of external features (split from #855).
Why is this needed:
Issues / missing:
Proposed solution:
=> If a pod changes the name of a feature file it installs, without changing all the label names, it must remove the old feature file before installing the new one
Note: the above ignores life-cycle management for hooks, as they're going away (#856).