Closed huww98 closed 1 month ago
Welcome @huww98!
It looks like this is your first PR to kubernetes-csi/external-provisioner 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes-csi/external-provisioner has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @huww98. Thanks for your PR.
I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
/ok-to-test
/lgtm
/assign @pohly PTAL. Can you add your approval?
@divyenpatel, @msau42, @jsafrane: do you remember why the flag was left at default false in https://github.com/kubernetes-csi/external-provisioner/pull/516?
Is it because that would have changed the default behavior and that wasn't desirable? If that's the reason, are we stuck with "false" forever?
It's usually not a good idea to use feature gates to control optional behavior, because sooner or later one arrives at a situation like this where the feature gets promoted and (ideally) the entire feature gate gets removed. But if that is what was done, then there's not much that can be done about it without a breaking change.
Is it because that would have changed the default behavior and that wasn't desirable?
@pohly I think it is not the case. Even this feature gate is enabled, the relevant behaviors are only enabled if the driver declares support. And if the driver supports topology, I think it is always desirable to enable this, or we are not CSI compliant, and the driver may not work properly.
So, this PR only changes the behavior for those who use driver that support topology, but do not enable this feature gate. Do such use-case exist for any reason?
I believe the reason why the gate was there was to control rollout of a driver that wants to upgrade itself to add topology support.
Because if topology was enabled in the controller but the driver has not been upgraded on the nodes to start reporting topology, then the provisioner would start failing to provision (and schedule pods). So the expected flow is:
I agree that for a permanent flag it is better to make it a config flag rather than a feature gate.
@msau42 I think even after we remove the Topology
feature gate, we can still update driver with the same flow, because this feature is still gated by VOLUME_ACCESSIBILITY_CONSTRAINTS plugin capability, and this can be used to control the rollout. And the old version of driver should not report this capability, so updating all nodes before controller should work naturally.
We're about to release a new provisioner. I personally like this PR - it goes the right direction (feature gate removal), yet it still allows users to override the feature gate to false
for another release or two.
I also like the idea of no configuration of topology in the provisioner binary, a CSI driver should report correct VOLUME_ACCESSIBILITY_CONSTRAINTS capability instead. Especially, it's the driver task to report the capability on nodes first and then in the controller if it wants to enable topology in a deterministic way.
@huww98 can you please edit the release note to emphasize that Topology
feature gate is deprecated and will be removed in a future release?
@msau42 what do you think? (BTW, would change of the default value or removal require bump of the major version? IMO we should not require it for feature gates.)
I am ok with the approach that @huww98 outlined. We need to document it and probably in the release notes we should call out action required. I think we should make it a major bump because of the action required. But normally if turing on a feature by default doesn't require action then it doens't need to be a major bump.
@msau42 to confirm, does this match what you think?
Action Required: Users who deploy controller that is reporting topology, but do not have corresponding node plugin version deployed should upgrade the node plugin before upgrading the controller.
I think the action required should be to explicitly disable Topology feature gate if corresponding CSI driver incorrectly reports VOLUME_ACCESSIBILITY_CONSTRAINTS capability. And warn them that the feature gate will be removed in a future release.
We should document probably in README.md that if a driver wants to enable topology support while it was disabled before, they should update node DaemonSet first to report VOLUME_ACCESSIBILITY_CONSTRAINTS
capability, populate Nodes with topology labels and then update the controller to report VOLUME_ACCESSIBILITY_CONSTRAINTS
capability.
I edited the release note /lgtm /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: huww98, jsafrane
The full list of commands accepted by this bot can be found here.
The pull request process is described here
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This feature gate has already existed for 5 years. And should be safe to enable by default.
Besides, because this feature gate has been marked as GA, it will show warnings:
Setting GA feature gate Topology=true. It will be removed in a future release.
. So I went ahead and removed this feature gate from command line. Then something breaks, which is very confusing. I would expect a GA feature gate to default to true.Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: