kubernetes-sigs / aws-ebs-csi-driver

CSI driver for Amazon EBS https://aws.amazon.com/ebs/
Apache License 2.0

Is it possible to use the standard k8s topology labels? #729

Closed · TBBle closed 2 months ago

TBBle commented 3 years ago

Is your feature request related to a problem?/Why is this needed

Looking into how this driver interacts with the Cluster Autoscaler (see https://github.com/kubernetes/autoscaler/issues/3230#issuecomment-774387646), I have been unable to find a good way to support the scale-from-zero case without either changing Cluster Autoscaler to always add this driver's topology label, or having each node group's ASG manually specify the topology.ebs.csi.aws.com/zone label.

/feature

Describe the solution you'd like in detail

A parameter to specify the topology label used for dynamic PV provisioning, which in the EKS case could be set to topology.kubernetes.io/zone (or failure-domain.beta.kubernetes.io/zone on older clusters) and behave correctly.

Describe alternatives you've considered

See https://github.com/kubernetes/autoscaler/issues/3230 for more details on these alternatives.

Additional context

I'm not sure why a new label was used originally, which is why I suggest making this an option rather than changing the default. My guess is that in non-EKS deployments on EC2, the standard zone label may not reliably be the AWS zone if the AWS cloud controller is not in use?

That said, if the reason is that the label must be owned by the driver for spec or behaviour reasons, then this wouldn't be feasible at all, and we'll probably be stuck manually ensuring that node groups with the tag correctly match the tolerations of the DaemonSet.

ayberk commented 3 years ago

I really don't know why we didn't use the well-known label. @wongma7 @leakingtapan, maybe you know?

I agree we should just use them instead. Initially we should support both labels though so we can provide a deprecation warning.

leakingtapan commented 3 years ago

It's because the well-known label wasn't well-known yet when this driver's topology key was chosen. There used to be a separate label, failure-domain.beta.kubernetes.io/zone, which was renamed to topology.kubernetes.io/zone as part of the work from sig-cloudprovider. See here. And see here for when this driver's label was set.

Given that the driver is not GA yet, I think it makes sense to fix the label before GA if there are no other concerns. But keep in mind that the CSI translation lib for CSI migration also needs updating here, to avoid breaking CSI migration.
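
To make the translation-lib dependency concrete, here is a minimal sketch of the in-tree-to-CSI direction, with a hypothetical helper name (this is not the actual csi-translation-lib API): the translated PV's topology has to use whichever key the driver expects, so changing the driver's key means updating this mapping in lockstep.

```go
package main

import "fmt"

// ebsCSITopologyKey is whatever key the driver advertises; if the driver's
// topology key changes, this constant must change with it, or translated PVs
// will carry a key the driver no longer recognizes. Hypothetical name.
const ebsCSITopologyKey = "topology.ebs.csi.aws.com/zone"

// translateZone sketches converting an in-tree PV's zone label value into
// CSI topology segments during migration.
func translateZone(zone string) map[string]string {
	return map[string]string{ebsCSITopologyKey: zone}
}

func main() {
	fmt.Println(translateZone("us-east-1a")) // map[topology.ebs.csi.aws.com/zone:us-east-1a]
}
```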

Hope this helps

leakingtapan commented 3 years ago

related to: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/675

ayberk commented 3 years ago

That's really helpful, thanks a lot @leakingtapan!

Since this requires k/k changes, I'll try to come up with a plan.

/assign

ayberk commented 3 years ago

We'll need to gracefully retire the current label (topology key), so we need to start by writing both.

  1. Driver starts writing both keys: topology.ebs.csi.aws.com/zone and topology.kubernetes.io/zone. For topology constraints it should use the new one if available, falling back to the old one.
  2. Translation lib similarly has to read both when converting to in-tree. However, we have two options when converting from in-tree to CSI:
    1. Only write the new one. This would break backward compatibility: older versions of the driver wouldn't be able to read the topology key.
    2. Write both keys so we don't break older versions. Since we're already beta, this is preferred. At this point all e2e tests should still pass.
  3. Update the e2e tests to use the new key. Unit tests should suffice for covering the old key.
  4. Eventually, remove the old key.

Skipping the beta label should be safe since it's only used by the translation lib.

cc @wongma7 for sanity check

wongma7 commented 3 years ago

Sounds good to me. Concretely, I understand this to mean:

NodeGetInfo must respond such that CSINode.topologyKeys contains both oldKey and newKey.

CreateVolume, if provided a topology requirement in the request, must extract the zone from newKey if it exists, or from oldKey if not.

CreateVolume must respond such that PVs have nodeAffinity.nodeSelector oldKey==A || newKey==A (by returning an AccessibleTopology array containing both).
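
A minimal Go sketch of that scheme, using the CSI spec types (an illustration only, not the driver's actual implementation; the node ID and zone values are made up):

```go
package main

import (
	"fmt"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

const (
	oldKey = "topology.ebs.csi.aws.com/zone" // driver-specific key
	newKey = "topology.kubernetes.io/zone"   // well-known key
)

// nodeGetInfoResponse advertises the zone under both keys, so that
// CSINode.topologyKeys ends up containing oldKey and newKey.
func nodeGetInfoResponse(nodeID, zone string) *csi.NodeGetInfoResponse {
	return &csi.NodeGetInfoResponse{
		NodeId: nodeID,
		AccessibleTopology: &csi.Topology{
			Segments: map[string]string{oldKey: zone, newKey: zone},
		},
	}
}

// zoneFromTopology extracts the zone for CreateVolume: prefer newKey,
// fall back to oldKey.
func zoneFromTopology(topologies []*csi.Topology) string {
	for _, key := range []string{newKey, oldKey} {
		for _, t := range topologies {
			if zone, ok := t.Segments[key]; ok {
				return zone
			}
		}
	}
	return ""
}

func main() {
	resp := nodeGetInfoResponse("i-0123456789abcdef0", "us-east-1a")
	// CreateVolume would then return an AccessibleTopology containing both
	// keys, so the resulting PV selects on oldKey==zone as well as newKey==zone.
	fmt.Println(zoneFromTopology([]*csi.Topology{resp.AccessibleTopology}))
}
```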

TBBle commented 3 years ago

If I'm following correctly, landing #773 would be sufficient for Cluster Autoscaler to operate correctly with --balancing-ignore-label=topology.ebs.csi.aws.com/zone, because #773 ensures that the label has the same value as topology.kubernetes.io/zone. So CA can rely on the well-known label for "will this Pod be schedulable on this node-template", even though csi-translation-lib hinges off the deprecated-but-identical value when it actually interacts with the scheduler.

It looks like #773 missed the 0.10 release branch, so I guess it'll be part of 0.11?

ayberk commented 3 years ago

You're right, #773 is enough; the remaining work isn't relevant to that use case.

I can see it in the release PR. We haven't released yet, but 0.10 will include it. See #811.

denniswebb commented 3 years ago

We are running 0.10 and hit an issue where it complains about a missing topology key on the node when provisioning a new volume from a PVC. Does #773 remove the need for the topology.ebs.csi.aws.com/zone key, or is it still required for now?

ayberk commented 3 years ago

For now you should still use both. The driver itself prioritizes topology.kubernetes.io/zone, but if it's not present it falls back to topology.ebs.csi.aws.com/zone. We haven't updated csi-translation-lib upstream yet, though, so we're still writing and reading both keys.

Do you mind opening an issue, ideally with more details?

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

wongma7 commented 3 years ago

/remove-lifecycle stale

Label best practices need discussion with Kubernetes upstream. It's not clear at this time whether we should be setting the standard label, or whether the standard label should be reserved for use by in-tree drivers (which simplifies migration, because migration translates from our custom label to the standard label).

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k1rk commented 3 years ago

/remove-lifecycle stale

TBBle commented 2 years ago

To cross-link: it seems there's a problem with this approach. I'm not yet clear on the actual blocker there, unless there's a need or use-case for topology zones that don't match the zones the active cloud controller uses to populate the Node's topology.kubernetes.io/zone label.

Anyway, I wanted to make sure that comment was linked from here, as I had not realised from the history of this ticket that the initial change towards this state had been reverted.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k1rk commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

z0rc commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

z0rc commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

z0rc commented 1 year ago

/remove-lifecycle stale

ConnorJC3 commented 1 year ago

/close

Fixed in #1360; available in the driver since v1.12.0.

k8s-ci-robot commented 1 year ago

@ConnorJC3: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/729#issuecomment-1452454486):

> /close
>
> Fixed in #1360; available in the driver since v1.12.0.

jbg commented 1 year ago

@ConnorJC3 I think this should be reopened. From what I can see, the fix was reverted because it was causing duplicate topology selector terms in the CSI migration case.

So it is still the case, even with the latest driver version, that provisioned PVs select only on the non-standardised topology.ebs.csi.aws.com/zone label and don't work out of the box with cluster-autoscaler when scaling node groups up from zero. (You have to manually label the ASGs with a node template so that cluster-autoscaler knows the first node will have the topology.ebs.csi.aws.com/zone label once it starts up.)

One way to solve this without harming compatibility with existing clusters would be to provide an option, disabled by default, to use the standardised topology.kubernetes.io/zone label. This seems to be what was discussed in https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/962, which was closed due to inactivity.
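
For reference, the workaround looks roughly like this. Cluster-autoscaler's scale-from-zero support reads ASG tags of the form k8s.io/cluster-autoscaler/node-template/label/<label> to predict which labels a new node will carry; the zone value below is made up, and the Go map is just for illustration:

```go
package main

import "fmt"

func main() {
	// Hypothetical ASG tag implementing the workaround described above: it
	// tells cluster-autoscaler that a node launched from this group will
	// carry the driver's zone label once it registers.
	asgTags := map[string]string{
		"k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone": "eu-west-2a",
	}
	fmt.Println(asgTags)
}
```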

ConnorJC3 commented 1 year ago

Ah, that is correct, I will reopen this, sorry. I will look into fixing this completely in the near-ish future.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

z0rc commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

z0rc commented 9 months ago

/remove-lifecycle stale

ConnorJC3 commented 8 months ago

/lifecycle frozen

Hi, sorry about the delay on this; the team has decided on a plan.

We do want to move over to the standard labels, but we don't want to do so immediately, as it breaks downgrading and is a minor breaking change. Thus, the plan is:

  1. Immediately change the node to report both our non-standard label and the standard label. This enables forward compatibility for a future release in which the topology on the volume is switched over (being worked on in https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/1931).
  2. In the next release of the EBS CSI Driver (currently planned for mid/late February), announce the intent to change the topology label for created volumes to the standard label in a future release.
  3. In some future release (exact timeline TBD, but likely 2-3 monthly releases after the announcement), switch the driver to report the standard label for all created volumes instead of the non-standard label. After upgrading to this version, users will only be able to downgrade to the February release or later (or else their volumes will not be able to schedule).

lgfa29 commented 7 months ago

Hi @ConnorJC3 👋

I'm an engineer on the Nomad team, and I think #1931 may have created an irreconcilable topology.

With v1.28.0, the created volume only has the topology segment "topology.ebs.csi.aws.com/zone": "eu-west-2a", while the node has both domains in the same segment. Since segments are compared with AND, volumes can never be scheduled.

On step 3, if volumes only receive the new label, will the non-standard label be removed from the node as well? Otherwise the irreconcilable topology would still be a problem.

Apologies for my ignorance on this, but where, currently, is the standard label applied to the volume on creation? Otherwise I would imagine the same problem would happen on Kubernetes clusters?

Thank you!

ConnorJC3 commented 7 months ago

Hi @lgfa29 - my understanding of the CSI spec is that a volume with the key topology.ebs.csi.aws.com/zone should match a node with the keys topology.ebs.csi.aws.com/zone and topology.kubernetes.io/zone (as long as the value of topology.ebs.csi.aws.com/zone matches). In other words, extra key(s) on the node should not be considered a mismatch.

This is also how it is implemented in Kubernetes: https://github.com/kubernetes/kubernetes/blob/909faa3a9b33161992af0684c5582353b5be6d9b/pkg/scheduler/framework/plugins/volumezone/volume_zone.go#L187-L193

```go
// ... The requirement is that any
// volume zone-labels must match the equivalent zone-labels on the node.  It is OK for
// the node to have more zone-label constraints (for example, a hypothetical replicated
// volume might allow region-wide access)
```
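
A small self-contained sketch of that rule (illustrative only; the real logic is in the scheduler code linked above), using the values from the scenario in question:

```go
package main

import "fmt"

// volumeFitsNode implements the rule quoted above: every topology key on the
// volume must match the node; extra keys on the node are not a mismatch.
func volumeFitsNode(volume, node map[string]string) bool {
	for key, want := range volume {
		if node[key] != want {
			return false
		}
	}
	return true
}

func main() {
	dualKeyNode := map[string]string{
		"topology.ebs.csi.aws.com/zone": "eu-west-2a",
		"topology.kubernetes.io/zone":   "eu-west-2a",
	}
	oldKeyNode := map[string]string{"topology.ebs.csi.aws.com/zone": "eu-west-2a"}
	oldVolume := map[string]string{"topology.ebs.csi.aws.com/zone": "eu-west-2a"}
	newVolume := map[string]string{"topology.kubernetes.io/zone": "eu-west-2a"}

	fmt.Println(volumeFitsNode(oldVolume, dualKeyNode)) // true: extra node key is fine
	fmt.Println(volumeFitsNode(newVolume, dualKeyNode)) // true: a dual-key node serves both
	// false: a standard-key-only volume cannot schedule on a node that only
	// advertises the old key, which is why downgrading past the dual-key
	// release would break scheduling.
	fmt.Println(volumeFitsNode(newVolume, oldKeyNode))
}
```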

> On step 3, if volumes only receive the new label, will the non-standard label be removed from the node as well? Otherwise the irreconcilable topology would still be a problem.

The intent is to serve both labels from the node indefinitely, so that volumes created with the old topology key continue to work on new versions of the EBS CSI Driver.

> Apologies for my ignorance on this, but where, currently, is the standard label applied to the volume on creation? Otherwise

Currently, volumes receive topology.ebs.csi.aws.com/zone upon creation. We intend, at some future date, to switch this to the now-standard label topology.kubernetes.io/zone.

lgfa29 commented 7 months ago

Thank you for the extra information @ConnorJC3.

I've opened https://github.com/hashicorp/nomad/issues/20094 in the Nomad repo for us to track this.

ConnorJC3 commented 2 months ago

/close

As of v1.33.0 of the EBS CSI Driver, the driver creates volumes with the AZ topology label topology.kubernetes.io/zone. With that, the three-step plan from https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/729#issuecomment-1942026577 is complete. See the CHANGELOG for more info.

k8s-ci-robot commented 2 months ago

@ConnorJC3: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/729#issuecomment-2271342381):

> /close
>
> As of v1.33.0 of the EBS CSI Driver, the driver creates volumes with the AZ topology label topology.kubernetes.io/zone. With that, the three-step plan from https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/729#issuecomment-1942026577 is complete.