kubernetes / cloud-provider-openstack

Apache License 2.0

[cinder-csi-plugin] `ignore-volume-az` appears to be either useless or incorrectly documented #2185

Open stephenfin opened 1 year ago

stephenfin commented 1 year ago

/kind bug

(though this is potentially more question than bug)

What happened:

The ignore-volume-az option is documented like so:

ignore-volume-az Optional. When Topology feature enabled, by default, PV volume node affinity is populated with volume accessible topology, which is volume AZ. But, some of the openstack users do not have compute zones named exactly the same as volume zones. This might cause pods to go in pending state as no nodes available in volume AZ. Enabling ignore-volume-az=true, ignores volumeAZ and schedules on any of the available node AZ. Default false. Check cross_az_attach in nova configuration for further information.

Though it is implied here, it isn't explicitly stated that this option only affects the created PersistentVolume. When it is set to false, each PV will have a spec.nodeAffinity field like so:

$ kubectl get pv foo -o yaml
...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.cinder.csi.openstack.org/zone
          operator: In
          values:
          - nova

Setting it to true means we do not populate this field. However, the option does not affect the CSINode or Node objects. When the Topology feature is enabled, each CSINode will have its spec.drivers[cinder.csi.openstack.org].topologyKeys value populated with [topology.cinder.csi.openstack.org/zone], while the corresponding Node will have a topology.cinder.csi.openstack.org/zone: $az metadata label. $az here is the Compute availability zone retrieved from the OpenStack instance metadata service, which kubelet in turn retrieves via the NodeGetInfo gRPC API. The value $az is used when requesting a volume from Cinder unless overridden by the availability parameter on the StorageClass. All this means that if there is a mismatch between Nova and Cinder AZs, setting ignore-volume-az won't help, since the volume request sent to Cinder will still use an AZ that is invalid from Cinder's perspective.
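Concretely, with the Topology feature enabled, the relevant fields on the CSINode and Node objects look something like the following sketch (the node name and the nova zone value are illustrative, not output from a real cluster; the zone value comes from the Nova instance metadata):

```yaml
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: worker-0                # hypothetical node name
spec:
  drivers:
  - name: cinder.csi.openstack.org
    nodeID: ...
    topologyKeys:
    - topology.cinder.csi.openstack.org/zone
---
apiVersion: v1
kind: Node
metadata:
  name: worker-0
  labels:
    # Nova AZ reported by the instance metadata service via NodeGetInfo
    topology.cinder.csi.openstack.org/zone: nova
```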

Given the above, I can only think of one use case where the option would be useful: where you have set the availability parameter on your storage class(es), allowing you to e.g. have a storage class per AZ. In this case, you would want to ensure the AZ requested for the Cinder volume is not transformed into a nodeAffinity for the resulting PVs. However, if this is what you're doing, why enable the Topology feature in the first place? Disabling it won't change the request sent to Cinder when this parameter is provided. As such, I don't know who this config option is intended for, and neither #1735 nor #1739 help me.
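For reference, that one use case would look something like the following sketch (the class name and the az1 value are hypothetical):

```yaml
# Hypothetical per-AZ StorageClass: the availability parameter overrides the
# AZ taken from instance metadata when requesting the volume from Cinder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-az1
provisioner: cinder.csi.openstack.org
parameters:
  availability: az1   # Cinder AZ used for volumes from this class
```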

What you expected to happen:

I'm not sure :smile:

How to reproduce it:

  1. Deploy OpenStack using different Cinder and Nova AZs.
  2. Enable the Topology feature for Cinder CSI and set [BlockStorage] ignore-volume-az = True
  3. Attempt to create a Pod with a PVC attached.
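Step 2 corresponds to a cloud config fragment along these lines (a sketch; all other required options are omitted):

```ini
# cloud.conf (fragment)
[BlockStorage]
ignore-volume-az = True
```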

This will fail unless [DEFAULT] allow_availability_zone_fallback is set to true in cinder.conf. Even in that case, we're still requesting invalid data from Cinder: Cinder is just able to ignore it and fall back to [DEFAULT] default_availability_zone (while the PV won't have a nodeAffinity value corresponding to the real AZ used by Cinder, because of [BlockStorage] ignore-volume-az = True).
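The Cinder-side configuration referred to here would look along these lines (values illustrative):

```ini
# cinder.conf (fragment)
[DEFAULT]
allow_availability_zone_fallback = true
default_availability_zone = cinder   # AZ Cinder falls back to on an invalid request
```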

Anything else we need to know?:

N/A

Environment:

jichenjc commented 1 year ago

@stephenfin I think the original doc was drafted by @ramineni; I'm not sure they still work on this. Do you want to do more testing in order to update the doc, or make some design change?

stephenfin commented 1 year ago

I'm pretty sure my evaluation of this is correct, as my limited testing to date has borne it out. Assuming so, I'm still debating whether we should update the documentation to highlight the one potential use case I outlined above, or just deprecate this outright. I was hoping for some guidance on that from people who actually use this feature (assuming anyone does). Perhaps I should bring this up for discussion in a future meeting? In the interim, I will keep testing this to see what happens.

stephenfin commented 1 year ago
Assuming a simple deployment with one Nova AZ (nova), one Cinder AZ (cinder), and the Topology feature enabled for the Cinder CSI driver, we see the following behaviour.

| # | CSI ignore-volume-az | Cinder allow_availability_zone_fallback | StorageClass availability parameter | Cinder volume is created? | PV has node affinity? | Pod schedules? |
|---|---|---|---|---|---|---|
| 1 | True | True | `<unset>` | | | |
| 2 | True | True | cinder | | | |
| 3 | True | False | `<unset>` | N/A | | |
| 4 | True | False | cinder | | | |
| 5 | False | True | `<unset>` | | | |
| 6 | False | True | cinder | | | |
| 7 | False | False | `<unset>` | N/A | | |
| 8 | False | False | cinder | N/A | | |

Scenario 1 passes because the invalid AZ requested (nova, which was retrieved from instance metadata) was transformed to a valid AZ (cinder) by Cinder because [DEFAULT] allow_availability_zone_fallback = True. When this is set to False in scenario 3, Cinder will fail with an internal error like so:

2023-03-23 12:45:49.513 18 WARNING cinder.volume.api [req-df5acdf9-f3e7-444e-a9c1-a49f6c8af219 97df6e75a0424e80a7ee5429e3b09ebc 98038354bc054973a4ff2bc202916a55 - default default] Task 'cinder.volume.flows.api.create_volume.ExtractVolumeRequestTask;volume:create' (cbc5f408-8b56-436e-9b71-910966068bdc) transitioned into state 'FAILURE' from state 'RUNNING'
1 predecessors (most recent first):
  Flow 'volume_create_api': cinder.exception.InvalidAvailabilityZone: Availability zone 'nova' is invalid.
2023-03-23 12:45:49.513 18 ERROR cinder.volume.api Traceback (most recent call last):
2023-03-23 12:45:49.513 18 ERROR cinder.volume.api   File "/usr/lib/python3.6/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2023-03-23 12:45:49.513 18 ERROR cinder.volume.api     result = task.execute(**arguments)
2023-03-23 12:45:49.513 18 ERROR cinder.volume.api   File "/usr/lib/python3.6/site-packages/cinder/volume/flows/api/create_volume.py", line 422, in execute
2023-03-23 12:45:49.513 18 ERROR cinder.volume.api     volume_type=volume_type)
2023-03-23 12:45:49.513 18 ERROR cinder.volume.api   File "/usr/lib/python3.6/site-packages/cinder/volume/flows/api/create_volume.py", line 293, in _extract_availability_zones
2023-03-23 12:45:49.513 18 ERROR cinder.volume.api     raise exception.InvalidAvailabilityZone(az=availability_zone)
2023-03-23 12:45:49.513 18 ERROR cinder.volume.api cinder.exception.InvalidAvailabilityZone: Availability zone 'nova' is invalid.
2023-03-23 12:45:49.513 18 ERROR cinder.volume.api

Scenario 2 and scenario 4 both pass because we manually set a valid availability zone, cinder. While this is less of an issue for scenario 2 (if it was invalid, Cinder could transform it due to [DEFAULT] allow_availability_zone_fallback = True) it's necessary for scenario 4. In these cases, as well as in scenario 1, setting [BlockStorage] ignore-volume-az is necessary because a failure to do so will result in node affinity being set on the resulting PV. Because that node affinity value would be the value of a valid cinder AZ (cinder, in this case), it would never match with node affinity of actual nodes, which is retrieved from instance metadata and is the value of a valid nova AZ (nova, in this case).
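To illustrate the mismatch described above, a PV provisioned in scenarios 2 or 4 without ignore-volume-az would carry the Cinder AZ in its affinity, while every node is labelled with the Nova AZ from instance metadata (a sketch using the AZ names from this deployment):

```yaml
# PersistentVolume affinity populated from the Cinder AZ of the volume...
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: topology.cinder.csi.openstack.org/zone
        operator: In
        values:
        - cinder   # no node will ever carry this label value
---
# ...but each Node is labelled with the Nova AZ, so the affinity never matches:
# metadata:
#   labels:
#     topology.cinder.csi.openstack.org/zone: nova
```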

The only thing I've yet to test is what happens when the Topology feature is disabled but a user manually requests an availability zone via the availability parameter. In this case, does the resulting PV have a node affinity field, or is that entire mechanism disabled by disabling the Topology feature? If disabling the Topology feature also disables node affinity on created PVs, then I think we should deprecate and remove this config option in favour of simply enabling/disabling the Topology feature. If disabling the feature does not disable node affinity, then we should keep it but clarify the documentation to emphasise the specific nature of this option. I'll report back when I've figured this out.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

mdbooth commented 1 year ago

@stephenfin I'm guessing we still have something to do here?

/remove-lifecycle stale

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

dulek commented 7 months ago

/remove-lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

stephenfin commented 4 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

stephenfin commented 1 month ago

/remove-lifecycle stale