hpe-storage / csi-driver

A Container Storage Interface (CSI) driver from HPE
https://scod.hpedev.io
Apache License 2.0

Add basic topology support #388

Closed megakid closed 2 months ago

megakid commented 4 months ago

These changes add basic topology support to HPE CSI Driver.

They expose a new plugin capability: PluginCapability_Service_VOLUME_ACCESSIBILITY_CONSTRAINTS

This allows you to attach allowedTopologies to a StorageClass and the driver will handle the rest - including ensuring that Kubernetes schedules your pods onto the correct nodes.

The benefit here is that you can avoid setting pod affinities and rely solely on the StorageClass choice to schedule your pods correctly. We have a cross-datacenter cluster with access to two HPE devices, one per DC.

Note: this does not add full StatefulSet automatic provisioning across multiple HPE devices, which is a bigger task.

I should add that to make this work, the Topology feature gate also needs to be added to the csi-provisioner container args:

            {{- if .Values.csiDriverTopologyEnabled }}
            - "--feature-gates=Topology=true"
            {{- end }}
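
For reference, enabling this in a values file is then just (the value name matches the Helm guard above; it doesn't exist in the chart until this PR lands):

# values.yaml (sketch)
csiDriverTopologyEnabled: true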

e.g.

A storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: hpe-csi-dc1
provisioner: csi.hpe.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: csi.hpe.com/zone
    values:
    - dc1
parameters: 
  ...

A node label:

apiVersion: v1
kind: Node
metadata:
  name: dc1-general-001
  labels:
    csi.hpe.com/zone: dc1
    ...
spec:
  ...

After this is set up, the topologyKeys are set on the CSINode automatically:

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: dc1-general-001
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: dc1-general-001
    uid: ...
spec:
  drivers:
  - name: smb.csi.k8s.io
    nodeID: dc1-general-001
    topologyKeys: null
  - name: file.csi.azure.com
    nodeID: dc1-general-001
    topologyKeys: null
  - allocatable:
      count: 100
    name: csi.hpe.com
    nodeID: ...
    topologyKeys:
    - csi.hpe.com/zone
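
For completeness, here is a minimal sketch of a claim and pod against this class (names are illustrative): with volumeBindingMode: WaitForFirstConsumer, provisioning is deferred until the pod is scheduled, and the scheduler only considers nodes labelled csi.hpe.com/zone: dc1.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hpe-csi-dc1
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: demo-pvc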
datamattsson commented 4 months ago

Hi @megakid! Thank you for your contribution of this much sought-after feature! đŸ¥‡

It will take some time for the team to test and review this PR. If everything checks out, we're going to have you sign off the commits as per the DCO failure. https://github.com/hpe-storage/csi-driver/pull/388/checks?check_run_id=22447297357

The next release of the CSI driver is most likely 4-6 months out, and we have our hands full right now with customer-facing features, so don't get discouraged by the PR sitting here for a while.

megakid commented 4 months ago

Yep no worries - worth noting that we've been running this fork utilising the topology features in our test clusters for about 5-6 months now - we've mounted over 100k clones in that time!

datamattsson commented 3 months ago

Ok, I've spent some time on this and have a couple of questions. I'm delighted to see that your branch builds and passes both manual testing and the CSI e2e tests, so hopefully we can get this merged sooner rather than later.

  1. The different CSI topology implementations out there try to mimic the public cloud providers' "region" and "zone" concepts. Is there any reason we can't follow this pattern (see the label sketch below)? It might matter less in this implementation, as the StorageClass gets pegged to a single array regardless, and the nodes may be grouped accordingly within the array's single zone.
  2. You mention:

this does not add full StatefulSet automatic provisioning across multiple HPE devices, which is a bigger task

I'm curious whether you have given any thought to how this could be implemented. I have my own vague ideas of a "topology-aware secret" or a "meta CSP" that acts like a proxy. I'd love to hear your thoughts/opinion on how this would ideally be designed and implemented.
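
For reference on point 1, the public cloud pattern uses the well-known Kubernetes topology labels, so a node in that scheme would carry something like this (values illustrative):

apiVersion: v1
kind: Node
metadata:
  name: dc1-general-001
  labels:
    topology.kubernetes.io/region: uk-south
    topology.kubernetes.io/zone: uk-south-dc1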

We also require the PR to pass the DCO check and have the commits signed off.

datamattsson commented 2 months ago

Just checking in here, did you get our request to sign the DCO and review the comment above? @megakid

megakid commented 2 months ago

Hi @datamattsson - thanks for reviewing, I think I've fixed up all the items you have flagged.

In terms of thoughts on fuller topology support:

I think adding a top level to the Secret structure (e.g. one key per device) is probably the best way, with a topology label selector added to each object. The driver could read this single Secret (as it does now) and the top-level key names would indicate the names of the devices. This would keep the CSP service a simple one-to-one mapping with devices (as it is now).

Region/zone can be layered; ideally we could just use some YAML-defined match label expressions to define these in a flexible way, but there probably needs to be a hardcoded two- or three-tier set of labels (or at least ones that rarely change), given they need to be copied from Node -> CSINode by the driver.

You can then have a single StorageClass that refers to this Secret and allows multi-device deployments such as StatefulSets (they would anti-affinitize across nodes on csi.hpe.com/zone - unsure whether this is assumed behaviour across k8s/other CSI drivers or needs to be specified in the StatefulSet spec; see the sketch after the proposed Secret below). I suspect you might want to have a default device specified either in the StorageClass spec or as an extra property within each element of the Secret (as below).

Current

apiVersion: v1
kind: Secret
metadata:
  name: hpe-backend
  namespace: hpe-storage
stringData:
  serviceName: alletrastoragemp-csp-svc
  servicePort: "8080"
  backend: 10.10.0.20
  username: 3paradm
  password: 3pardata

Proposed

apiVersion: v1
kind: Secret
metadata:
  name: hpe-backend
  namespace: hpe-storage
stringData:
  dc1Device:
    serviceName: alletrastoragemp-csp-dc1-svc
    servicePort: "8080"
    backend: 10.10.0.20
    username: 3paradm
    password: 3pardata
    defaultDevice: true
    topologySelector:
      matchLabelExpressions:
      - key: csi.hpe.com/zone
        values:
        - dc1
  dc2Device:
    serviceName: alletrastoragemp-csp-dc2-svc
    servicePort: "8080"
    backend: 10.20.0.20
    username: 3paradm
    password: 3pardata
    topologySelector:
      matchLabelExpressions:
      - key: csi.hpe.com/zone
        values:
        - dc2
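
On the anti-affinity point above: spreading on a custom key such as csi.hpe.com/zone is not assumed behaviour in Kubernetes - it has to be requested in the StatefulSet spec, e.g. via podAntiAffinity (or topologySpreadConstraints). A minimal sketch, using a hypothetical multi-device StorageClass name:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      affinity:
        podAntiAffinity:
          # keep replicas in different zones so each lands near its own device
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: db
            topologyKey: csi.hpe.com/zone
      containers:
      - name: db
        image: busybox
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: hpe-csi-multi-zone   # hypothetical multi-device class
      resources:
        requests:
          storage: 10Gi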
datamattsson commented 2 months ago

I think adding a top level to the Secret structure (e.g. one key per device) is probably the best way, with a topology label selector added to each object. The driver could read this single Secret (as it does now) and the top-level key names would indicate the names of the devices. This would keep the CSP service a simple one-to-one mapping with devices (as it is now).

Thanks for your thoughts on this. I like it. Although, since Secrets and ConfigMaps only allow flat key/value pairs, I think we'd need to parse the Secret differently in the driver - e.g. by embedding the structure as a YAML document under a single key:

apiVersion: v1
kind: Secret
metadata:
  name: hpe-backend
stringData:
  topology: |
    dc1Device:
      serviceName: alletrastoragemp-csp-dc1-svc
      servicePort: "8080"
      backend: 10.10.0.20
      username: 3paradm
      password: 3pardata
      defaultDevice: true
      topologySelector:
        matchLabelExpressions:
        - key: csi.hpe.com/zone
          values:
          - dc1
    dc2Device:
      serviceName: alletrastoragemp-csp-dc2-svc
      servicePort: "8080"
      backend: 10.20.0.20
      username: 3paradm
      password: 3pardata
      topologySelector:
        matchLabelExpressions:
        - key: csi.hpe.com/zone
          values:
          - dc2

volumeBindingMode: WaitForFirstConsumer will allow the CSI driver to filter out a suitable backend from the topology in the request, and somehow the "device" key needs to be tacked onto the PV so that subsequent operations on the PV can be mapped back to the right entry in the Secret.
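
As a sketch of what that could look like on the provisioned PV - the csi.hpe.com/device attribute is hypothetical; in CSI terms it would be returned in the CreateVolume response's volume_context (which the external-provisioner persists as volumeAttributes), with the node affinity derived from the volume's accessible topology:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0a1b2c3d
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  csi:
    driver: csi.hpe.com
    volumeHandle: ...
    # hypothetical: records which Secret entry (device) served this volume
    volumeAttributes:
      csi.hpe.com/device: dc1Device
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: csi.hpe.com/zone
          operator: In
          values:
          - dc1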

I've seen other vendors use CRs for their backend definitions, and that would surely be a cleaner approach.
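
For the record, such a CR could look something like this - the group, kind and fields are invented here purely for illustration:

apiVersion: storage.hpe.com/v1alpha1   # hypothetical API group/version
kind: Backend
metadata:
  name: dc1-device
spec:
  serviceName: alletrastoragemp-csp-dc1-svc
  servicePort: 8080
  backend: 10.10.0.20
  secretRef:
    name: dc1-credentials   # credentials move out into a referenced Secret
  defaultDevice: true
  topologySelector:
    matchLabelExpressions:
    - key: csi.hpe.com/zone
      values:
      - dc1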

datamattsson commented 2 months ago

@dileepds and @sijeesh could you have a look?