kubernetes-sigs / aws-ebs-csi-driver

CSI driver for Amazon EBS https://aws.amazon.com/ebs/
Apache License 2.0
987 stars 792 forks source link

Include "kubernetes.io/os" in topology key #1946

Closed davidgiga1993 closed 8 months ago

davidgiga1993 commented 8 months ago

Is your feature request related to a problem?/Why is this needed We're operating a k8s cluster with linux and windows nodes. Currently it's not possible to limit a storage class using the allowed_topologies to a certain operating system.

This is required as a safety measure as currently when mounting a pvc that was previously used by a linux container on a windows container (by accident for example) it will corrupt the filesystem.

/feature

Describe the solution you'd like in detail The kubernetes.io/os item should be added to the topology key (see https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/node.go#L593)

Describe alternatives you've considered None

I'm happy to contribute this feature but first wanted to clarify it would be in your interest as well.

torredil commented 8 months ago

Hi @davidgiga1993 thanks for this request - kubernetes.io/os is already reliably populated by Kubelet on the node: https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetes-io-os

kubectl describe node | grep "kubernetes.io/os"               

beta.kubernetes.io/os=linux
kubernetes.io/os=linux
beta.kubernetes.io/os=linux
kubernetes.io/os=linux

there isn't much value in having an additional component (the driver) also manage this label, which is all that adding the label to the topolgy segments in the csi node plugin would do.

davidgiga1993 commented 8 months ago

Thanks for your reply. I know that this label gets added to the node by the kubelet, however it's not available as topology key which means it's not possible to restrict a storage class to a certain OS. Example:

failed to provision volume with StorageClass "encrypted-standard": error generating accessibility requirements:
topologymap[topology.ebs.csi.aws.com/zone:eu-central-1a topology.kubernetes.io/zone:eu-central-1a] from selected node "ip-100-2-0-184.eu-central-1.compute.internal" is not in requisite: [map[kubernetes.io/os:linux]]   
davidgiga1993 commented 6 months ago

I finally found time to properly evaluate this change, but sadly it doesn't fully work as intended.

Good things first:

Bad things: Once a PV got created it only has the 2 default zone/region topology keys in the nodeAffinity field. For example:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/migrated-to: ebs.csi.aws.com
    pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/ebs-csi-aws-com
  labels:
    topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1a
  name: pvc-2af956c3-63bb-4a2d-9f4f-5b4de8bb9d02
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: vol-036057b48f46d3109
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: win-only
    namespace: david-test
  nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In   
            values:
            - eu-central-1a
          - key: topology.kubernetes.io/region
            operator: In
            values:
            - eu-central-1
  persistentVolumeReclaimPolicy: Delete
  storageClassName: encrypted-windows
  volumeMode: Filesystem

This means that once a PV got created it's still possible to mount it on a different operating system (by accident) and therefore corrupting all data.

Ideally the PV should have the same additional topology keys as specified in the storageclass itself. For example:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-windows
parameters:
  encrypted: "true"
  type: gp3
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/os
    values:
    - windows

should result in the kubernetes.io/os: windows selector to be included in the PV.

@torredil Can you re-open this issue? Or should we move this to a new task?