kubernetes-sigs / blob-csi-driver

Azure Blob Storage CSI driver

Azure blob storage container is mounted as requested (RWO) for the very first time and read-only subsequently, despite no changes in the configuration. #1598

Open MarkKharitonov opened 3 weeks ago

MarkKharitonov commented 3 weeks ago

What happened: I have 33 CRON jobs running daily. Each job mounts its own storage container, refreshes a git repo there, and exits until the next scheduled invocation. I recently added another one (the 34th) and that one behaves weirdly: it is correctly mounted as read-write the very first time, just like the other 33, but every subsequent mount is read-only. Needless to say, this prevents the job from successfully refreshing the git repo on that storage container. Roughly, each job/volume pair is wired up like the sketch below.
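(A simplified illustration with placeholder names, schedule, and image rather than the real manifests; the PV's volumeAttributes mirror what the driver logs further down show for the nfs protocol.)

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: aida-dev-data-sources-xyz
spec:
  capacity:
    storage: 10Gi                      # placeholder size
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  mountOptions: [nconnect=4]
  csi:
    driver: blob.csi.azure.com
    volumeHandle: myaccount_xyz        # placeholder; shows up as volume_id in the logs
    volumeAttributes:
      protocol: nfs
      resourceGroup: app505-aida-eastus2-aida-dev
      storageAccount: myaccount        # obfuscated
      containerName: xyz
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: xyz
  namespace: aida-dev-data-sources
spec:
  storageClassName: ""
  volumeName: aida-dev-data-sources-xyz
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: xyz
  namespace: aida-dev-data-sources
spec:
  schedule: "0 2 * * *"                # placeholder schedule
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: refresh
              image: alpine/git        # placeholder image/command
              command: ["sh", "-c", "cd /repo && git fetch --all --prune"]
              volumeMounts:
                - name: repo
                  mountPath: /repo
          volumes:
            - name: repo
              persistentVolumeClaim:
                claimName: xyz
```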

What you expected to happen: The storage container is mounted read/write just like all the others that I mount in the same manner.

How to reproduce it: It reproduces 100% of the time on my cluster. I have no idea how it could be reproduced by others.

Anything else we need to know?: ???

Environment:

I have extracted the logs for a typical storage container, as well as for the problematic one, from the respective instance of the csi-blob-node DaemonSet. The logs for the first mount of the problematic container look the same as those for any other storage container, with the obvious exceptions of timestamps, latencies, and pod/container names/UUIDs (I obfuscated the storage account name):

7982 utils.go:104] GRPC call: /csi.v1.Node/NodeStageVolume
7982 utils.go:105] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["nconnect=4"]}},"access_mode":{"mode":7}},"volume_context":{"containerName":"xyz","protocol":"nfs","resourceGroup":"app505-aida-eastus2-aida-dev","storageAccount":"***MY STORAGE ACCOUNT***"},"volume_id":"***MY STORAGE ACCOUNT***_xyz"}
7982 blob.go:457] parsing volumeID(***MY STORAGE ACCOUNT***_xyz) return with error: error parsing volume id: "***MY STORAGE ACCOUNT***_xyz", should at least contain two #
7982 blob.go:538] volumeID(***MY STORAGE ACCOUNT***_xyz) authEnv: []
7982 nodeserver.go:353] target /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount
protocol nfs

volumeId ***MY STORAGE ACCOUNT***_xyz
mountflags [nconnect=4]
serverAddress ***MY STORAGE ACCOUNT***.blob.core.windows.net
7982 nodeserver.go:363] set AZURE_ENDPOINT_OVERRIDE to windows.net
7982 mount_linux.go:218] Mounting cmd (mount) with arguments (-t aznfs -o nconnect=4,sec=sys,vers=3,nolock ***MY STORAGE ACCOUNT***.blob.core.windows.net:/***MY STORAGE ACCOUNT***/xyz /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount)
7982 blob.go:1043] skip chmod on targetPath(/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount) since mode is already 020000000777)
7982 nodeserver.go:395] volume(***MY STORAGE ACCOUNT***_xyz) mount ***MY STORAGE ACCOUNT***.blob.core.windows.net:/***MY STORAGE ACCOUNT***/xyz on /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount succeeded
7982 azure_metrics.go:118] "Observed Request Latency" latency_seconds=0.317157242 request="blob_csi_driver_node_stage_volume" resource_group="app505-aida-nonprod-eastus2-aks-green-node" subscription_id="" source="blob.csi.azure.com" volumeid="***MY STORAGE ACCOUNT***_xyz" result_code="succeeded"
7982 utils.go:111] GRPC response: {}
7982 utils.go:104] GRPC call: /csi.v1.Node/NodePublishVolume
7982 utils.go:105] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount","target_path":"/var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["nconnect=4"]}},"access_mode":{"mode":7}},"volume_context":{"containerName":"xyz","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"xyz-28770181-dsm5f","csi.storage.k8s.io/pod.namespace":"aida-dev-data-sources","csi.storage.k8s.io/pod.uid":"pod-uid","csi.storage.k8s.io/serviceAccount.name":"workload-identity","csi.storage.k8s.io/serviceAccount.tokens":"***stripped***","protocol":"nfs","resourceGroup":"app505-aida-eastus2-aida-dev","storageAccount":"***MY STORAGE ACCOUNT***"},"volume_id":"***MY STORAGE ACCOUNT***_xyz"}
7982 nodeserver.go:139] NodePublishVolume: volume ***MY STORAGE ACCOUNT***_xyz mounting /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount at /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount with mountOptions: [bind]
7982 mount_linux.go:218] Mounting cmd (mount) with arguments ( -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount)
7982 mount_linux.go:218] Mounting cmd (mount) with arguments ( -o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount)
7982 nodeserver.go:155] NodePublishVolume: volume ***MY STORAGE ACCOUNT***_xyz mount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount at /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount successfully
7982 utils.go:111] GRPC response: {}
7982 utils.go:104] GRPC call: /csi.v1.Node/NodeUnpublishVolume
7982 utils.go:105] GRPC request: {"target_path":"/var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount","volume_id":"***MY STORAGE ACCOUNT***_xyz"}
7982 nodeserver.go:219] NodeUnpublishVolume: unmounting volume ***MY STORAGE ACCOUNT***_xyz on /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount
7982 mount_helper_common.go:93] unmounting "/var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount" (corruptedMount: false, mounterCanSkipMountPointChecks: true)
7982 mount_linux.go:360] Unmounting /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount
7982 mount_helper_common.go:150] Deleting path "/var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount"
7982 nodeserver.go:224] NodeUnpublishVolume: unmount volume ***MY STORAGE ACCOUNT***_xyz on /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount successfully
7982 utils.go:111] GRPC response: {}
7982 utils.go:104] GRPC call: /csi.v1.Node/NodeUnstageVolume
7982 utils.go:105] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount","volume_id":"***MY STORAGE ACCOUNT***_xyz"}
7982 nodeserver.go:510] NodeUnstageVolume: volume ***MY STORAGE ACCOUNT***_xyz unmounting on /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount
7982 mount_helper_common.go:93] unmounting "/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount" (corruptedMount: false, mounterCanSkipMountPointChecks: true)
7982 mount_linux.go:360] Unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount
7982 mount_helper_common.go:150] Deleting path "/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount"
7982 nodeserver.go:515] NodeUnstageVolume: volume ***MY STORAGE ACCOUNT***_xyz unmount on /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount successfully
7982 azure_metrics.go:118] "Observed Request Latency" latency_seconds=0.021768201 request="blob_csi_driver_node_unstage_volume" resource_group="app505-aida-nonprod-eastus2-aks-green-node" subscription_id="" source="blob.csi.azure.com" volumeid="***MY STORAGE ACCOUNT***_xyz" result_code="succeeded"
7982 utils.go:111] GRPC response: {}

And here are the "problematic" logs:

7982 utils.go:104] GRPC call: /csi.v1.Node/NodePublishVolume
7982 utils.go:105] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount","target_path":"/var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["nconnect=4"]}},"access_mode":{"mode":7}},"volume_context":{"containerName":"xyz","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"xyz-28771621-tqr4f","csi.storage.k8s.io/pod.namespace":"aida-dev-data-sources","csi.storage.k8s.io/pod.uid":"pod-uid","csi.storage.k8s.io/serviceAccount.name":"workload-identity","csi.storage.k8s.io/serviceAccount.tokens":"***stripped***","protocol":"nfs","resourceGroup":"app505-aida-eastus2-aida-dev","storageAccount":"***MY STORAGE ACCOUNT***"},"volume_id":"***MY STORAGE ACCOUNT***_xyz"}
7982 nodeserver.go:139] NodePublishVolume: volume ***MY STORAGE ACCOUNT***_xyz mounting /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount at /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount with mountOptions: [bind]
7982 mount_linux.go:218] Mounting cmd (mount) with arguments ( -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount)
7982 mount_linux.go:218] Mounting cmd (mount) with arguments ( -o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount)
7982 nodeserver.go:155] NodePublishVolume: volume ***MY STORAGE ACCOUNT***_xyz mount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/1b0191d5615744205f9480e6ec5927221e7ae72861f2c6ed726cf3b4d325dae4/globalmount at /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount successfully
7982 utils.go:111] GRPC response: {}
7982 utils.go:104] GRPC call: /csi.v1.Node/NodeUnpublishVolume
7982 utils.go:105] GRPC request: {"target_path":"/var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount","volume_id":"***MY STORAGE ACCOUNT***_xyz"}
7982 nodeserver.go:219] NodeUnpublishVolume: unmounting volume ***MY STORAGE ACCOUNT***_xyz on /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount
7982 mount_helper_common.go:93] unmounting "/var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount" (corruptedMount: false, mounterCanSkipMountPointChecks: true)
7982 mount_linux.go:360] Unmounting /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount
7982 mount_helper_common.go:150] Deleting path "/var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount"
7982 nodeserver.go:224] NodeUnpublishVolume: unmount volume ***MY STORAGE ACCOUNT***_xyz on /var/lib/kubelet/pods/pod-uid/volumes/kubernetes.io~csi/aida-dev-data-sources-xyz/mount successfully
7982 utils.go:111] GRPC response: {}

Notice how the second attempt to mount the storage container lacks calls to /csi.v1.Node/NodeStageVolume and /csi.v1.Node/NodeUnstageVolume.

What is going on?

andyzhangx commented 3 weeks ago

The mount process looks good. If you can SSH to the agent node, you will see a standalone blobfuse2 process running for each mount (see the example below). Each blobfuse2 process can take up to around 2GB of memory per mount, so if you have lots of blobfuse mounts, the node could be running out of memory. Please also collect /var/log/blobfuse2.log on the node for investigation.

# ps aux | grep blobfuse
root       63671  5.3  0.3 2252664 26252 ?       Ssl  Jul31 3549:40 blobfuse2 mount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/f68efee52cecc7b3caa269a158fd3157db1b1dadd465164036f7ebf5707deb8c/globalmount -o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=10 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_WARNING --cache-size-mb=1000 --pre-mount-validate=true --use-https=true --empty-dir-check=false --tmp-path=/mnt/xxx#xxx#pvc-2fdf65cc-5b23-4f09-bec8-3b93f5001770##default# --container-name=pvc-2fdf65cc-5b23-4f09-bec8-3b93f5001770 --ignore-open-flags=true
MarkKharitonov commented 3 weeks ago

Hi, here is what I see:

root@aks-appnodepool1-51495026-vmss00005P:/# ps aux | grep blobfuse
root        6824  0.0  0.1 1260776 29340 ?       Ssl  Sep12   0:05 /usr/bin/blobfuse-proxy --v=5 --blobfuse-proxy-endpoint=unix://var/lib/kubelet/plugins/blob.csi.azure.com/blobfuse-proxy.sock
root        7982  0.0  0.3 1284748 58500 ?       Ssl  Sep12   0:40 /blobplugin --v=5 --endpoint=unix:///csi/csi.sock --blobfuse-proxy-endpoint=unix:///csi/blobfuse-proxy.sock --enable-blobfuse-proxy=true --nodeid=aks-appnodepool1-51495026-vmss00005p --user-agent-suffix=AKS --enable-aznfs-mount=true
root     3064941  0.0  0.0   3604  1620 ?        S+   13:45   0:00 grep --color=auto blobfuse
root@aks-appnodepool1-51495026-vmss00005P:/#

But why are we talking about blobfuse if I use nfs? Just in case, here is the nfs grep:

root@aks-appnodepool1-51495026-vmss00005P:/# ps aux | grep nfs
root        7982  0.0  0.3 1284748 58484 ?       Ssl  Sep12   0:40 /blobplugin --v=5 --endpoint=unix:///csi/csi.sock --blobfuse-proxy-endpoint=unix:///csi/blobfuse-proxy.sock --enable-blobfuse-proxy=true --nodeid=aks-appnodepool1-51495026-vmss00005p --user-agent-suffix=AKS --enable-aznfs-mount=true
root        8031  0.1  0.0   4804  4052 ?        Ss   Sep12   6:20 /bin/bash /usr/sbin/aznfswatchdog
root       20593  0.0  0.0      0     0 ?        I<   Sep12   0:00 [nfsiod]
root     3067766  0.0  0.0   3472  1628 ?        S+   13:46   0:00 grep --color=auto nfs
root@aks-appnodepool1-51495026-vmss00005P:/#
andyzhangx commented 3 weeks ago

Then you could run "mount | grep nfs" to find the NFS mounts on the node and check directly whether they are mounted read-only.

MarkKharitonov commented 3 weeks ago

I resolved the issue. Your suggestion to inspect the mounts on the node helped me realize the root cause. There is no bug in the driver. Here is what happened: the same storage container is mounted in two different pods running in two different namespaces:

  1. The "producer" pod - responsible to refresh the contents on the storage container. Triggered by the dedicated CRON job. Mounts as RWO.
  2. The "consumer" pod - processes the contents. Mounts as ROX. Runs all the time.

Both pods run on the same k8s node, but the consumer runs all the time, so its ROX mount of the storage wins (see the sketch below).
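In PV/PVC terms, the shared setup looks roughly like this. It is a simplified sketch with placeholder names and sizes, and the consumer's read-only side is shown as csi.readOnly purely for illustration; the key point is that both PVs reference the same container through the same volumeHandle.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: producer-xyz                   # bound by the CRON job's RWO PVC
spec:
  capacity: {storage: 10Gi}
  accessModes: [ReadWriteOnce]
  mountOptions: [nconnect=4]
  csi:
    driver: blob.csi.azure.com
    volumeHandle: myaccount_xyz        # same handle here...
    volumeAttributes: {protocol: nfs, storageAccount: myaccount, containerName: xyz}
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: consumer-xyz                   # bound by the long-running pod's ROX PVC
spec:
  capacity: {storage: 10Gi}
  accessModes: [ReadOnlyMany]
  mountOptions: [nconnect=4]
  csi:
    driver: blob.csi.azure.com
    readOnly: true
    volumeHandle: myaccount_xyz        # ...and here, so the node shares one staged mount
    volumeAttributes: {protocol: nfs, storageAccount: myaccount, containerName: xyz}
```

Since the staged globalmount (the same hashed path appears in both log excerpts above) is shared per volume per node, the NFS share is staged only once, and every later NodePublishVolume just bind-mounts that shared staging directory; whichever pod triggers the staging first effectively dictates read-write vs read-only for everyone else on that node.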

I was put off track by the fact that I have another storage container in exactly the same situation as this one, and there everything works just fine. But now I understand why that is the case: that storage container is dedicated to a particularly big repo, so the CRON job responsible for refreshing it is configured to use a dedicated node pool. It is therefore guaranteed that the producer pod and the consumer pod run on different nodes, and thus their mounts never interfere with each other.

In my case the fix was easy, because the consumer pod no longer needs to mount the storage in question (due to recent changes in our code), so I simply stopped mounting it in the consumer pod.

But in general, what would be the right solution? It is kind of weird that I cannot mount the same storage container multiple times (with different mount options) on the same node - why would that be a problem at all?
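For reference, here is one thing that might work if both mounts ever need to land on the same node again. It is only a sketch, under the assumption that the staged globalmount is keyed on the PV's volumeHandle and that the handle is otherwise treated as an opaque string when storageAccount/containerName are supplied via volumeAttributes (the parse-error-then-proceed lines in the logs above suggest as much); names are illustrative.

```yaml
# Give each PV that points at the same container its own volumeHandle,
# so the node stages a separate NFS mount for each side.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: producer-xyz
spec:
  capacity: {storage: 10Gi}
  accessModes: [ReadWriteOnce]
  mountOptions: [nconnect=4]
  csi:
    driver: blob.csi.azure.com
    volumeHandle: myaccount_xyz_producer   # unique per PV
    volumeAttributes: {protocol: nfs, storageAccount: myaccount, containerName: xyz}
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: consumer-xyz
spec:
  capacity: {storage: 10Gi}
  accessModes: [ReadOnlyMany]
  mountOptions: [nconnect=4]
  csi:
    driver: blob.csi.azure.com
    readOnly: true
    volumeHandle: myaccount_xyz_consumer   # unique per PV
    volumeAttributes: {protocol: nfs, storageAccount: myaccount, containerName: xyz}
```

With distinct handles, each PV should get its own staged NFS mount on the node, one read-write and one read-only, at the cost of one extra mount of the same container per node.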