kubernetes-sigs / azurefile-csi-driver

Azure File CSI Driver
Apache License 2.0

volumes with same ID created for different inline pod mounts #1206

Closed raafatseif closed 1 year ago

raafatseif commented 1 year ago

What happened:

The second deployment's pod is stuck at creation due to a FailedMount error: An operation with the given Volume ID csi-e0461bb83dcff31517e3591dd0f26a879599017206f99edc7885dabf8a3f2f9a already exists.

What you expected to happen:

The volume should be created with a unique VolumeID, e.g. by generating a distinct ID per pod mount.
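
For context, for CSI inline (ephemeral) volumes it is kubelet, not the driver, that generates the volume ID, and the pod UID is part of the hash input, so two different pods should already receive distinct IDs. A minimal sketch of that derivation in Go (illustrative, modeled on kubelet's approach, not verbatim kubelet code):

package main

import (
	"crypto/sha256"
	"fmt"
)

// makeVolumeHandle sketches how kubelet derives the handle for a CSI
// ephemeral (inline) volume: a sha256 over the pod UID and the volume's
// spec name, prefixed with "csi-". Because the pod UID is hashed in,
// different pods mounting the same inline volume definition get distinct
// IDs; only retries for the same pod reuse an ID.
func makeVolumeHandle(podUID, volSourceSpecName string) string {
	sum := sha256.Sum256([]byte(podUID + volSourceSpecName))
	return fmt.Sprintf("csi-%x", sum)
}

func main() {
	fmt.Println(makeVolumeHandle("3d635d2b-39ed-43d7-a319-ed00a70d1677", "example-volume-mount"))
}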

How to reproduce it:

Create a deployment using an inline volume, then create another similar deployment (manifest below; see the reproduction commands after it).

Anything else we need to know?:

Deployment manifest example:

apiVersion: apps/v1 
kind: Deployment
metadata:
  name: example-deployment
spec:
  selector:
    matchLabels:
      app: example
  replicas: 1
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: example
        command:
        - sleep
        - "infinity"
        volumeMounts:
        - mountPath: "/mnt/smb"
          name: example-volume-mount
      volumes:
      - name: example-volume-mount
        csi:
          driver: file.csi.azure.com
          volumeAttributes:
            server: smb-server
            shareName: share
            secretName: mysecret
            mountOptions: "actimeo=30,cache=strict,dir_mode=0775,file_mode=0775,gid=1000,nosharesock,uid=1000,vers=3.0,mfsymlinks"

Environment:

andyzhangx commented 1 year ago

@raafatseif I have tried your modified example and it works well. I think the issue is that you have specified server: smb-server; the correct format is accountname.file.core.windows.net, and you don't need to specify this parameter at all if you are not using a private endpoint. And since the inline volume definition is the same, it should use the same ID; that's expected. Your real issue is that the first volume mount does not succeed.

volumeAttributes:
  server: smb-server
  shareName: share
  secretName: mysecret
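
For comparison, a corrected sketch (mystorageaccount is a hypothetical account name):

# server must be the full endpoint; it can be omitted entirely when not
# using a private endpoint
volumeAttributes:
  server: mystorageaccount.file.core.windows.net
  shareName: share
  secretName: mysecret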
francRang commented 1 year ago


@andyzhangx Mind explaining what you mean by "private endpoint"?

Also, are you implying that if we have two separate pods in the same namespace with the same volume attributes, theoretically, this should work?

andyzhangx commented 1 year ago

@francRang yes, it should work. Azure Files private endpoint: https://learn.microsoft.com/en-us/azure/storage/files/storage-files-networking-endpoints?tabs=azure-portal#create-a-private-endpoint. If you don't use a private endpoint, just remove the server: xxx parameter.
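
Without a private endpoint the attributes would reduce to something like this (a sketch; per the comment above, the driver can determine the endpoint when server is omitted):

# no server: attribute; the driver derives the endpoint from the
# storage account referenced by the secret
volumeAttributes:
  shareName: share
  secretName: mysecret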

raafatseif commented 1 year ago

Hi @andyzhangx! We are connecting to an on-prem share. The endpoint is in this format: corp.<company>.com. I tried a scenario where I created the example deployment, which works fine. I then scaled the deployment up to 2 replicas. The second pod gets stuck during Init with a FailedMount error.

Here are the logs from the csi-azurefile-node daemonset pod:

I0406 16:51:46.649703       1 utils.go:76] GRPC call: /csi.v1.Node/NodePublishVolume
I0406 16:51:46.649719       1 utils.go:77] GRPC request: {"target_path":"/var/lib/kubelet/pods/3d635d2b-39ed-43d7-a319-ed00a70d1677/volumes/kubernetes.io~csi/example-volume-mount/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"csi.storage.k8s.io/ephemeral":"true","csi.storage.k8s.io/pod.name":"example-deployment-89b7bbddd-2x4q4","csi.storage.k8s.io/pod.namespace":"default","csi.storage.k8s.io/pod.uid":"3d635d2b-39ed-43d7-a319-ed00a70d1677","csi.storage.k8s.io/serviceAccount.name":"default","mountOptions":"actimeo=30,cache=strict,dir_mode=0775,file_mode=0775,gid=1000,nosharesock,uid=1000,vers=3.0,mfsymlinks","secretName":"smbcreds-real","server":"corp.<company>.com","shareName":"dfs"},"volume_id":"csi-5cd666770e8b6604f11a289263bee9fb8447c32b1f192983f9f3f2e1ce2c8b1d"}
I0406 16:51:46.649820       1 nodeserver.go:68] NodePublishVolume: ephemeral volume(csi-5cd666770e8b6604f11a289263bee9fb8447c32b1f192983f9f3f2e1ce2c8b1d) mount on /var/lib/kubelet/pods/3d635d2b-39ed-43d7-a319-ed00a70d1677/volumes/kubernetes.io~csi/example-volume-mount/mount, VolumeContext: map[csi.storage.k8s.io/ephemeral:true csi.storage.k8s.io/pod.name:example-deployment-89b7bbddd-2x4q4 csi.storage.k8s.io/pod.namespace:default csi.storage.k8s.io/pod.uid:3d635d2b-39ed-43d7-a319-ed00a70d1677 csi.storage.k8s.io/serviceAccount.name:default getaccountkeyfromsecret:true mountOptions:actimeo=30,cache=strict,dir_mode=0775,file_mode=0775,gid=1000,nosharesock,uid=1000,vers=3.0,mfsymlinks secretName:smbcreds-real secretnamespace:default server:corp.<company>.com shareName:dfs storageaccount:]
W0406 16:51:46.649843       1 azurefile.go:599] parsing volumeID(csi-5cd666770e8b6604f11a289263bee9fb8447c32b1f192983f9f3f2e1ce2c8b1d) return with error: error parsing volume id: "csi-5cd666770e8b6604f11a289263bee9fb8447c32b1f192983f9f3f2e1ce2c8b1d", should at least contain two #
E0406 16:51:46.654010       1 utils.go:81] GRPC error: rpc error: code = Aborted desc = An operation with the given Volume ID csi-5cd666770e8b6604f11a289263bee9fb8447c32b1f192983f9f3f2e1ce2c8b1d already exists
raafatseif commented 1 year ago

I've done more testing and got mixed results. For example, I created the deployment and scaled it to 4 replicas. One replica experienced the issue:

example-deployment-2-867494d456-6zdwj   0/2     Init:0/1   0   3m10s   <none>         node-6fc6f58879-t4wlb   <none>   <none>
example-deployment-2-867494d456-bcmpw   2/2     Running    0   5m4s    100.64.1.154   node-6fc6f58879-hs584   <none>   <none>
example-deployment-2-867494d456-ks5dt   2/2     Running    0   3m10s   100.64.4.254   node-6fc6f58879-zm4k4   <none>   <none>
example-deployment-2-867494d456-nndc9   2/2     Running    0   3m10s   100.64.7.188   node-6fc6f58879-jxhgc   <none>   <none>

Error:

Warning  FailedMount  16s (x6 over 47s)  kubelet            MountVolume.SetUp failed for volume "example-volume-mount" : rpc error: code = Aborted desc = An operation with the given Volume ID csi-241818ebeb8313a16bdc837032aac2fd33f8457c8dbaf136a7ec26c1a2f2c4cd already exists
cvvz commented 1 year ago

@raafatseif Could you share the deployment manifest and the full logs of kubelet and csi-azurefile-node from around the time of impact?

raafatseif commented 1 year ago

@cvvz sure! csi-azurefile-node logs, example-deployment-2-manifest, kubelet-logs

cvvz commented 1 year ago

@raafatseif Hi, this is mainly because the cifs mount on node-6fc6f58879-t4wlb took too long. It happened at:

I0407 15:03:48.167942       1 mount_linux.go:220] Mounting cmd (mount) with arguments (-t cifs -o actimeo=30,cache=strict,dir_mode=0775,file_mode=0775,gid=1000,mfsymlinks,nosharesock,uid=1000,vers=3.0,<masked> //corp.<company>.com/dfs /var/lib/kubelet/pods/892afe4b-b8b0-43ac-b43d-26eb5269b8f5/volumes/kubernetes.io~csi/example-volume-mount/mount)

and hung for a long time, more than two minutes.

Kubelet kept retrying the mount of the volume for the Pod until it succeeded, every time with the same volume ID (because it is the same volume on the same pod). However, the first mount operation was still in progress and held the lock, so all subsequent retries failed because they could not acquire the same lock.
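
To illustrate the mechanism, here is a sketch of the common per-volume-ID in-flight lock pattern used by CSI drivers (names are illustrative, not this driver's exact code):

package main

import (
	"fmt"
	"sync"
)

// VolumeLocks tracks volume IDs that have an operation in flight. A second
// NodePublishVolume for the same ID is rejected immediately rather than
// queued behind the stuck mount, which is exactly the
// "operation with the given Volume ID ... already exists" Aborted error
// kubelet keeps seeing while the first cifs mount hangs.
type VolumeLocks struct {
	mu     sync.Mutex
	locked map[string]struct{}
}

func NewVolumeLocks() *VolumeLocks {
	return &VolumeLocks{locked: make(map[string]struct{})}
}

// TryAcquire returns false if an operation on volumeID is already running.
func (v *VolumeLocks) TryAcquire(volumeID string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, ok := v.locked[volumeID]; ok {
		return false
	}
	v.locked[volumeID] = struct{}{}
	return true
}

// Release marks the operation on volumeID as finished.
func (v *VolumeLocks) Release(volumeID string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	delete(v.locked, volumeID)
}

func main() {
	locks := NewVolumeLocks()
	id := "csi-241818ebeb8313a16bdc837032aac2fd33f8457c8dbaf136a7ec26c1a2f2c4cd"
	fmt.Println(locks.TryAcquire(id)) // true: first mount proceeds (and hangs)
	fmt.Println(locks.TryAcquire(id)) // false: kubelet's retry gets Aborted
	locks.Release(id)
}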

I'd suggest that you try mounting manually on the problem node (node-6fc6f58879-t4wlb in this issue) with the same mount options, to see if the mount operation gets stuck for a very long time (more than two minutes).
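
Based on the mount command in the log above, the manual test would look roughly like this (run on the affected node; <user> and <password> stand in for the masked credentials):

# mirrors the options from the driver log; time shows how long it hangs
mkdir -p /mnt/cifs-test
time mount -t cifs \
  -o actimeo=30,cache=strict,dir_mode=0775,file_mode=0775,gid=1000,mfsymlinks,nosharesock,uid=1000,vers=3.0,username=<user>,password=<password> \
  //corp.<company>.com/dfs /mnt/cifs-test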

On the csi driver side, we will find a way to set a timeout for the mount operation, so that a mount cannot hang for too long.
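
One possible shape for such a timeout (a sketch, not the driver's actual implementation) is to run the mount binary under a context with a deadline:

package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// mountWithTimeout runs the mount command but gives up after timeout, so a
// hung cifs mount cannot hold the per-volume lock indefinitely. Sketch
// only; a real fix would live in the driver's mounter and would also need
// to clean up any half-finished mount.
func mountWithTimeout(timeout time.Duration, source, target, options string) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	cmd := exec.CommandContext(ctx, "mount", "-t", "cifs", "-o", options, source, target)
	out, err := cmd.CombinedOutput()
	if ctx.Err() == context.DeadlineExceeded {
		return fmt.Errorf("mount of %s timed out after %v", source, timeout)
	}
	if err != nil {
		return fmt.Errorf("mount failed: %v, output: %s", err, out)
	}
	return nil
}

func main() {
	// hypothetical values for illustration
	err := mountWithTimeout(2*time.Minute, "//corp.example.com/dfs", "/mnt/cifs-test", "vers=3.0,nosharesock")
	fmt.Println(err)
}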

raafatseif commented 1 year ago

Hi @cvvz! I can confirm that the manual mount takes longer than 2 minutes (it continues to hang well beyond that). I found an existing stale mount that may have been causing these issues. Rebooting the node removed the stale mount, and I was able to mount again.

This makes me curious as to why new mounts hang if the nosharesock option is used. Is it being ignored? We provide the option through csi.volumeAttributes.mountOptions, however I do not see it in the mount output below. It would be tedious to reboot nodes for this, as it happens often.

root@node-6fc6f58879-t4wlb:~# mount | grep cifs
//corp.<company>.com/dfs on /var/lib/kubelet/pods/56ea7791-982d-4ce5-9de2-47b4aa071196/volumes/kubernetes.io~csi/example-volume-mount/mount type cifs (rw,relatime,vers=3.0,cache=strict,uid=1000,forceuid,gid=1000,forcegid,addr=<masked>,file_mode=0775,dir_mode=0775,soft,nounix,mapposix,mfsymlinks,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=30)
cvvz commented 1 year ago

The csi driver doesn't ignore the nosharesock option. You can create two Pods using the same file share on the same Node, then use mount | grep "type cifs" and ss -t | grep microsoft-ds to confirm that there are two cifs mount points and two tcp connections on that Node.
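
For reference, the two verification commands, run on the node hosting both pods:

# with nosharesock honored, two pods on the same share should show two
# cifs mount points and two TCP connections to the server
mount | grep "type cifs"
ss -t | grep microsoft-ds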

andyzhangx commented 1 year ago

FYI, the nosharesock mount option won't show up in the mount output even if you set it on a cifs mount.

cvvz commented 1 year ago

You may need to figure out where the existing stale mount came from.

raafatseif commented 1 year ago

Awesome! Thank you both for the tips. We are testing this in our environments.