hetznercloud / csi-driver

Kubernetes Container Storage Interface driver for Hetzner Cloud Volumes

MapVolume.SetUp failed for volume: blockMapper.stageVolumeForBlock failed: "no mount capability" #66

Closed: rach-sharp closed this issue 4 years ago

rach-sharp commented 4 years ago

I'm trying to set up Ceph via Rook, using PVCs with a StorageClass backed by hetznercloud/csi-driver, but the volumes get stuck between being attached and being mounted to a Pod.

NAMESPACE      NAME                                            READY   STATUS     RESTARTS   AGE
rook-hetzner   rook-ceph-osd-prepare-set1-0-data-wnvcq-mjrwc   0/1     Init:0/2   0          26m
rook-hetzner   rook-ceph-osd-prepare-set1-1-data-nc7cw-96czb   0/1     Init:0/2   0          26m
rook-hetzner   rook-ceph-osd-prepare-set1-2-data-mbzrb-ggftg   0/1     Init:0/2   0          26m

When I describe one of these pods, I see the following events:

Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               <unknown>           default-scheduler        Successfully assigned rook-hetzner/rook-ceph-osd-prepare-set1-2-data-mbzrb-ggftg to kube3
  Warning  FailedAttachVolume      17m (x4 over 17m)   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef" : rpc error: code = Aborted desc = failed to publish volume: server is locked
  Normal   SuccessfulAttachVolume  17m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef"
  Warning  FailedMount             15m                 kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[rook-ceph-osd-token-8rjhx rook-data rook-ceph-log udev set1-2-data-mbzrb-bridge ceph-conf-emptydir devices rook-binaries set1-2-data-mbzrb]: timed out waiting for the condition
  Warning  FailedMount             12m                 kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[rook-data ceph-conf-emptydir set1-2-data-mbzrb rook-ceph-osd-token-8rjhx set1-2-data-mbzrb-bridge udev rook-binaries rook-ceph-log devices]: timed out waiting for the condition
  Warning  FailedMount             10m                 kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[udev rook-binaries set1-2-data-mbzrb-bridge rook-data rook-ceph-osd-token-8rjhx ceph-conf-emptydir rook-ceph-log devices set1-2-data-mbzrb]: timed out waiting for the condition
  Warning  FailedMount             8m17s               kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[set1-2-data-mbzrb set1-2-data-mbzrb-bridge rook-data ceph-conf-emptydir rook-ceph-log devices udev rook-ceph-osd-token-8rjhx rook-binaries]: timed out waiting for the condition
  Warning  FailedMount             6m2s                kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[rook-ceph-log devices rook-binaries rook-ceph-osd-token-8rjhx set1-2-data-mbzrb rook-data ceph-conf-emptydir udev set1-2-data-mbzrb-bridge]: timed out waiting for the condition
  Warning  FailedMount             3m45s               kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[rook-data ceph-conf-emptydir rook-ceph-log udev rook-ceph-osd-token-8rjhx set1-2-data-mbzrb-bridge devices rook-binaries set1-2-data-mbzrb]: timed out waiting for the condition
  Warning  FailedMount             90s                 kubelet, kube3           Unable to attach or mount volumes: unmounted volumes=[set1-2-data-mbzrb], unattached volumes=[devices rook-ceph-osd-token-8rjhx set1-2-data-mbzrb set1-2-data-mbzrb-bridge rook-data ceph-conf-emptydir rook-ceph-log udev rook-binaries]: timed out waiting for the condition
  Warning  FailedMapVolume         31s (x16 over 16m)  kubelet, kube3           MapVolume.SetUp failed for volume "pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef" : kubernetes.io/csi: blockMapper.stageVolumeForBlock failed: rpc error: code = InvalidArgument desc = no mount capability

After a little digging, I found that the no mount capability error comes from https://github.com/hetznercloud/csi-driver/blob/master/driver/node.go
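
For reference, here is a minimal sketch of the kind of check that produces this error, reconstructed from the CSI spec Go bindings rather than copied from node.go: when a PVC uses volumeMode: Block, the kubelet sends a VolumeCapability with a Block access type instead of a Mount one, so the request is rejected before anything is staged.

package driver

import (
	"context"

	proto "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// nodeStageVolume sketches a filesystem-only staging path (illustrative,
// not the driver's exact code). Block-mode requests carry no Mount
// capability, so they fail here before anything is formatted or mounted.
func nodeStageVolume(ctx context.Context, req *proto.NodeStageVolumeRequest) (*proto.NodeStageVolumeResponse, error) {
	mount := req.GetVolumeCapability().GetMount()
	if mount == nil {
		return nil, status.Error(codes.InvalidArgument, "no mount capability")
	}
	// volumeMode: Filesystem would continue here: format the device if
	// necessary and mount it at req.GetStagingTargetPath() (omitted).
	return &proto.NodeStageVolumeResponse{}, nil
}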

The Hetzner Cloud k8s manifests I have installed are just the ones from the README; my CephCluster manifest is:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-hetzner
  namespace: rook-hetzner
spec:
  dataDirHostPath: /var/lib/rook
  network:
    hostNetwork: false
  placement:
    all:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Equal
        effect: NoSchedule
  cephVersion:
    image: ceph/ceph:v14.2.4-20190917
  mon:
    count: 3
    allowMultiplePerNode: false
  storage:
    storageClassDeviceSets:
    - name: set1
      count: 3
      portable: true
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 30Gi
          # IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
          storageClassName: hcloud-volumes
          volumeMode: Block
          accessModes:
            - ReadWriteOnce

Involved PVs

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS     REASON   AGE
pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef   30Gi       RWO            Delete           Bound    rook-hetzner/set1-2-data-mbzrb   hcloud-volumes            35m
pvc-ed978abc-56c7-4cfc-83f5-0201ceda53f3   30Gi       RWO            Delete           Bound    rook-hetzner/set1-0-data-wnvcq   hcloud-volumes            35m
pvc-f9433424-d211-454d-99fb-b90fa9891357   30Gi       RWO            Delete           Bound    rook-hetzner/set1-1-data-nc7cw   hcloud-volumes            35m

Involved PVCs

rook-hetzner   set1-0-data-wnvcq   Bound    pvc-ed978abc-56c7-4cfc-83f5-0201ceda53f3   30Gi       RWO            hcloud-volumes   35m
rook-hetzner   set1-1-data-nc7cw   Bound    pvc-f9433424-d211-454d-99fb-b90fa9891357   30Gi       RWO            hcloud-volumes   35m
rook-hetzner   set1-2-data-mbzrb   Bound    pvc-5f2b44a7-169c-4958-a61a-4c86e8186cef   30Gi       RWO            hcloud-volumes   35m

Any ideas why the volume would have no mount capability?

ckotzbauer commented 4 years ago

I have the same problem with the CSI driver in my cluster. Does anybody know of any workarounds? Please let me know if more info is needed about configs or error messages.

/cc @thcyron

LKaemmerling commented 4 years ago

Could you try removing volumeMode: Block?

ckotzbauer commented 4 years ago

@LKaemmerling I removed the volumeMode: Block and now the osd-prepare Pod does not start because of this message:

Warning  FailedMount             2m5s               kubelet, kubeworker1     Unable to attach or mount volumes: unmounted volumes=[set1-0-data-q5tz4], unattached volumes=[rook-binaries rook-ceph-osd-token-c9r9z rook-ceph-log set1-0-data-q5tz4 udev set1-0-data-q5tz4-bridge rook-data rook-ceph-crash ceph-conf-emptydir devices]: volume set1-0-data-q5tz4 has volumeMode Filesystem, but is specified in volumeDevices

LKaemmerling commented 4 years ago

@code-chris okay, then try leaving it in there. Which csi-driver version are you running? Could you try the latest tag? (https://github.com/hetznercloud/csi-driver/blob/master/deploy/kubernetes/hcloud-csi-master.yml)

ckotzbauer commented 4 years ago

I used this one: https://github.com/hetznercloud/csi-driver/blob/master/deploy/kubernetes/hcloud-csi.yml. It looks almost the same...

LKaemmerling commented 4 years ago

They are almost the same, except for the CSI driver image used (hcloud-csi.yml: image: hetznercloud/hcloud-csi-driver:1.2.2; hcloud-csi-master.yml: image: hetznercloud/hcloud-csi-driver:latest).

ckotzbauer commented 4 years ago

Ah, you're right. I will try this one in the evening and give feedback!

Negashev commented 4 years ago

@LKaemmerling I removed the volumeMode: Block and now the osd-prepare Pod does not start because of this message:

Warning  FailedMount             2m5s               kubelet, kubeworker1     Unable to attach or mount volumes: unmounted volumes=[set1-0-data-q5tz4], unattached volumes=[rook-binaries rook-ceph-osd-token-c9r9z rook-ceph-log set1-0-data-q5tz4 udev set1-0-data-q5tz4-bridge rook-data rook-ceph-crash ceph-conf-emptydir devices]: volume set1-0-data-q5tz4 has volumeMode Filesystem, but is specified in volumeDevices

Rook storageClassDeviceSets work only with volumeMode: Block

ckotzbauer commented 4 years ago

@LKaemmerling No, that doesn't work. Then the original error appears again:

MapVolume.SetUpDevice failed for volume "pvc-92672abb-11c7-46e7-a6cd-ca847a603e7f" : kubernetes.io/csi: blockMapper.stageVolumeForBlock failed: rpc error: code = InvalidArgument desc = no mount capability

github-actions[bot] commented 4 years ago

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

ckotzbauer commented 4 years ago

@LKaemmerling Any news on this issue? The problem still persists...

emazzotta commented 4 years ago

I'm also experiencing the same issue when setting up Ceph via hcloud-volumes (using the templates of hetznercloud csi-driver 1.2.3).

z1r0- commented 4 years ago

Same issue. Any hope we can get an update soon?

z1r0- commented 4 years ago

I worked around it by deploying Rancher's Longhorn to my k8s cluster (it doesn't use an hcloud volume, but since it replicates across 3 nodes I'm fine with that; I guess it should be possible to use hcloud volumes for it, I just didn't dig into it). Then I set the storage class to longhorn in cluster-on-pvc.yaml and was very surprised to see everything working perfectly.

I would still love for it to work with hcloud-volumes directly, but at least I got rook-cephfs running in my cloud, and it performs so much better than nfs-provisioner in case you need RWX volumes for your pods ♥

ckotzbauer commented 4 years ago

@LKaemmerling Maybe this should be reopened, since there are enough folks who have the same problem...

ahilsend commented 4 years ago

Just encountered the same issue while trying to set up a Rook Ceph cluster. I realized Rook requires block devices (not formatted filesystems):

https://rook.io/docs/rook/v1.3/ceph-cluster-crd.html#storage-class-device-sets

So if volumeMode is not set to Block, the error is instead:

volume set1-data-0-vx7h5 has volumeMode Filesystem, but is specified in volumeDevices

And block mode is something I believe hcloud doesn't support.

fff0x commented 4 years ago

Any update on this? Why is it not possible to get a raw block device?

I'm facing the same issue when trying to run a Rook-Ceph cluster on PVCs.

LKaemmerling commented 4 years ago

Our provider does not support getting raw block devices at the moment. We may look into this in the future.

ahilsend commented 4 years ago

I've added support for block mode in my branch: https://github.com/ahilsend/csi-driver/tree/volumemode-block

I didn't have time to add tests yet, but I have been running it successfully for 2 weeks to provision both block & fs volumes on k8s 1.18+.

I'm hoping I'll find some time to add those tests next week, and open a PR.
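
For anyone wondering what the change amounts to: the node service has to accept the Block capability and bind-mount the raw device node to the target path, instead of only formatting and mounting a filesystem. Roughly like the sketch below; the helper name, the mount-utils usage and the ext4 default are illustrative assumptions, not the exact code in my branch.

package driver

import (
	"os"

	proto "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	mount "k8s.io/mount-utils"
	utilexec "k8s.io/utils/exec"
)

// publishVolume sketches a NodePublishVolume that handles both volume modes.
// devicePath is the attached volume's device node, e.g.
// /dev/disk/by-id/scsi-0HC_Volume_<id>.
func publishVolume(req *proto.NodePublishVolumeRequest, devicePath string) error {
	mounter := mount.New("")

	if req.GetVolumeCapability().GetBlock() != nil {
		// For volumeMode: Block, create the target path as a plain file and
		// bind-mount the device node onto it so the pod sees a raw device.
		f, err := os.OpenFile(req.GetTargetPath(), os.O_CREATE, 0660)
		if err != nil {
			return status.Error(codes.Internal, err.Error())
		}
		f.Close()
		return mounter.Mount(devicePath, req.GetTargetPath(), "", []string{"bind"})
	}

	// For volumeMode: Filesystem, format the device on first use and mount
	// it with the requested filesystem (the existing behaviour).
	fsType := req.GetVolumeCapability().GetMount().GetFsType()
	if fsType == "" {
		fsType = "ext4"
	}
	formatter := mount.SafeFormatAndMount{Interface: mounter, Exec: utilexec.New()}
	return formatter.FormatAndMount(devicePath, req.GetTargetPath(), fsType, nil)
}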

fff0x commented 4 years ago

@ahilsend unfortunately I wasn't able to get it to work with rook-ceph using a cluster on PVC. I see the volumes coming up and being assigned to one of my three storage nodes, which belong to the Ceph monitors. But then, for some reason, they are detached and attached again to the wrong servers; often it ended with multiple volumes on one server and the "osd-prepare" pod never finishes, with a lot of errors while initializing.

Let me know if any of the logs would help you, or if I should dig deeper.

ahilsend commented 4 years ago

often it ended with multiple volumes on one server

Multiple pods on the same node can happen; have you configured podAntiAffinities on your cluster?

the "osd-prepare" pod never finishes, with a lot of errors while initializing.

Is it something with Rook itself or the CSI driver? I'm no expert with Rook, so I'm not sure how much help I can be. Check the Rook osd-prepare and operator logs.

If the CSI driver is not doing what it should, logs would help.

fff0x commented 4 years ago

I have anti-affinity rules that prevent multiple mons and OSDs on the same node. At first I can see that each of my storage nodes gets one volume attached as soon as the volumes are requested. But with the new csi-driver image that includes your code changes, the volumes were detached and reattached, sometimes with multiple volumes on the same node, even though the volumes run in different DCs. That's pretty weird, since I can't do that using the web frontend.

This probably has nothing to do with Rook; I'm using almost the same setup in AWS and Azure, of course with their CSI drivers for the PVs.

I already took a look at the csi-driver logs and compared them to the ones from before your changes; there were some warnings, but nothing meaningful. Let me run a clean install for fresh logs.

Just for the record, I'm using v1.3.1/deploy/kubernetes/hcloud-csi.yml for the deployment, only replacing the hcloud-csi-driver images. Hope that's correct.

Many thanks in advance, I really appreciate your work!

PCatinean commented 4 years ago

I'm pretty much a noob when it comes to storage in k8s and I've also struggled a lot in the past with setting up rook in my rancher cluster deployed on hetzner. Is there a reason for using a ceph cluster when you can just use the csi to satisfy pod PVCs? Other than data replication and integrity that ceph offers?

ahilsend commented 4 years ago

I'm pretty much a noob when it comes to storage in k8s and I've also struggled a lot in the past with setting up rook in my rancher cluster deployed on hetzner. Is there a reason for using a ceph cluster when you can just use the csi to satisfy pod PVCs? Other than data replication and integrity that ceph offers?

The hetzner volumes are RWO - they can only be mounted once. I have a use case for RWX - multiple pods accessing the same volume.

For that I use Rook to set up a CephFS, which supports that. The CephFS itself runs on top of Hetzner CSI block volumes.

For all other RWO cases, I use the hetzner CSI directly.

PCatinean commented 4 years ago

Ahhh gotcha, with CephFS it totally makes sense (another option, I think, would be NFS or Gluster). That was my worry: that there is some other extra benefit besides RWO that I was not aware of, and that I was happy for nothing about being able to just use the CSI driver and drop Rook for the time being :)

fff0x commented 4 years ago

It's the same use case here. I need ReadWriteMany access to the persistent volumes. Of course I can attach a volume manually to a storage node and use it unformatted as a block device via the discover feature of rook-ceph, but with a cluster on PVs I can disable auto-discover and let the monitor pods automatically request volumes using the StorageClass provided by the CSI driver.

PCatinean commented 4 years ago

Thanks for the clarification guys, I was actually doing the manual approach @mbuelte :D so now using the CSI driver feels great. Will probably get back to Ceph when I need RWX.

fff0x commented 4 years ago

@ahilsend here are some fresh logs from the csi-driver pods and from one of the test storage-nodes directly.

controller-hcloud_csi_driver:

level=debug ts=2020-05-20T05:40:31.010067655Z component=grpc-server msg="handling request" req="volume_id:\"5514604\" node_id:\"5924240\" volume_capability:<block:<> access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"XXXXXXXXX-8081-csi.hetzner.cloud\" > "
level=info ts=2020-05-20T05:40:31.010166189Z component=api-volume-service msg="attaching volume" volume-id=5514604 server-id=5924240
level=debug ts=2020-05-20T05:40:31.387845488Z component=grpc-server msg="handling request" req="volume_id:\"5514605\" node_id:\"5924242\" volume_capability:<block:<> access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"XXXXXXXX-8081-csi.hetzner.cloud\" > "
level=info ts=2020-05-20T05:40:31.387959992Z component=api-volume-service msg="attaching volume" volume-id=5514605 server-id=5924242
level=info ts=2020-05-20T05:40:32.25457159Z component=api-volume-service msg="failed to attach volume" volume-id=5514605 server-id=5924242 err="cannot perform operation because server is locked (locked)"
level=error ts=2020-05-20T05:40:32.313396856Z component=grpc-server msg="handler failed" err="rpc error: code = Unavailable desc = failed to publish volume: server is locked"

node-hcloud_csi_driver:

level=debug ts=2020-05-20T05:40:46.19469452Z component=linux-mount-service msg="publishing block volume" volume-name=pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb target-path=/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb/aceebf44-4254-4384-9c8f-4b6cf0a8f8a7 volume-path=/dev/disk/by-id/scsi-0HC_Volume_5514603 readonly=false additional-mount-options="unsupported value type"

storage node #2 syslogs:

May 20 07:40:31 minion-2 k3s[6779]: E0520 07:40:31.911067    6779 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.hetzner.cloud^5514603 podName: nodeName:}" failed. No retries permitted until 2020-05-20 07:40:33.911005509 +0200 CEST m=+186.284088056 (durationBeforeRetry 2s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb\" (UniqueName: \"kubernetes.io/csi/csi.hetzner.cloud^5514603\") pod \"rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9\" (UID: \"aceebf44-4254-4384-9c8f-4b6cf0a8f8a7\") "
May 20 07:40:32 minion-2 k3s[6779]: I0520 07:40:32.011683    6779 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-75e31e5b-9826-41a2-80e0-d6722a1760f8" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514605") pod "rook-ceph-osd-prepare-set1-data-2-l2mk4-4jk6q" (UID: "f88db6ef-ad24-44e5-9da0-5a279979aa2c")
May 20 07:40:32 minion-2 k3s[6779]: E0520 07:40:32.011924    6779 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.hetzner.cloud^5514605 podName: nodeName:}" failed. No retries permitted until 2020-05-20 07:40:33.011863809 +0200 CEST m=+185.384946326 (durationBeforeRetry 1s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-75e31e5b-9826-41a2-80e0-d6722a1760f8\" (UniqueName: \"kubernetes.io/csi/csi.hetzner.cloud^5514605\") pod \"rook-ceph-osd-prepare-set1-data-2-l2mk4-4jk6q\" (UID: \"f88db6ef-ad24-44e5-9da0-5a279979aa2c\") "
May 20 07:40:32 minion-2 kernel: scsi 3:0:0:1: Direct-Access     HC       Volume           2.5+ PQ: 0 ANSI: 5
May 20 07:40:32 minion-2 kernel: sd 3:0:0:1: Power-on or device reset occurred
May 20 07:40:32 minion-2 kernel: sd 3:0:0:1: Attached scsi generic sg2 type 0
May 20 07:40:32 minion-2 kernel: sd 3:0:0:1: [sdb] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
May 20 07:40:32 minion-2 kernel: sd 3:0:0:1: [sdb] Write Protect is off
May 20 07:40:32 minion-2 kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08
May 20 07:40:32 minion-2 kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 20 07:40:32 minion-2 kernel: sd 3:0:0:1: [sdb] Attached SCSI disk
May 20 07:40:33 minion-2 k3s[6779]: I0520 07:40:33.017176    6779 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-75e31e5b-9826-41a2-80e0-d6722a1760f8" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514605") pod "rook-ceph-osd-prepare-set1-data-2-l2mk4-4jk6q" (UID: "f88db6ef-ad24-44e5-9da0-5a279979aa2c")
May 20 07:40:33 minion-2 k3s[6779]: E0520 07:40:33.017356    6779 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.hetzner.cloud^5514605 podName: nodeName:}" failed. No retries permitted until 2020-05-20 07:40:35.017318419 +0200 CEST m=+187.390400946 (durationBeforeRetry 2s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-75e31e5b-9826-41a2-80e0-d6722a1760f8\" (UniqueName: \"kubernetes.io/csi/csi.hetzner.cloud^5514605\") pod \"rook-ceph-osd-prepare-set1-data-2-l2mk4-4jk6q\" (UID: \"f88db6ef-ad24-44e5-9da0-5a279979aa2c\") "
May 20 07:40:33 minion-2 k3s[6779]: I0520 07:40:33.922013    6779 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514603") pod "rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9" (UID: "aceebf44-4254-4384-9c8f-4b6cf0a8f8a7")
May 20 07:40:33 minion-2 k3s[6779]: E0520 07:40:33.922180    6779 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.hetzner.cloud^5514603 podName: nodeName:}" failed. No retries permitted until 2020-05-20 07:40:37.922136657 +0200 CEST m=+190.295219174 (durationBeforeRetry 4s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb\" (UniqueName: \"kubernetes.io/csi/csi.hetzner.cloud^5514603\") pod \"rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9\" (UID: \"aceebf44-4254-4384-9c8f-4b6cf0a8f8a7\") "
May 20 07:40:35 minion-2 k3s[6779]: I0520 07:40:35.028035    6779 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-75e31e5b-9826-41a2-80e0-d6722a1760f8" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514605") pod "rook-ceph-osd-prepare-set1-data-2-l2mk4-4jk6q" (UID: "f88db6ef-ad24-44e5-9da0-5a279979aa2c")
May 20 07:40:35 minion-2 k3s[6779]: E0520 07:40:35.028247    6779 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.hetzner.cloud^5514605 podName: nodeName:}" failed. No retries permitted until 2020-05-20 07:40:39.028211068 +0200 CEST m=+191.401293536 (durationBeforeRetry 4s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-75e31e5b-9826-41a2-80e0-d6722a1760f8\" (UniqueName: \"kubernetes.io/csi/csi.hetzner.cloud^5514605\") pod \"rook-ceph-osd-prepare-set1-data-2-l2mk4-4jk6q\" (UID: \"f88db6ef-ad24-44e5-9da0-5a279979aa2c\") "
May 20 07:40:35 minion-2 kernel: scsi 3:0:0:2: Direct-Access     HC       Volume           2.5+ PQ: 0 ANSI: 5
May 20 07:40:35 minion-2 kernel: sd 3:0:0:2: Power-on or device reset occurred
May 20 07:40:35 minion-2 kernel: sd 3:0:0:2: Attached scsi generic sg3 type 0
May 20 07:40:35 minion-2 kernel: sd 3:0:0:2: [sdc] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
May 20 07:40:35 minion-2 kernel: sd 3:0:0:2: [sdc] Write Protect is off
May 20 07:40:35 minion-2 kernel: sd 3:0:0:2: [sdc] Mode Sense: 63 00 00 08
May 20 07:40:35 minion-2 kernel: sd 3:0:0:2: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 20 07:40:35 minion-2 kernel: sd 3:0:0:2: [sdc] Attached SCSI disk
May 20 07:40:37 minion-2 k3s[6779]: I0520 07:40:37.944334    6779 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514603") pod "rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9" (UID: "aceebf44-4254-4384-9c8f-4b6cf0a8f8a7")
May 20 07:40:37 minion-2 k3s[6779]: E0520 07:40:37.944491    6779 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.hetzner.cloud^5514603 podName: nodeName:}" failed. No retries permitted until 2020-05-20 07:40:45.944451656 +0200 CEST m=+198.317534123 (durationBeforeRetry 8s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb\" (UniqueName: \"kubernetes.io/csi/csi.hetzner.cloud^5514603\") pod \"rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9\" (UID: \"aceebf44-4254-4384-9c8f-4b6cf0a8f8a7\") "
May 20 07:40:39 minion-2 k3s[6779]: I0520 07:40:39.049843    6779 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-75e31e5b-9826-41a2-80e0-d6722a1760f8" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514605") pod "rook-ceph-osd-prepare-set1-data-2-l2mk4-4jk6q" (UID: "f88db6ef-ad24-44e5-9da0-5a279979aa2c")
May 20 07:40:39 minion-2 k3s[6779]: E0520 07:40:39.050001    6779 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.hetzner.cloud^5514605 podName: nodeName:}" failed. No retries permitted until 2020-05-20 07:40:47.049955312 +0200 CEST m=+199.423037829 (durationBeforeRetry 8s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-75e31e5b-9826-41a2-80e0-d6722a1760f8\" (UniqueName: \"kubernetes.io/csi/csi.hetzner.cloud^5514605\") pod \"rook-ceph-osd-prepare-set1-data-2-l2mk4-4jk6q\" (UID: \"f88db6ef-ad24-44e5-9da0-5a279979aa2c\") "
May 20 07:40:40 minion-2 k3s[6779]: I0520 07:40:40.958764    6779 reconciler.go:303] Volume detached for volume "rook-ceph-crash-collector-keyring" (UniqueName: "kubernetes.io/secret/1ba18009-61a2-404a-aa00-5f92d3e20bfe-rook-ceph-crash-collector-keyring") on node "minion-2" DevicePath ""
May 20 07:40:45 minion-2 k3s[6779]: I0520 07:40:45.981544    6779 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514603") pod "rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9" (UID: "aceebf44-4254-4384-9c8f-4b6cf0a8f8a7")
May 20 07:40:45 minion-2 k3s[6779]: I0520 07:40:45.986797    6779 operation_generator.go:1245] Controller attach succeeded for volume "pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514603") pod "rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9" (UID: "aceebf44-4254-4384-9c8f-4b6cf0a8f8a7") device path: ""
May 20 07:40:46 minion-2 k3s[6779]: I0520 07:40:46.082417    6779 operation_generator.go:881] MapVolume.WaitForAttach entering for volume "pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514603") pod "rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9" (UID: "aceebf44-4254-4384-9c8f-4b6cf0a8f8a7") DevicePath ""
May 20 07:40:46 minion-2 k3s[6779]: I0520 07:40:46.086750    6779 operation_generator.go:890] MapVolume.WaitForAttach succeeded for volume "pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb" (UniqueName: "kubernetes.io/csi/csi.hetzner.cloud^5514603") pod "rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9" (UID: "aceebf44-4254-4384-9c8f-4b6cf0a8f8a7") DevicePath "csi-111daa4b77421b11c26fb03add0e9b0e5eab1ca1fbab3a755a3d4a3765ae750e"
May 20 07:40:46 minion-2 systemd[1]: Started Kubernetes systemd probe.
May 20 07:40:46 minion-2 systemd[1]: run-ra3c46bb0a3cc4c77bf22af970150100a.scope: Succeeded.
May 20 07:40:46 minion-2 systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb/dev/aceebf44-4254-4384-9c8f-4b6cf0a8f8a7.
May 20 07:40:46 minion-2 systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb/dev/aceebf44-4254-4384-9c8f-4b6cf0a8f8a7.
May 20 07:40:46 minion-2 k3s[6779]: E0520 07:40:46.261184    6779 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.hetzner.cloud^5514603 podName: nodeName:}" failed. No retries permitted until 2020-05-20 07:40:46.761129727 +0200 CEST m=+199.134212184 (durationBeforeRetry 500ms). Error: "MapVolume.MapBlockVolume failed for volume \"pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb\" (UniqueName: \"kubernetes.io/csi/csi.hetzner.cloud^5514603\") pod \"rook-ceph-osd-prepare-set1-data-0-n22jc-4n7l9\" (UID: \"aceebf44-4254-4384-9c8f-4b6cf0a8f8a7\") : blkUtil.AttachFileDevice failed. globalMapPath:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb/dev, podUID: aceebf44-4254-4384-9c8f-4b6cf0a8f8a7: GetLoopDevice failed for path /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb/dev/aceebf44-4254-4384-9c8f-4b6cf0a8f8a7: losetup -j /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-28ee81f4-dd4c-4cc4-9731-4cfb5c561ddb/dev/aceebf44-4254-4384-9c8f-4b6cf0a8f8a7 failed: exit status 1"

I also noticed that the node the volumes were previously attached to changed. In my tests I was using three storage nodes; on each one runs a Ceph monitor pod that requests a volume. A few seconds later, each of them had its own volume attached. But then something happened that detached the volume and reattached it to another node. Probably this leads to the problem I see here.

When I use the auto-discover feature of rook-ceph instead of storageClassDeviceSets, the newly attached volumes are found, but Ceph is unable to use them as raw block devices, since they are formatted with ext4. When I manually wipe the filesystem, Ceph uses them after a while and forms the cluster, but this setup does not differ from my previous one, where I was attaching the volumes manually, wiping them and so on. I don't want to use auto-discover, and there should be no need for manual intervention.

LKaemmerling commented 4 years ago

I'm happy to announce that we just released v1.4.0 of our CSI driver, which includes this. The new container should be available in a couple of minutes.

hadifarnoud commented 4 years ago

Does this mean RWO is no longer an issue? I use the Hetzner driver for many deployments that scale to more than one pod, and if only one can access the storage, that is a huge issue for me.

z1r0- commented 4 years ago

@hadifarnoud You still need a storage provider like rook-ceph that supports RWX, but you can now use Hetzner volumes to deploy it. The volumes themselves are still RWO afaik.