rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

MountDevice failed for volume pvc-f631... An operation with the given Volume ID already exists #4896

Open · NicolaiSchmid opened this issue 4 years ago

NicolaiSchmid commented 4 years ago

Is this a bug report or feature request? Bug report.

Deviation from expected behavior: Kubernetes tries to attach the PVC to a pod and fails:

  Normal   SuccessfulAttachVolume  25m                  attachdetach-controller       AttachVolume.Attach succeeded for volume "pvc-f631ef53-35d6-438b-a496-d2ba77adb57d"
  Warning  FailedMount             23m                  kubelet, node3  MountVolume.MountDevice failed for volume "pvc-f631ef53-35d6-438b-a496-d2ba77adb57d" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             4m59s (x5 over 18m)  kubelet, node3  Unable to attach or mount volumes: unmounted volumes=[volume], unattached volumes=[volume default-token-4dbg8]: timed out waiting for the condition
  Warning  FailedMount             2m41s (x5 over 23m)  kubelet, node3  Unable to attach or mount volumes: unmounted volumes=[volume], unattached volumes=[default-token-4dbg8 volume]: timed out waiting for the condition
  Warning  FailedMount             32s (x18 over 23m)   kubelet, node3  MountVolume.MountDevice failed for volume "pvc-f631ef53-35d6-438b-a496-d2ba77adb57d" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-3e7b0d61-5335-11ea-a0a0-3e8b30a597e0 already exists
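
For context on that last event: the Aborted message is the CSI node plugin refusing to start a second NodeStageVolume while a previous operation on the same volume ID is still in flight, which would explain why the first event is a DeadlineExceeded and every later retry is Aborted. Below is a minimal sketch of that per-volume-ID locking pattern, with illustrative names only (this is not the actual ceph-csi source):

package main

import (
    "fmt"
    "sync"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// volumeLocks tracks volume IDs that currently have an operation in flight.
type volumeLocks struct {
    mu    sync.Mutex
    inUse map[string]struct{}
}

func newVolumeLocks() *volumeLocks {
    return &volumeLocks{inUse: make(map[string]struct{})}
}

// tryAcquire returns false when an operation for volumeID is already running.
func (l *volumeLocks) tryAcquire(volumeID string) bool {
    l.mu.Lock()
    defer l.mu.Unlock()
    if _, busy := l.inUse[volumeID]; busy {
        return false
    }
    l.inUse[volumeID] = struct{}{}
    return true
}

// release marks the operation for volumeID as finished.
func (l *volumeLocks) release(volumeID string) {
    l.mu.Lock()
    defer l.mu.Unlock()
    delete(l.inUse, volumeID)
}

// stageVolume mirrors the shape of a NodeStageVolume handler: reject the call
// with gRPC code Aborted while the volume ID is still locked by an earlier call.
func stageVolume(locks *volumeLocks, volumeID string) error {
    if !locks.tryAcquire(volumeID) {
        return status.Errorf(codes.Aborted,
            "an operation with the given Volume ID %s already exists", volumeID)
    }
    defer locks.release(volumeID)
    // ... map the rbd image, create the filesystem if needed, mount the staging path ...
    return nil
}

func main() {
    locks := newVolumeLocks()
    fmt.Println(stageVolume(locks, "example-volume-id")) // <nil>
}

If the first staging attempt hangs (for example on the rbd map or mount step) past the gRPC deadline, a lock like this is never released, so every retry from kubelet is rejected with exactly the message seen above.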

On other nodes in the cluster, the attach and mount work fine, as expected.

How to reproduce it (minimal and precise):

Create an example cluster with an rbd-csi storage class. Create a PVC and a pod that attaches the PVC. I think the issue lies somewhere in mismatched configuration, software, kernel modules, etc.
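
If it helps, here is a rough client-go sketch of the reproduction, equivalent to applying a small PVC plus pod manifest (the object names, the default namespace, the busybox image, and the rook-ceph-block StorageClass are illustrative assumptions, and it assumes a client-go version whose Create calls take a context):

package main

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Load the kubeconfig from its default location.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    cs, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }

    sc := "rook-ceph-block" // a StorageClass backed by the rbd CSI driver

    // PVC provisioned through the rbd-csi StorageClass.
    pvc := &corev1.PersistentVolumeClaim{
        ObjectMeta: metav1.ObjectMeta{Name: "repro-pvc"},
        Spec: corev1.PersistentVolumeClaimSpec{
            AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
            StorageClassName: &sc,
            Resources: corev1.ResourceRequirements{
                Requests: corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("1Gi")},
            },
        },
    }

    // Pod that attaches and mounts the PVC, triggering NodeStageVolume on the scheduled node.
    pod := &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{Name: "repro-pod"},
        Spec: corev1.PodSpec{
            Containers: []corev1.Container{{
                Name:         "app",
                Image:        "busybox",
                Command:      []string{"sleep", "3600"},
                VolumeMounts: []corev1.VolumeMount{{Name: "volume", MountPath: "/data"}},
            }},
            Volumes: []corev1.Volume{{
                Name: "volume",
                VolumeSource: corev1.VolumeSource{
                    PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ClaimName: "repro-pvc"},
                },
            }},
        },
    }

    ctx := context.TODO()
    if _, err := cs.CoreV1().PersistentVolumeClaims("default").Create(ctx, pvc, metav1.CreateOptions{}); err != nil {
        panic(err)
    }
    if _, err := cs.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{}); err != nil {
        panic(err)
    }
    fmt.Println("created repro-pvc and repro-pod")
}

On the other nodes a pod like this mounts the volume as expected; on node3 it fails with the FailedMount events shown above.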

Environment (of the node trying to mount):

Madhu-1 commented 4 years ago

Please attach the csi-rbdplugin container logs from the daemonset pod on the node where the mount is failing.

NicolaiSchmid commented 4 years ago

Of course. These are the logs, specifically from the driver-registrar container (I've only attached the last 500 lines; the pattern seems to repeat):

I0318 08:21:59.917201    2660 utils.go:119] ID: 1620 GRPC call: /csi.v1.Identity/Probe
I0318 08:21:59.917228    2660 utils.go:120] ID: 1620 GRPC request: {}
I0318 08:21:59.918175    2660 utils.go:125] ID: 1620 GRPC response: {}
I0318 08:22:59.917270    2660 utils.go:119] ID: 1621 GRPC call: /csi.v1.Identity/Probe
I0318 08:22:59.917302    2660 utils.go:120] ID: 1621 GRPC request: {}
I0318 08:22:59.918307    2660 utils.go:125] ID: 1621 GRPC response: {}
I0318 08:23:43.931717    2660 utils.go:119] ID: 1622 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:23:43.931751    2660 utils.go:120] ID: 1622 GRPC request: {}
I0318 08:23:43.932775    2660 utils.go:125] ID: 1622 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:23:43.951422    2660 utils.go:119] ID: 1623 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:23:43.951448    2660 utils.go:120] ID: 1623 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:23:43.954113    2660 nodeserver.go:84] ID: 1623 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:23:43.954188    2660 utils.go:123] ID: 1623 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:23:59.917147    2660 utils.go:119] ID: 1624 GRPC call: /csi.v1.Identity/Probe
I0318 08:23:59.917182    2660 utils.go:120] ID: 1624 GRPC request: {}
I0318 08:23:59.918119    2660 utils.go:125] ID: 1624 GRPC response: {}
I0318 08:24:59.917121    2660 utils.go:119] ID: 1625 GRPC call: /csi.v1.Identity/Probe
I0318 08:24:59.917149    2660 utils.go:120] ID: 1625 GRPC request: {}
I0318 08:24:59.918029    2660 utils.go:125] ID: 1625 GRPC response: {}
I0318 08:25:45.995426    2660 utils.go:119] ID: 1626 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:25:45.995455    2660 utils.go:120] ID: 1626 GRPC request: {}
I0318 08:25:45.996257    2660 utils.go:125] ID: 1626 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:25:46.013779    2660 utils.go:119] ID: 1627 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:25:46.013798    2660 utils.go:120] ID: 1627 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:25:46.016018    2660 nodeserver.go:84] ID: 1627 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:25:46.016084    2660 utils.go:123] ID: 1627 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:25:59.917337    2660 utils.go:119] ID: 1628 GRPC call: /csi.v1.Identity/Probe
I0318 08:25:59.917363    2660 utils.go:120] ID: 1628 GRPC request: {}
I0318 08:25:59.918214    2660 utils.go:125] ID: 1628 GRPC response: {}
I0318 08:26:59.917215    2660 utils.go:119] ID: 1629 GRPC call: /csi.v1.Identity/Probe
I0318 08:26:59.917242    2660 utils.go:120] ID: 1629 GRPC request: {}
I0318 08:26:59.918223    2660 utils.go:125] ID: 1629 GRPC response: {}
I0318 08:27:48.063966    2660 utils.go:119] ID: 1630 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:27:48.063995    2660 utils.go:120] ID: 1630 GRPC request: {}
I0318 08:27:48.064733    2660 utils.go:125] ID: 1630 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:27:48.082519    2660 utils.go:119] ID: 1631 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:27:48.082549    2660 utils.go:120] ID: 1631 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:27:48.085138    2660 nodeserver.go:84] ID: 1631 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:27:48.085206    2660 utils.go:123] ID: 1631 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:27:59.917235    2660 utils.go:119] ID: 1632 GRPC call: /csi.v1.Identity/Probe
I0318 08:27:59.917261    2660 utils.go:120] ID: 1632 GRPC request: {}
I0318 08:27:59.918070    2660 utils.go:125] ID: 1632 GRPC response: {}
I0318 08:28:59.917318    2660 utils.go:119] ID: 1633 GRPC call: /csi.v1.Identity/Probe
I0318 08:28:59.917356    2660 utils.go:120] ID: 1633 GRPC request: {}
I0318 08:28:59.918362    2660 utils.go:125] ID: 1633 GRPC response: {}
I0318 08:29:50.139563    2660 utils.go:119] ID: 1634 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:29:50.139623    2660 utils.go:120] ID: 1634 GRPC request: {}
I0318 08:29:50.140719    2660 utils.go:125] ID: 1634 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:29:50.157300    2660 utils.go:119] ID: 1635 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:29:50.157331    2660 utils.go:120] ID: 1635 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:29:50.160228    2660 nodeserver.go:84] ID: 1635 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:29:50.160312    2660 utils.go:123] ID: 1635 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:29:59.917440    2660 utils.go:119] ID: 1636 GRPC call: /csi.v1.Identity/Probe
I0318 08:29:59.917473    2660 utils.go:120] ID: 1636 GRPC request: {}
I0318 08:29:59.918586    2660 utils.go:125] ID: 1636 GRPC response: {}
I0318 08:30:59.917230    2660 utils.go:119] ID: 1637 GRPC call: /csi.v1.Identity/Probe
I0318 08:30:59.917255    2660 utils.go:120] ID: 1637 GRPC request: {}
I0318 08:30:59.918084    2660 utils.go:125] ID: 1637 GRPC response: {}
I0318 08:31:52.204717    2660 utils.go:119] ID: 1638 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:31:52.204749    2660 utils.go:120] ID: 1638 GRPC request: {}
I0318 08:31:52.205663    2660 utils.go:125] ID: 1638 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:31:52.223571    2660 utils.go:119] ID: 1639 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:31:52.223598    2660 utils.go:120] ID: 1639 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:31:52.226178    2660 nodeserver.go:84] ID: 1639 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:31:52.226244    2660 utils.go:123] ID: 1639 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:31:59.917146    2660 utils.go:119] ID: 1640 GRPC call: /csi.v1.Identity/Probe
I0318 08:31:59.917193    2660 utils.go:120] ID: 1640 GRPC request: {}
I0318 08:31:59.918210    2660 utils.go:125] ID: 1640 GRPC response: {}
I0318 08:32:59.917175    2660 utils.go:119] ID: 1641 GRPC call: /csi.v1.Identity/Probe
I0318 08:32:59.917201    2660 utils.go:120] ID: 1641 GRPC request: {}
I0318 08:32:59.918068    2660 utils.go:125] ID: 1641 GRPC response: {}
I0318 08:33:54.275273    2660 utils.go:119] ID: 1642 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:33:54.275301    2660 utils.go:120] ID: 1642 GRPC request: {}
I0318 08:33:54.276133    2660 utils.go:125] ID: 1642 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:33:54.293845    2660 utils.go:119] ID: 1643 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:33:54.293882    2660 utils.go:120] ID: 1643 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:33:54.296780    2660 nodeserver.go:84] ID: 1643 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:33:54.296882    2660 utils.go:123] ID: 1643 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:33:59.917250    2660 utils.go:119] ID: 1644 GRPC call: /csi.v1.Identity/Probe
I0318 08:33:59.917281    2660 utils.go:120] ID: 1644 GRPC request: {}
I0318 08:33:59.918177    2660 utils.go:125] ID: 1644 GRPC response: {}
I0318 08:34:59.917112    2660 utils.go:119] ID: 1645 GRPC call: /csi.v1.Identity/Probe
I0318 08:34:59.917146    2660 utils.go:120] ID: 1645 GRPC request: {}
I0318 08:34:59.918115    2660 utils.go:125] ID: 1645 GRPC response: {}
I0318 08:35:56.346749    2660 utils.go:119] ID: 1646 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:35:56.346792    2660 utils.go:120] ID: 1646 GRPC request: {}
I0318 08:35:56.347901    2660 utils.go:125] ID: 1646 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:35:56.367895    2660 utils.go:119] ID: 1647 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:35:56.367927    2660 utils.go:120] ID: 1647 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:35:56.370769    2660 nodeserver.go:84] ID: 1647 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:35:56.370849    2660 utils.go:123] ID: 1647 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:35:59.917274    2660 utils.go:119] ID: 1648 GRPC call: /csi.v1.Identity/Probe
I0318 08:35:59.917301    2660 utils.go:120] ID: 1648 GRPC request: {}
I0318 08:35:59.918160    2660 utils.go:125] ID: 1648 GRPC response: {}
I0318 08:36:59.916920    2660 utils.go:119] ID: 1649 GRPC call: /csi.v1.Identity/Probe
I0318 08:36:59.916939    2660 utils.go:120] ID: 1649 GRPC request: {}
I0318 08:36:59.917424    2660 utils.go:125] ID: 1649 GRPC response: {}
I0318 08:37:58.418509    2660 utils.go:119] ID: 1650 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:37:58.418545    2660 utils.go:120] ID: 1650 GRPC request: {}
I0318 08:37:58.419626    2660 utils.go:125] ID: 1650 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:37:58.440026    2660 utils.go:119] ID: 1651 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:37:58.440092    2660 utils.go:120] ID: 1651 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:37:58.442951    2660 nodeserver.go:84] ID: 1651 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:37:58.443021    2660 utils.go:123] ID: 1651 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:37:59.917222    2660 utils.go:119] ID: 1652 GRPC call: /csi.v1.Identity/Probe
I0318 08:37:59.917248    2660 utils.go:120] ID: 1652 GRPC request: {}
I0318 08:37:59.918124    2660 utils.go:125] ID: 1652 GRPC response: {}
I0318 08:38:59.917201    2660 utils.go:119] ID: 1653 GRPC call: /csi.v1.Identity/Probe
I0318 08:38:59.917244    2660 utils.go:120] ID: 1653 GRPC request: {}
I0318 08:38:59.918136    2660 utils.go:125] ID: 1653 GRPC response: {}
I0318 08:39:59.917222    2660 utils.go:119] ID: 1654 GRPC call: /csi.v1.Identity/Probe
I0318 08:39:59.917257    2660 utils.go:120] ID: 1654 GRPC request: {}
I0318 08:39:59.918264    2660 utils.go:125] ID: 1654 GRPC response: {}
I0318 08:40:00.497157    2660 utils.go:119] ID: 1655 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:40:00.497188    2660 utils.go:120] ID: 1655 GRPC request: {}
I0318 08:40:00.498206    2660 utils.go:125] ID: 1655 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:40:00.516850    2660 utils.go:119] ID: 1656 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:40:00.516890    2660 utils.go:120] ID: 1656 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:40:00.519880    2660 nodeserver.go:84] ID: 1656 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:40:00.519975    2660 utils.go:123] ID: 1656 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:40:59.917237    2660 utils.go:119] ID: 1657 GRPC call: /csi.v1.Identity/Probe
I0318 08:40:59.917268    2660 utils.go:120] ID: 1657 GRPC request: {}
I0318 08:40:59.918100    2660 utils.go:125] ID: 1657 GRPC response: {}
I0318 08:41:59.917289    2660 utils.go:119] ID: 1658 GRPC call: /csi.v1.Identity/Probe
I0318 08:41:59.917311    2660 utils.go:120] ID: 1658 GRPC request: {}
I0318 08:41:59.917988    2660 utils.go:125] ID: 1658 GRPC response: {}
I0318 08:42:02.574312    2660 utils.go:119] ID: 1659 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:42:02.574340    2660 utils.go:120] ID: 1659 GRPC request: {}
I0318 08:42:02.575264    2660 utils.go:125] ID: 1659 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:42:02.594989    2660 utils.go:119] ID: 1660 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:42:02.595013    2660 utils.go:120] ID: 1660 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:42:02.597367    2660 nodeserver.go:84] ID: 1660 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:42:02.597431    2660 utils.go:123] ID: 1660 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:42:59.917359    2660 utils.go:119] ID: 1661 GRPC call: /csi.v1.Identity/Probe
I0318 08:42:59.917405    2660 utils.go:120] ID: 1661 GRPC request: {}
I0318 08:42:59.918462    2660 utils.go:125] ID: 1661 GRPC response: {}
I0318 08:43:59.917301    2660 utils.go:119] ID: 1662 GRPC call: /csi.v1.Identity/Probe
I0318 08:43:59.917336    2660 utils.go:120] ID: 1662 GRPC request: {}
I0318 08:43:59.918344    2660 utils.go:125] ID: 1662 GRPC response: {}
I0318 08:44:04.648355    2660 utils.go:119] ID: 1663 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:44:04.648396    2660 utils.go:120] ID: 1663 GRPC request: {}
I0318 08:44:04.649562    2660 utils.go:125] ID: 1663 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:44:04.668357    2660 utils.go:119] ID: 1664 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:44:04.668378    2660 utils.go:120] ID: 1664 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:44:04.670995    2660 nodeserver.go:84] ID: 1664 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:44:04.671061    2660 utils.go:123] ID: 1664 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:44:59.917349    2660 utils.go:119] ID: 1665 GRPC call: /csi.v1.Identity/Probe
I0318 08:44:59.917387    2660 utils.go:120] ID: 1665 GRPC request: {}
I0318 08:44:59.918434    2660 utils.go:125] ID: 1665 GRPC response: {}
I0318 08:45:59.917142    2660 utils.go:119] ID: 1666 GRPC call: /csi.v1.Identity/Probe
I0318 08:45:59.917167    2660 utils.go:120] ID: 1666 GRPC request: {}
I0318 08:45:59.917841    2660 utils.go:125] ID: 1666 GRPC response: {}
I0318 08:46:06.722305    2660 utils.go:119] ID: 1667 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:46:06.722330    2660 utils.go:120] ID: 1667 GRPC request: {}
I0318 08:46:06.723245    2660 utils.go:125] ID: 1667 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:46:06.740272    2660 utils.go:119] ID: 1668 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:46:06.740293    2660 utils.go:120] ID: 1668 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:46:06.742728    2660 nodeserver.go:84] ID: 1668 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:46:06.743408    2660 utils.go:123] ID: 1668 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:46:59.917250    2660 utils.go:119] ID: 1669 GRPC call: /csi.v1.Identity/Probe
I0318 08:46:59.917288    2660 utils.go:120] ID: 1669 GRPC request: {}
I0318 08:46:59.918356    2660 utils.go:125] ID: 1669 GRPC response: {}
I0318 08:47:59.917265    2660 utils.go:119] ID: 1670 GRPC call: /csi.v1.Identity/Probe
I0318 08:47:59.917303    2660 utils.go:120] ID: 1670 GRPC request: {}
I0318 08:47:59.918119    2660 utils.go:125] ID: 1670 GRPC response: {}
I0318 08:48:08.805107    2660 utils.go:119] ID: 1671 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:48:08.805151    2660 utils.go:120] ID: 1671 GRPC request: {}
I0318 08:48:08.806209    2660 utils.go:125] ID: 1671 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:48:08.825642    2660 utils.go:119] ID: 1672 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:48:08.825672    2660 utils.go:120] ID: 1672 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:48:08.828295    2660 nodeserver.go:84] ID: 1672 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:48:08.828340    2660 utils.go:123] ID: 1672 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:48:59.917269    2660 utils.go:119] ID: 1673 GRPC call: /csi.v1.Identity/Probe
I0318 08:48:59.917302    2660 utils.go:120] ID: 1673 GRPC request: {}
I0318 08:48:59.918267    2660 utils.go:125] ID: 1673 GRPC response: {}
I0318 08:49:59.917505    2660 utils.go:119] ID: 1674 GRPC call: /csi.v1.Identity/Probe
I0318 08:49:59.917529    2660 utils.go:120] ID: 1674 GRPC request: {}
I0318 08:49:59.918172    2660 utils.go:125] ID: 1674 GRPC response: {}
I0318 08:50:10.872354    2660 utils.go:119] ID: 1675 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:50:10.872392    2660 utils.go:120] ID: 1675 GRPC request: {}
I0318 08:50:10.873328    2660 utils.go:125] ID: 1675 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:50:10.892055    2660 utils.go:119] ID: 1676 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:50:10.892088    2660 utils.go:120] ID: 1676 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:50:10.894766    2660 nodeserver.go:84] ID: 1676 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:50:10.894835    2660 utils.go:123] ID: 1676 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:50:59.917338    2660 utils.go:119] ID: 1677 GRPC call: /csi.v1.Identity/Probe
I0318 08:50:59.917374    2660 utils.go:120] ID: 1677 GRPC request: {}
I0318 08:50:59.918421    2660 utils.go:125] ID: 1677 GRPC response: {}
I0318 08:51:59.917117    2660 utils.go:119] ID: 1678 GRPC call: /csi.v1.Identity/Probe
I0318 08:51:59.917152    2660 utils.go:120] ID: 1678 GRPC request: {}
I0318 08:51:59.918241    2660 utils.go:125] ID: 1678 GRPC response: {}
I0318 08:52:12.939345    2660 utils.go:119] ID: 1679 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:52:12.939381    2660 utils.go:120] ID: 1679 GRPC request: {}
I0318 08:52:12.940479    2660 utils.go:125] ID: 1679 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:52:12.960936    2660 utils.go:119] ID: 1680 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:52:12.960963    2660 utils.go:120] ID: 1680 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:52:12.963908    2660 nodeserver.go:84] ID: 1680 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:52:12.963978    2660 utils.go:123] ID: 1680 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:52:59.917223    2660 utils.go:119] ID: 1681 GRPC call: /csi.v1.Identity/Probe
I0318 08:52:59.917254    2660 utils.go:120] ID: 1681 GRPC request: {}
I0318 08:52:59.918162    2660 utils.go:125] ID: 1681 GRPC response: {}
I0318 08:53:59.917254    2660 utils.go:119] ID: 1682 GRPC call: /csi.v1.Identity/Probe
I0318 08:53:59.917291    2660 utils.go:120] ID: 1682 GRPC request: {}
I0318 08:53:59.918331    2660 utils.go:125] ID: 1682 GRPC response: {}
I0318 08:54:15.015842    2660 utils.go:119] ID: 1683 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:54:15.015881    2660 utils.go:120] ID: 1683 GRPC request: {}
I0318 08:54:15.016906    2660 utils.go:125] ID: 1683 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:54:15.036542    2660 utils.go:119] ID: 1684 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:54:15.036569    2660 utils.go:120] ID: 1684 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:54:15.039223    2660 nodeserver.go:84] ID: 1684 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:54:15.039301    2660 utils.go:123] ID: 1684 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:54:59.917430    2660 utils.go:119] ID: 1685 GRPC call: /csi.v1.Identity/Probe
I0318 08:54:59.917462    2660 utils.go:120] ID: 1685 GRPC request: {}
I0318 08:54:59.918556    2660 utils.go:125] ID: 1685 GRPC response: {}
I0318 08:55:59.917251    2660 utils.go:119] ID: 1686 GRPC call: /csi.v1.Identity/Probe
I0318 08:55:59.917285    2660 utils.go:120] ID: 1686 GRPC request: {}
I0318 08:55:59.918157    2660 utils.go:125] ID: 1686 GRPC response: {}
I0318 08:56:17.078654    2660 utils.go:119] ID: 1687 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:56:17.078696    2660 utils.go:120] ID: 1687 GRPC request: {}
I0318 08:56:17.079840    2660 utils.go:125] ID: 1687 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:56:17.098671    2660 utils.go:119] ID: 1688 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:56:17.098707    2660 utils.go:120] ID: 1688 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:56:17.101325    2660 nodeserver.go:84] ID: 1688 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:56:17.101391    2660 utils.go:123] ID: 1688 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:56:59.917477    2660 utils.go:119] ID: 1689 GRPC call: /csi.v1.Identity/Probe
I0318 08:56:59.917526    2660 utils.go:120] ID: 1689 GRPC request: {}
I0318 08:56:59.918597    2660 utils.go:125] ID: 1689 GRPC response: {}
I0318 08:57:59.917311    2660 utils.go:119] ID: 1690 GRPC call: /csi.v1.Identity/Probe
I0318 08:57:59.917345    2660 utils.go:120] ID: 1690 GRPC request: {}
I0318 08:57:59.918323    2660 utils.go:125] ID: 1690 GRPC response: {}
I0318 08:58:19.151445    2660 utils.go:119] ID: 1691 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 08:58:19.151488    2660 utils.go:120] ID: 1691 GRPC request: {}
I0318 08:58:19.152553    2660 utils.go:125] ID: 1691 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 08:58:19.171196    2660 utils.go:119] ID: 1692 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 08:58:19.171245    2660 utils.go:120] ID: 1692 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 08:58:19.174215    2660 nodeserver.go:84] ID: 1692 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 08:58:19.174295    2660 utils.go:123] ID: 1692 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 08:58:59.917244    2660 utils.go:119] ID: 1693 GRPC call: /csi.v1.Identity/Probe
I0318 08:58:59.917277    2660 utils.go:120] ID: 1693 GRPC request: {}
I0318 08:58:59.918282    2660 utils.go:125] ID: 1693 GRPC response: {}
I0318 08:59:59.917171    2660 utils.go:119] ID: 1694 GRPC call: /csi.v1.Identity/Probe
I0318 08:59:59.917198    2660 utils.go:120] ID: 1694 GRPC request: {}
I0318 08:59:59.918070    2660 utils.go:125] ID: 1694 GRPC response: {}
I0318 09:00:21.230871    2660 utils.go:119] ID: 1695 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 09:00:21.230898    2660 utils.go:120] ID: 1695 GRPC request: {}
I0318 09:00:21.231626    2660 utils.go:125] ID: 1695 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 09:00:21.248151    2660 utils.go:119] ID: 1696 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 09:00:21.248172    2660 utils.go:120] ID: 1696 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 09:00:21.250078    2660 nodeserver.go:84] ID: 1696 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 09:00:21.250120    2660 utils.go:123] ID: 1696 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 09:00:59.917272    2660 utils.go:119] ID: 1697 GRPC call: /csi.v1.Identity/Probe
I0318 09:00:59.917311    2660 utils.go:120] ID: 1697 GRPC request: {}
I0318 09:00:59.918123    2660 utils.go:125] ID: 1697 GRPC response: {}
I0318 09:01:59.917073    2660 utils.go:119] ID: 1698 GRPC call: /csi.v1.Identity/Probe
I0318 09:01:59.917100    2660 utils.go:120] ID: 1698 GRPC request: {}
I0318 09:01:59.917820    2660 utils.go:125] ID: 1698 GRPC response: {}
I0318 09:02:23.302553    2660 utils.go:119] ID: 1699 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 09:02:23.302591    2660 utils.go:120] ID: 1699 GRPC request: {}
I0318 09:02:23.303559    2660 utils.go:125] ID: 1699 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 09:02:23.324029    2660 utils.go:119] ID: 1700 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 09:02:23.324058    2660 utils.go:120] ID: 1700 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 09:02:23.326403    2660 nodeserver.go:84] ID: 1700 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 09:02:23.326468    2660 utils.go:123] ID: 1700 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 09:02:59.917407    2660 utils.go:119] ID: 1701 GRPC call: /csi.v1.Identity/Probe
I0318 09:02:59.917444    2660 utils.go:120] ID: 1701 GRPC request: {}
I0318 09:02:59.918254    2660 utils.go:125] ID: 1701 GRPC response: {}
I0318 09:03:59.917182    2660 utils.go:119] ID: 1702 GRPC call: /csi.v1.Identity/Probe
I0318 09:03:59.917216    2660 utils.go:120] ID: 1702 GRPC request: {}
I0318 09:03:59.918346    2660 utils.go:125] ID: 1702 GRPC response: {}
I0318 09:04:25.379274    2660 utils.go:119] ID: 1703 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 09:04:25.379384    2660 utils.go:120] ID: 1703 GRPC request: {}
I0318 09:04:25.380516    2660 utils.go:125] ID: 1703 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 09:04:25.398756    2660 utils.go:119] ID: 1704 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 09:04:25.398787    2660 utils.go:120] ID: 1704 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 09:04:25.401753    2660 nodeserver.go:84] ID: 1704 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 09:04:25.401839    2660 utils.go:123] ID: 1704 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 09:04:59.917250    2660 utils.go:119] ID: 1705 GRPC call: /csi.v1.Identity/Probe
I0318 09:04:59.917278    2660 utils.go:120] ID: 1705 GRPC request: {}
I0318 09:04:59.918005    2660 utils.go:125] ID: 1705 GRPC response: {}
I0318 09:05:59.917285    2660 utils.go:119] ID: 1706 GRPC call: /csi.v1.Identity/Probe
I0318 09:05:59.917315    2660 utils.go:120] ID: 1706 GRPC request: {}
I0318 09:05:59.918143    2660 utils.go:125] ID: 1706 GRPC response: {}
I0318 09:06:27.448724    2660 utils.go:119] ID: 1707 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 09:06:27.448762    2660 utils.go:120] ID: 1707 GRPC request: {}
I0318 09:06:27.449831    2660 utils.go:125] ID: 1707 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 09:06:27.468207    2660 utils.go:119] ID: 1708 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 09:06:27.468236    2660 utils.go:120] ID: 1708 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 09:06:27.470801    2660 nodeserver.go:84] ID: 1708 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 09:06:27.470865    2660 utils.go:123] ID: 1708 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 09:06:59.917379    2660 utils.go:119] ID: 1709 GRPC call: /csi.v1.Identity/Probe
I0318 09:06:59.917410    2660 utils.go:120] ID: 1709 GRPC request: {}
I0318 09:06:59.918403    2660 utils.go:125] ID: 1709 GRPC response: {}
I0318 09:07:59.917205    2660 utils.go:119] ID: 1710 GRPC call: /csi.v1.Identity/Probe
I0318 09:07:59.917244    2660 utils.go:120] ID: 1710 GRPC request: {}
I0318 09:07:59.918318    2660 utils.go:125] ID: 1710 GRPC response: {}
I0318 09:08:29.518164    2660 utils.go:119] ID: 1711 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 09:08:29.518192    2660 utils.go:120] ID: 1711 GRPC request: {}
I0318 09:08:29.518934    2660 utils.go:125] ID: 1711 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 09:08:29.535003    2660 utils.go:119] ID: 1712 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 09:08:29.535027    2660 utils.go:120] ID: 1712 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 09:08:29.537493    2660 nodeserver.go:84] ID: 1712 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 09:08:29.537564    2660 utils.go:123] ID: 1712 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 09:08:59.917301    2660 utils.go:119] ID: 1713 GRPC call: /csi.v1.Identity/Probe
I0318 09:08:59.917336    2660 utils.go:120] ID: 1713 GRPC request: {}
I0318 09:08:59.918387    2660 utils.go:125] ID: 1713 GRPC response: {}
I0318 09:09:59.917247    2660 utils.go:119] ID: 1714 GRPC call: /csi.v1.Identity/Probe
I0318 09:09:59.917282    2660 utils.go:120] ID: 1714 GRPC request: {}
I0318 09:09:59.918349    2660 utils.go:125] ID: 1714 GRPC response: {}
I0318 09:10:31.595076    2660 utils.go:119] ID: 1715 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 09:10:31.595116    2660 utils.go:120] ID: 1715 GRPC request: {}
I0318 09:10:31.596141    2660 utils.go:125] ID: 1715 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 09:10:31.615075    2660 utils.go:119] ID: 1716 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 09:10:31.615105    2660 utils.go:120] ID: 1716 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 09:10:31.617904    2660 nodeserver.go:84] ID: 1716 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 09:10:31.617985    2660 utils.go:123] ID: 1716 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 09:10:59.917372    2660 utils.go:119] ID: 1717 GRPC call: /csi.v1.Identity/Probe
I0318 09:10:59.917403    2660 utils.go:120] ID: 1717 GRPC request: {}
I0318 09:10:59.918389    2660 utils.go:125] ID: 1717 GRPC response: {}
I0318 09:11:59.917194    2660 utils.go:119] ID: 1718 GRPC call: /csi.v1.Identity/Probe
I0318 09:11:59.917231    2660 utils.go:120] ID: 1718 GRPC request: {}
I0318 09:11:59.918312    2660 utils.go:125] ID: 1718 GRPC response: {}
I0318 09:12:33.665828    2660 utils.go:119] ID: 1719 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 09:12:33.665856    2660 utils.go:120] ID: 1719 GRPC request: {}
I0318 09:12:33.666694    2660 utils.go:125] ID: 1719 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 09:12:33.684700    2660 utils.go:119] ID: 1720 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 09:12:33.684726    2660 utils.go:120] ID: 1720 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 09:12:33.687234    2660 nodeserver.go:84] ID: 1720 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 09:12:33.687295    2660 utils.go:123] ID: 1720 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 09:12:59.917159    2660 utils.go:119] ID: 1721 GRPC call: /csi.v1.Identity/Probe
I0318 09:12:59.917192    2660 utils.go:120] ID: 1721 GRPC request: {}
I0318 09:12:59.918170    2660 utils.go:125] ID: 1721 GRPC response: {}
I0318 09:13:59.917185    2660 utils.go:119] ID: 1722 GRPC call: /csi.v1.Identity/Probe
I0318 09:13:59.917215    2660 utils.go:120] ID: 1722 GRPC request: {}
I0318 09:13:59.918046    2660 utils.go:125] ID: 1722 GRPC response: {}
I0318 09:14:35.744817    2660 utils.go:119] ID: 1723 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0318 09:14:35.744851    2660 utils.go:120] ID: 1723 GRPC request: {}
I0318 09:14:35.745718    2660 utils.go:125] ID: 1723 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
I0318 09:14:35.764181    2660 utils.go:119] ID: 1724 GRPC call: /csi.v1.Node/NodeStageVolume
I0318 09:14:35.764214    2660 utils.go:120] ID: 1724 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d930d1eb-051d-4351-92ba-bb5c359acdd0/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1583512535976-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33"}
I0318 09:14:35.766273    2660 nodeserver.go:84] ID: 1724 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
E0318 09:14:35.766336    2660 utils.go:123] ID: 1724 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-15df6447-688b-11ea-9afc-1ed3ac041a33 already exists
I0318 09:14:59.917301    2660 utils.go:119] ID: 1725 GRPC call: /csi.v1.Identity/Probe
I0318 09:14:59.917342    2660 utils.go:120] ID: 1725 GRPC request: {}
I0318 09:14:59.918345    2660 utils.go:125] ID: 1725 GRPC response: {}
I0318 09:15:59.917386    2660 utils.go:119] ID: 1726 GRPC call: /csi.v1.Identity/Probe
I0318 09:15:59.917427    2660 utils.go:120] ID: 1726 GRPC request: {}
I0318 09:15:59.918689    2660 utils.go:125] ID: 1726 GRPC response: {}
I0318 09:16:59.917450    2660 utils.go:119] ID: 1727 GRPC call: /csi.v1.Identity/Probe
I0318 09:16:59.917486    2660 utils.go:120] ID: 1727 GRPC request: {}
I0318 09:16:59.918537    2660 utils.go:125] ID: 1727 GRPC response: {}
I0318 09:17:59.917137    2660 utils.go:119] ID: 1728 GRPC call: /csi.v1.Identity/Probe
I0318 09:17:59.917167    2660 utils.go:120] ID: 1728 GRPC request: {}
I0318 09:17:59.918043    2660 utils.go:125] ID: 1728 GRPC response: {}
I0318 09:18:59.917284    2660 utils.go:119] ID: 1729 GRPC call: /csi.v1.Identity/Probe
I0318 09:18:59.917325    2660 utils.go:120] ID: 1729 GRPC request: {}
I0318 09:18:59.918097    2660 utils.go:125] ID: 1729 GRPC response: {}
I0318 09:19:59.917239    2660 utils.go:119] ID: 1730 GRPC call: /csi.v1.Identity/Probe
I0318 09:19:59.917275    2660 utils.go:120] ID: 1730 GRPC request: {}
I0318 09:19:59.918351    2660 utils.go:125] ID: 1730 GRPC response: {}
I0318 09:20:59.917647    2660 utils.go:119] ID: 1731 GRPC call: /csi.v1.Identity/Probe
I0318 09:20:59.917674    2660 utils.go:120] ID: 1731 GRPC request: {}
I0318 09:20:59.918534    2660 utils.go:125] ID: 1731 GRPC response: {}
I0318 09:21:59.917167    2660 utils.go:119] ID: 1732 GRPC call: /csi.v1.Identity/Probe
I0318 09:21:59.917202    2660 utils.go:120] ID: 1732 GRPC request: {}
I0318 09:21:59.918243    2660 utils.go:125] ID: 1732 GRPC response: {}
I0318 09:22:59.917218    2660 utils.go:119] ID: 1733 GRPC call: /csi.v1.Identity/Probe
I0318 09:22:59.917256    2660 utils.go:120] ID: 1733 GRPC request: {}
I0318 09:22:59.918496    2660 utils.go:125] ID: 1733 GRPC response: {}
I0318 09:23:59.917263    2660 utils.go:119] ID: 1734 GRPC call: /csi.v1.Identity/Probe
I0318 09:23:59.917299    2660 utils.go:120] ID: 1734 GRPC request: {}
I0318 09:23:59.918333    2660 utils.go:125] ID: 1734 GRPC response: {}
I0318 09:24:59.917248    2660 utils.go:119] ID: 1735 GRPC call: /csi.v1.Identity/Probe
I0318 09:24:59.917281    2660 utils.go:120] ID: 1735 GRPC request: {}
I0318 09:24:59.918222    2660 utils.go:125] ID: 1735 GRPC response: {}
I0318 09:25:59.917306    2660 utils.go:119] ID: 1736 GRPC call: /csi.v1.Identity/Probe
I0318 09:25:59.917341    2660 utils.go:120] ID: 1736 GRPC request: {}
I0318 09:25:59.918444    2660 utils.go:125] ID: 1736 GRPC response: {}
I0318 09:26:59.917014    2660 utils.go:119] ID: 1737 GRPC call: /csi.v1.Identity/Probe
I0318 09:26:59.917035    2660 utils.go:120] ID: 1737 GRPC request: {}
I0318 09:26:59.917689    2660 utils.go:125] ID: 1737 GRPC response: {}
I0318 09:27:59.917381    2660 utils.go:119] ID: 1738 GRPC call: /csi.v1.Identity/Probe
I0318 09:27:59.917411    2660 utils.go:120] ID: 1738 GRPC request: {}
I0318 09:27:59.918300    2660 utils.go:125] ID: 1738 GRPC response: {}
I0318 09:28:59.917219    2660 utils.go:119] ID: 1739 GRPC call: /csi.v1.Identity/Probe
I0318 09:28:59.917251    2660 utils.go:120] ID: 1739 GRPC request: {}
I0318 09:28:59.918267    2660 utils.go:125] ID: 1739 GRPC response: {}
I0318 09:29:59.917349    2660 utils.go:119] ID: 1740 GRPC call: /csi.v1.Identity/Probe
I0318 09:29:59.917389    2660 utils.go:120] ID: 1740 GRPC request: {}
I0318 09:29:59.918498    2660 utils.go:125] ID: 1740 GRPC response: {}
I0318 09:30:59.917288    2660 utils.go:119] ID: 1741 GRPC call: /csi.v1.Identity/Probe
I0318 09:30:59.917309    2660 utils.go:120] ID: 1741 GRPC request: {}
I0318 09:30:59.918133    2660 utils.go:125] ID: 1741 GRPC response: {}
I0318 09:31:59.917256    2660 utils.go:119] ID: 1742 GRPC call: /csi.v1.Identity/Probe
I0318 09:31:59.917291    2660 utils.go:120] ID: 1742 GRPC request: {}
I0318 09:31:59.918203    2660 utils.go:125] ID: 1742 GRPC response: {}
I0318 09:32:59.917265    2660 utils.go:119] ID: 1743 GRPC call: /csi.v1.Identity/Probe
I0318 09:32:59.917286    2660 utils.go:120] ID: 1743 GRPC request: {}
I0318 09:32:59.918206    2660 utils.go:125] ID: 1743 GRPC response: {}
I0318 09:33:59.917330    2660 utils.go:119] ID: 1744 GRPC call: /csi.v1.Identity/Probe
I0318 09:33:59.917363    2660 utils.go:120] ID: 1744 GRPC request: {}
I0318 09:33:59.918434    2660 utils.go:125] ID: 1744 GRPC response: {}
I0318 09:34:59.917190    2660 utils.go:119] ID: 1745 GRPC call: /csi.v1.Identity/Probe
I0318 09:34:59.917227    2660 utils.go:120] ID: 1745 GRPC request: {}
I0318 09:34:59.918295    2660 utils.go:125] ID: 1745 GRPC response: {}
I0318 09:35:59.917311    2660 utils.go:119] ID: 1746 GRPC call: /csi.v1.Identity/Probe
I0318 09:35:59.917342    2660 utils.go:120] ID: 1746 GRPC request: {}
I0318 09:35:59.918357    2660 utils.go:125] ID: 1746 GRPC response: {}
I0318 09:36:59.917179    2660 utils.go:119] ID: 1747 GRPC call: /csi.v1.Identity/Probe
I0318 09:36:59.917206    2660 utils.go:120] ID: 1747 GRPC request: {}
I0318 09:36:59.918121    2660 utils.go:125] ID: 1747 GRPC response: {}
I0318 09:37:59.917305    2660 utils.go:119] ID: 1748 GRPC call: /csi.v1.Identity/Probe
I0318 09:37:59.917331    2660 utils.go:120] ID: 1748 GRPC request: {}
I0318 09:37:59.918170    2660 utils.go:125] ID: 1748 GRPC response: {}
I0318 09:38:59.917192    2660 utils.go:119] ID: 1749 GRPC call: /csi.v1.Identity/Probe
I0318 09:38:59.917221    2660 utils.go:120] ID: 1749 GRPC request: {}
I0318 09:38:59.918071    2660 utils.go:125] ID: 1749 GRPC response: {}
I0318 09:39:59.917246    2660 utils.go:119] ID: 1750 GRPC call: /csi.v1.Identity/Probe
I0318 09:39:59.917279    2660 utils.go:120] ID: 1750 GRPC request: {}
I0318 09:39:59.918278    2660 utils.go:125] ID: 1750 GRPC response: {}
I0318 09:40:59.917191    2660 utils.go:119] ID: 1751 GRPC call: /csi.v1.Identity/Probe
I0318 09:40:59.917221    2660 utils.go:120] ID: 1751 GRPC request: {}
I0318 09:40:59.918284    2660 utils.go:125] ID: 1751 GRPC response: {}
I0318 09:41:59.917348    2660 utils.go:119] ID: 1752 GRPC call: /csi.v1.Identity/Probe
I0318 09:41:59.917383    2660 utils.go:120] ID: 1752 GRPC request: {}
I0318 09:41:59.918395    2660 utils.go:125] ID: 1752 GRPC response: {}
I0318 09:42:59.917203    2660 utils.go:119] ID: 1753 GRPC call: /csi.v1.Identity/Probe
I0318 09:42:59.917233    2660 utils.go:120] ID: 1753 GRPC request: {}
I0318 09:42:59.918099    2660 utils.go:125] ID: 1753 GRPC response: {}
I0318 09:43:59.916986    2660 utils.go:119] ID: 1754 GRPC call: /csi.v1.Identity/Probe
I0318 09:43:59.917006    2660 utils.go:120] ID: 1754 GRPC request: {}
I0318 09:43:59.917661    2660 utils.go:125] ID: 1754 GRPC response: {}
I0318 09:44:59.917809    2660 utils.go:119] ID: 1755 GRPC call: /csi.v1.Identity/Probe
I0318 09:44:59.917844    2660 utils.go:120] ID: 1755 GRPC request: {}
I0318 09:44:59.919301    2660 utils.go:125] ID: 1755 GRPC response: {}
I0318 09:45:59.917329    2660 utils.go:119] ID: 1756 GRPC call: /csi.v1.Identity/Probe
I0318 09:45:59.917360    2660 utils.go:120] ID: 1756 GRPC request: {}
I0318 09:45:59.918262    2660 utils.go:125] ID: 1756 GRPC response: {}
I0318 09:46:59.917283    2660 utils.go:119] ID: 1757 GRPC call: /csi.v1.Identity/Probe
I0318 09:46:59.917315    2660 utils.go:120] ID: 1757 GRPC request: {}
I0318 09:46:59.918358    2660 utils.go:125] ID: 1757 GRPC response: {}
I0318 09:47:59.917412    2660 utils.go:119] ID: 1758 GRPC call: /csi.v1.Identity/Probe
I0318 09:47:59.917449    2660 utils.go:120] ID: 1758 GRPC request: {}
I0318 09:47:59.918518    2660 utils.go:125] ID: 1758 GRPC response: {}
I0318 09:48:59.917365    2660 utils.go:119] ID: 1759 GRPC call: /csi.v1.Identity/Probe
I0318 09:48:59.917405    2660 utils.go:120] ID: 1759 GRPC request: {}
I0318 09:48:59.918444    2660 utils.go:125] ID: 1759 GRPC response: {}
I0318 09:49:59.917308    2660 utils.go:119] ID: 1760 GRPC call: /csi.v1.Identity/Probe
I0318 09:49:59.917343    2660 utils.go:120] ID: 1760 GRPC request: {}
I0318 09:49:59.918408    2660 utils.go:125] ID: 1760 GRPC response: {}
I0318 09:50:59.917232    2660 utils.go:119] ID: 1761 GRPC call: /csi.v1.Identity/Probe
I0318 09:50:59.917261    2660 utils.go:120] ID: 1761 GRPC request: {}
I0318 09:50:59.918099    2660 utils.go:125] ID: 1761 GRPC response: {}
I0318 09:51:59.917583    2660 utils.go:119] ID: 1762 GRPC call: /csi.v1.Identity/Probe
I0318 09:51:59.917613    2660 utils.go:120] ID: 1762 GRPC request: {}
I0318 09:51:59.918522    2660 utils.go:125] ID: 1762 GRPC response: {}
I0318 09:52:59.917242    2660 utils.go:119] ID: 1763 GRPC call: /csi.v1.Identity/Probe
I0318 09:52:59.917280    2660 utils.go:120] ID: 1763 GRPC request: {}
I0318 09:52:59.918385    2660 utils.go:125] ID: 1763 GRPC response: {}
I0318 09:53:59.917321    2660 utils.go:119] ID: 1764 GRPC call: /csi.v1.Identity/Probe
I0318 09:53:59.917353    2660 utils.go:120] ID: 1764 GRPC request: {}
I0318 09:53:59.918273    2660 utils.go:125] ID: 1764 GRPC response: {}
I0318 09:54:59.917244    2660 utils.go:119] ID: 1765 GRPC call: /csi.v1.Identity/Probe
I0318 09:54:59.917276    2660 utils.go:120] ID: 1765 GRPC request: {}
I0318 09:54:59.918233    2660 utils.go:125] ID: 1765 GRPC response: {}
I0318 09:55:59.917126    2660 utils.go:119] ID: 1766 GRPC call: /csi.v1.Identity/Probe
I0318 09:55:59.917153    2660 utils.go:120] ID: 1766 GRPC request: {}
I0318 09:55:59.917926    2660 utils.go:125] ID: 1766 GRPC response: {}
I0318 09:56:59.917035    2660 utils.go:119] ID: 1767 GRPC call: /csi.v1.Identity/Probe
I0318 09:56:59.917055    2660 utils.go:120] ID: 1767 GRPC request: {}
I0318 09:56:59.917606    2660 utils.go:125] ID: 1767 GRPC response: {}
I0318 09:57:59.917256    2660 utils.go:119] ID: 1768 GRPC call: /csi.v1.Identity/Probe
I0318 09:57:59.917292    2660 utils.go:120] ID: 1768 GRPC request: {}
I0318 09:57:59.918285    2660 utils.go:125] ID: 1768 GRPC response: {}
I0318 09:58:59.917357    2660 utils.go:119] ID: 1769 GRPC call: /csi.v1.Identity/Probe
I0318 09:58:59.917390    2660 utils.go:120] ID: 1769 GRPC request: {}
I0318 09:58:59.918434    2660 utils.go:125] ID: 1769 GRPC response: {}
I0318 09:59:59.917290    2660 utils.go:119] ID: 1770 GRPC call: /csi.v1.Identity/Probe
I0318 09:59:59.917322    2660 utils.go:120] ID: 1770 GRPC request: {}
I0318 09:59:59.918312    2660 utils.go:125] ID: 1770 GRPC response: {}
I0318 10:00:59.917110    2660 utils.go:119] ID: 1771 GRPC call: /csi.v1.Identity/Probe
I0318 10:00:59.917136    2660 utils.go:120] ID: 1771 GRPC request: {}
I0318 10:00:59.917961    2660 utils.go:125] ID: 1771 GRPC response: {}
I0318 10:01:59.917368    2660 utils.go:119] ID: 1772 GRPC call: /csi.v1.Identity/Probe
I0318 10:01:59.917431    2660 utils.go:120] ID: 1772 GRPC request: {}
I0318 10:01:59.918641    2660 utils.go:125] ID: 1772 GRPC response: {}
I0318 10:02:59.917167    2660 utils.go:119] ID: 1773 GRPC call: /csi.v1.Identity/Probe
I0318 10:02:59.917194    2660 utils.go:120] ID: 1773 GRPC request: {}
I0318 10:02:59.918186    2660 utils.go:125] ID: 1773 GRPC response: {}
I0318 10:03:59.917120    2660 utils.go:119] ID: 1774 GRPC call: /csi.v1.Identity/Probe
I0318 10:03:59.917150    2660 utils.go:120] ID: 1774 GRPC request: {}
I0318 10:03:59.918354    2660 utils.go:125] ID: 1774 GRPC response: {}
I0318 10:04:59.917408    2660 utils.go:119] ID: 1775 GRPC call: /csi.v1.Identity/Probe
I0318 10:04:59.917447    2660 utils.go:120] ID: 1775 GRPC request: {}
I0318 10:04:59.918502    2660 utils.go:125] ID: 1775 GRPC response: {}
I0318 10:05:59.917239    2660 utils.go:119] ID: 1776 GRPC call: /csi.v1.Identity/Probe
I0318 10:05:59.917272    2660 utils.go:120] ID: 1776 GRPC request: {}
I0318 10:05:59.918277    2660 utils.go:125] ID: 1776 GRPC response: {}
I0318 10:06:59.917155    2660 utils.go:119] ID: 1777 GRPC call: /csi.v1.Identity/Probe
I0318 10:06:59.917191    2660 utils.go:120] ID: 1777 GRPC request: {}
I0318 10:06:59.918029    2660 utils.go:125] ID: 1777 GRPC response: {}
Madhu-1 commented 4 years ago

Please attach the complete logs so I can analyse where exactly it is failing.
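
For example, something along these lines (a sketch; the pod name is a placeholder, pick the csi-rbdplugin daemonset pod running on the failing node, assuming the default app=csi-rbdplugin label):

kubectl -n rook-ceph get pods -l app=csi-rbdplugin -o wide
kubectl -n rook-ceph logs <csi-rbdplugin-pod-on-failing-node> -c csi-rbdplugin > csi-rbdplugin.log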

NicolaiSchmid commented 4 years ago

Of course

NicolaiSchmid commented 4 years ago

rbd.log

NicolaiSchmid commented 4 years ago

Some additional information which may help: node3 is the only node where this issue occurs; node1 and node2 are working as expected. node3 is at a different physical location.

Pivert commented 4 years ago

Hi, I'm having exactly the same problem (Ubuntu 18.04 in an LXC privileged container). Did you find a solution?

Madhu-1 commented 4 years ago

Can you please attach the complete csi-rbdplugin logs from the node where the mount is failing?

jeonggyuchoi commented 4 years ago

Hi, I have the same messages. Environment:

k8s: v1.19.0-alpha.0.180+7de113986ffafb
rook-ceph csi-rbdplugin: version 1.2.6 (official)
kernel: CentOS Linux 7 (Core) 3.10.0-1062.9.1.el7.x86_64
docker://19.3.8

I created the pool with cluster/examples/kubernetes/ceph/csi/rbd/storageclass-ec.yaml. I created the PVC with:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: xxx-mysql-pv-claim2
  namespace: xxx
  labels:
    app: xxx-mysql
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 300Gi

I deploy the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  namespace: xxx
  labels:
    app: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - env:
        - name: MYSQL_DATABASE
          value: defaultdb
        - name: MYSQL_PASSWORD
          value: password
        - name: MYSQL_ROOT_PASSWORD
          value: root
        - name: MYSQL_USER
          value: admin
        image: mysql:5.7.12
        imagePullPolicy: IfNotPresent
        name: mysql
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-xxx-pvstg2
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-xxx-pvstg2
        persistentVolumeClaim:
          claimName: xxx-mysql-pv-claim2

I can't find the reason. Please help.

csi-rbdplugin_20200407.log

Madhu-1 commented 4 years ago

This could be a network issue. Can you exec inside the csi-rbdplugin container and check whether any rbd commands are stuck with ps -ef | grep rbd?

The logs provided above are not complete.
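
For example (a sketch; <csi-rbdplugin-pod> is a placeholder for the daemonset pod on the affected node):

kubectl -n rook-ceph exec -it <csi-rbdplugin-pod> -c csi-rbdplugin -- sh -c 'ps -ef | grep rbd'

Any rbd or rados process that has been running for many minutes is a sign the command is stuck.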

jeonggyuchoi commented 4 years ago

 Thank you Madhu-1

k -n rook-ceph exec -it csi-rbdplugin-jqfbj -c csi-rbdplugin -- /bin/sh
sh-4.2# ps -ef | grep rbd
root        556  23888  0 04:25 ?      00:00:00 rbd status csi-vol-a2901e37-7811-11ea-951a-52df191a1831 --pool xxx-rpl-mdpool -m 10.231.16.39:6789,10.231.214.24:6789,10.231.213.231:6789 --id csi-rbd-node --keyfile=/tmp/csi/keys/keyfile-687222524
root      18562  18464  0 04:59 pts/0  00:00:00 grep rbd
root      23833  23813  0 Mar24 ?      00:00:02 /csi-node-driver-registrar --v=5 --csi-address=/csi/csi.sock --kubelet-registration-path=/var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/csi.sock
root      23888  23851  0 Mar24 ?      00:01:42 /usr/local/bin/cephcsi --nodeid=xxx-worker01 --endpoint=unix:///csi/csi.sock --v=5 --type=rbd --nodeserver=true --drivername=rook-ceph.rbd.csi.ceph.com --pidlimit=-1 --metricsport=9090 --metricspath=/metrics --enablegrpcmetrics=true
sh-4.2# date
Tue Apr 7 05:01:25 UTC 2020
sh-4.2# rbd ls
unable to get monitor info from DNS SRV with service name: ceph-mon
2020-04-07 05:04:29.825 7f7601a6cb00 -1 failed for service _ceph-mon._tcp
2020-04-07 05:04:29.825 7f7601a6cb00 -1 monclient: get_monmap_and_config cannot identify monitors to contact
rbd: couldn't connect to the cluster!
rbd: listing images failed: (2) No such file or directory

You're right, something is wrong.

I checked the ping to the mon pod, it's OK:

sh-4.2# ping 10.244.22.29
PING 10.244.22.29 (10.244.22.29) 56(84) bytes of data.
64 bytes from 10.244.22.29: icmp_seq=1 ttl=63 time=0.276 ms
64 bytes from 10.244.22.29: icmp_seq=2 ttl=63 time=0.247 ms

Could you recommend another check?

Madhu-1 commented 4 years ago

Maybe you need to configure the firewall for port 6789 on the host machine? The cephcsi daemonset runs with host networking.
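
For example, with firewalld on the host (a sketch, assuming firewalld is the firewall in use; mons listen on 6789 for msgr1 and 3300 for msgr2):

firewall-cmd --permanent --add-port=6789/tcp
firewall-cmd --permanent --add-port=3300/tcp
firewall-cmd --reload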

jeonggyuchoi commented 4 years ago

Thank you Madhu-1. firewalld is not enabled on my hosts, so it was in the dead state.

I checked the mon pod, and port 3300 is being used to connect:

rook-ceph-mon-a-f4779c9cb-vpb7k /]# ss
Netid State Recv-Q Send-Q Local Address:Port   Peer Address:Port
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.34.53:43618
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.36.52:59454
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.37.56:52910
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.22.31:51614
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.36.54:58590
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.22.31:51624
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.22.31:51646
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.34.57:54934
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.22.31:56112
tcp   ESTAB 0      0      10.244.22.29:3300    .....
tcp   ESTAB 0      0      10.244.22.29:49318   10.244.22.31:6800
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.38.57:53952
tcp   ESTAB 0      80     10.244.22.29:3300    10.244.21.35:42022
tcp   ESTAB 0      0      10.244.22.29:3300    10.244.32.57:50232

The mon pod is only listening on 3300. Something strange: when I first started the mons, every node could connect on port 6789. Now it doesn't work.

And only the 10.244.X.X network (the pod network) works. Maybe hostNetwork: true in the cluster config is not being applied... (I had already applied it.)

shaekhhasanshoron commented 3 years ago

Hello, I was working with snapshotting and restoring a PVC from a snapshot. I was facing issue #4624 and found a solution: as suggested, I uncommented all of the ROOK_CSI_*_IMAGE lines in the operator descriptor and deployed again. My pending-PVC issue was solved, but I hit a new one. Whenever I create a StatefulSet, the new PVC and PV are created successfully but the pods stay in the ContainerCreating state. Describing the pods shows the following error. Can anyone please help me? I am using Rook v1.2.7 and k8s v1.17.1.

MountVolume.MountDevice failed for volume "pvc-dc3e1023-597e-444c-b3c1-e7e44bfcf01e" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-21255c3a-b850-11ea-b60c-e218d8db998c already exists
tivizi commented 3 years ago

I read kubelet's log and solved this problem.

There are some logs like the one below:

orphaned pod "ce9d44e6-4643-4bd9-9f12-24a7568463a8" found, but volume paths are still present on disk : There were a total of 2 errors similar to this. Turn up verbosity to see them.

then

rm /var/lib/kubelet/pods/d1136e6d-a113-4c55-a827-d641f56020e2 -rf

problem solved!

Warning: that's true! rm -rf can remove persistent data if the umount was not executed. By the way, my first intention was just to describe the problem. Sorry for all...
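
A safer sequence is to confirm nothing under that pod directory is still mounted before deleting it (a sketch; <pod-uid> is a placeholder):

grep /var/lib/kubelet/pods/<pod-uid> /proc/mounts   # should print nothing
rm -rf /var/lib/kubelet/pods/<pod-uid>              # only if no mounts were found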

solovyevt commented 3 years ago

Here's related Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/60987

haisum commented 3 years ago

@Madhu-1 Here are outputs of ps on rbdplugin from two nodes with stuck pods:

[AzureUser@dev dev]$ kubectl exec -it csi-rbdplugin-5gg9z -n rook-ceph -c csi-rbdplugin -- /bin/bash
bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory
[root@abc-offline-3 /]# ps -ef |grep rbd
root      11557  11468  0 20:33 pts/0    00:00:00 grep --color=auto rbd
root      44141      2  0 19:17 ?        00:00:00 [rbd]
root      44149      2  0 19:17 ?        00:00:00 [rbd0-tasks]
root      76837      2  0 19:42 ?        00:00:00 [jbd2/rbd0-8]
nfsnobo+  91360  91323  0 19:51 ?        00:00:00 /bin/node_exporter --web.listen-address=0.0.0.0:9796 --path.procfs=/host/proc --path.sysfs=/host/sys --path.rootfs=/host --collector.arp --collector.bcache --collector.bonding --no-collector.buddyinfo --collector.conntrack --collector.cpu --collector.diskstats --no-collector.drbd --collector.edac --collector.entropy --collector.filefd --collector.filesystem --collector.hwmon --collector.infiniband --no-collector.interrupts --collector.ipvs --no-collector.ksmd --collector.loadavg --no-collector.logind --collector.mdadm --collector.meminfo --no-collector.meminfo_numa --no-collector.mountstats --collector.netdev --collector.netstat --collector.nfs --collector.nfsd --no-collector.ntp --no-collector.processes --no-collector.qdisc --no-collector.runit --collector.sockstat --collector.stat --no-collector.supervisord --no-collector.systemd --no-collector.tcpstat --collector.textfile --collector.time --collector.timex --collector.uname --collector.vmstat --no-collector.wifi --collector.xfs --collector.zfs
root      92700      2  0 19:52 ?        00:00:00 [rbd1-tasks]
root     120530 120510  0 18:28 ?        00:00:00 /csi-node-driver-registrar --v=0 --csi-address=/csi/csi.sock --kubelet-registration-path=/var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/csi.sock
root     120646 120629  0 18:29 ?        00:00:01 /usr/local/bin/cephcsi --nodeid=10.1.0.7 --endpoint=unix:///csi/csi.sock --v=0 --type=rbd --nodeserver=true --drivername=rook-ceph.rbd.csi.ceph.com --pidlimit=-1 --metricsport=9090 --metricspath=/metrics --enablegrpcmetrics=true
root     121103 121085  0 18:29 ?        00:00:01 /usr/local/bin/cephcsi --nodeid=10.1.0.7 --endpoint=unix:///csi/csi-provisioner.sock --v=0 --type=rbd --controllerserver=true --drivername=rook-ceph.rbd.csi.ceph.com --pidlimit=-1 --metricsport=9090 --metricspath=/metrics --enablegrpcmetrics=true
root     128623 120646  0 20:22 ?        00:00:00 mkfs.ext4 -m0 -Enodiscard,lazy_itable_init=1,lazy_journal_init=1 /dev/rbd1
---------------------------------
[root@abc-offline-2 /]# ps -ef |grep rbd
root      89284      2  0 21:16 ?        00:00:00 [rbd0-tasks]
root      89323 112576  0 21:16 ?        00:00:00 mkfs.ext4 -m0 -Enodiscard,lazy_itable_init=1,lazy_journal_init=1 /dev/rbd0
root      89776 112576  0 21:16 ?        00:00:00 rados -m 10.43.1.246:6789,10.43.1.221:6789,10.43.1.144:6789 --id csi-rbd-node --keyfile=/tmp/csi/keys/keyfile-478898634 -c /etc/ceph/ceph.conf -p replicapool getomapval csi.volume.d8cbd843-e0ce-11ea-a151-7266822d9274 csi.volname /tmp/omap-get-422460065
nfsnobo+  91635  91608  0 19:51 ?        00:00:02 /bin/node_exporter --web.listen-address=0.0.0.0:9796 --path.procfs=/host/proc --path.sysfs=/host/sys --path.rootfs=/host --collector.arp --collector.bcache --collector.bonding --no-collector.buddyinfo --collector.conntrack --collector.cpu --collector.diskstats --no-collector.drbd --collector.edac --collector.entropy --collector.filefd --collector.filesystem --collector.hwmon --collector.infiniband --no-collector.interrupts --collector.ipvs --no-collector.ksmd --collector.loadavg --no-collector.logind --collector.mdadm --collector.meminfo --no-collector.meminfo_numa --no-collector.mountstats --collector.netdev --collector.netstat --collector.nfs --collector.nfsd --no-collector.ntp --no-collector.processes --no-collector.qdisc --no-collector.runit --collector.sockstat --collector.stat --no-collector.supervisord --no-collector.systemd --no-collector.tcpstat --collector.textfile --collector.time --collector.timex --collector.uname --collector.vmstat --no-collector.wifi --collector.xfs --collector.zfs
root      93236      2  0 19:52 ?        00:00:00 [rbd]
root      95135  94912  0 21:21 pts/0    00:00:00 grep --color=auto rbd
root     112387 112363  0 18:28 ?        00:00:00 /csi-node-driver-registrar --v=0 --csi-address=/csi/csi.sock --kubelet-registration-path=/var/lib/kubelet/plugins/rook-ceph.rbd.csi.ceph.com/csi.sock
root     112576 112552  0 18:29 ?        00:00:01 /usr/local/bin/cephcsi --nodeid=10.1.0.6 --endpoint=unix:///csi/csi.sock --v=0 --type=rbd --nodeserver=true --drivername=rook-ceph.rbd.csi.ceph.com --pidlimit=-1 --metricsport=9090 --metricspath=/metrics --enablegrpcmetrics=true

Here are pod logs:

[AzureUser@abc-online ~]$ kubectl logs csi-rbdplugin-9l568 -c csi-rbdplugin -n rook-ceph
W0817 21:52:14.068655    1412 driver.go:166] EnableGRPCMetrics is deprecated
W0817 21:54:57.216075    1412 rbd_util.go:234] ID: 6 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 rbd: no watchers on csi-vol-3f9a3312-e0d4-11ea-a151-7266822d9274
E0817 21:56:56.891558    1412 nodeserver.go:102] ID: 10 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:56:56.891599    1412 utils.go:163] ID: 10 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:56:58.011249    1412 nodeserver.go:102] ID: 12 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:56:58.011300    1412 utils.go:163] ID: 12 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:57:00.106748    1412 nodeserver.go:102] ID: 14 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:57:00.106793    1412 utils.go:163] ID: 14 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:57:04.124707    1412 nodeserver.go:102] ID: 16 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:57:04.124752    1412 utils.go:163] ID: 16 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:57:12.167004    1412 nodeserver.go:102] ID: 18 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:57:12.167054    1412 utils.go:163] ID: 18 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:57:28.249054    1412 nodeserver.go:102] ID: 21 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:57:28.249094    1412 utils.go:163] ID: 21 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:58:00.303329    1412 nodeserver.go:102] ID: 23 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:58:00.303447    1412 utils.go:163] ID: 23 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:59:04.397722    1412 nodeserver.go:102] ID: 26 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 21:59:04.397766    1412 utils.go:163] ID: 26 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 22:01:06.460178    1412 nodeserver.go:102] ID: 30 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 22:01:06.460223    1412 utils.go:163] ID: 30 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 22:03:08.565360    1412 nodeserver.go:102] ID: 34 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists
E0817 22:03:08.565421    1412 utils.go:163] ID: 34 Req-ID: 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3f9a3312-e0d4-11ea-a151-7266822d9274 already exists

Any idea about which process might be stuck here? I have opened port 6789 on all nodes.

cluster.yml:

#################################################################################################################
# Define the settings for the rook-ceph cluster with common settings for a production cluster.
# All nodes with available raw devices will be used for the Ceph cluster. At least three nodes are required
# in this example. See the documentation for more details on storage settings available.

# For example, to create the cluster:
#   kubectl create -f common.yaml
#   kubectl create -f operator.yaml
#   kubectl create -f cluster.yaml
#################################################################################################################

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
    # v13 is mimic, v14 is nautilus, and v15 is octopus.
    # RECOMMENDATION: In production, use a specific version tag instead of the general v14 flag, which pulls the latest release and could result in different
    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
    # If you want to be more precise, you can always use a timestamp tag such ceph/ceph:v15.2.4-20200630
    # This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
    image: arvest-harbor.dev.arvest.com/ceph/ceph:v15.2.4
    # Whether to allow unsupported versions of Ceph. Currently `nautilus` and `octopus` are supported.
    # Future versions such as `pacific` would require this to be set to `true`.
    # Do not set to true in production.
    allowUnsupported: false
  # The path on the host where configuration files will be persisted. Must be specified.
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
  dataDirHostPath: /var/lib/rook
  # Whether or not upgrade should continue even if a check fails
  # This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
  # Use at your OWN risk
  # To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/master/ceph-upgrade.html#ceph-version-upgrades
  skipUpgradeChecks: false
  # Whether or not continue if PGs are not clean during an upgrade
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  # set the amount of mons to be started
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    modules:
    # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
    # are already enabled by other settings in the cluster CR and the "rook" module is always enabled.
    - name: pg_autoscaler
      enabled: true
  # enable the ceph dashboard for viewing cluster status
  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443
    # serve the dashboard using SSL
    ssl: false
  # enable prometheus alerting for cluster
  monitoring:
    # requires Prometheus to be pre-installed
    enabled: false
    # namespace to deploy prometheusRule in. If empty, namespace of the cluster will be used.
    # Recommended:
    # If you have a single rook-ceph cluster, set the rulesNamespace to the same namespace as the cluster or keep it empty.
    # If you have multiple rook-ceph clusters in the same k8s cluster, choose the same namespace (ideally, namespace with prometheus
    # deployed) to set rulesNamespace for all the clusters. Otherwise, you will get duplicate alerts with multiple alert definitions.
    rulesNamespace: rook-ceph
  network:
    # enable host networking
    #provider: host
    # EXPERIMENTAL: enable the Multus network provider
    #provider: multus
    #selectors:
      # The selector keys are required to be `public` and `cluster`.
      # Based on the configuration, the operator will do the following:
      #   1. if only the `public` selector key is specified both public_network and cluster_network Ceph settings will listen on that interface
      #   2. if both `public` and `cluster` selector keys are specified the first one will point to 'public_network' flag and the second one to 'cluster_network'
      #
      # In order to work, each selector value must match a NetworkAttachmentDefinition object in Multus
      #
      #public: public-conf --> NetworkAttachmentDefinition object name in Multus
      #cluster: cluster-conf --> NetworkAttachmentDefinition object name in Multus
  # enable the crash collector for ceph daemon crash collection
  crashCollector:
    disable: false
  cleanupPolicy:
    # cleanup should only be added to the cluster when the cluster is about to be deleted.
    # After any field of the cleanup policy is set, Rook will stop configuring the cluster as if the cluster is about
    # to be destroyed in order to prevent these settings from being deployed unintentionally.
    # To signify that automatic deletion is desired, use the value "yes-really-destroy-data". Only this and an empty
    # string are valid values for this field.
    confirmation: ""
    # sanitizeDisks represents settings for sanitizing OSD disks on cluster deletion
    sanitizeDisks:
      # method indicates if the entire disk should be sanitized or simply ceph's metadata
      # in both case, re-install is possible
      # possible choices are 'complete' or 'quick' (default)
      method: quick
      # dataSource indicate where to get random bytes from to write on the disk
      # possible choices are 'zero' (default) or 'random'
      # using random sources will consume entropy from the system and will take much more time then the zero source
      dataSource: zero
      # iteration overwrite N times instead of the default (1)
      # takes an integer value
      iteration: 1
  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
  # tolerate taints with a key of 'storage-node'.
#  placement:
#    all:
#      nodeAffinity:
#        requiredDuringSchedulingIgnoredDuringExecution:
#          nodeSelectorTerms:
#          - matchExpressions:
#            - key: role
#              operator: In
#              values:
#              - storage-node
#      podAffinity:
#      podAntiAffinity:
#      topologySpreadConstraints:
#      tolerations:
#      - key: storage-node
#        operator: Exists
# The above placement information can also be specified for mon, osd, and mgr components
#    mon:
# Monitor deployments may contain an anti-affinity rule for avoiding monitor
# collocation on the same node. This is a required rule when host network is used
# or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
# preferred rule with weight: 50.
#    osd:
#    mgr:
#    cleanup:
  annotations:
#    all:
#    mon:
#    osd:
#    cleanup:
# If no mgr annotations are set, prometheus scrape annotations will be set by default.
#   mgr:
  resources:
# The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
    # mgr:
    #   limits:
    #     cpu: "500m"
    #     memory: "1024Mi"
    #   requests:
    #     cpu: "500m"
    #     memory: "1024Mi"
# The above example requests/limits can also be added to the mon and osd components
#    mon:
#    osd:
#    prepareosd:
#    crashcollector:
#    cleanup:
  # The option to automatically remove OSDs that are out and are safe to destroy.
  removeOSDsIfOutAndSafeToRemove: false
#  priorityClassNames:
#    all: rook-ceph-default-priority-class
#    mon: rook-ceph-mon-priority-class
#    osd: rook-ceph-osd-priority-class
#    mgr: rook-ceph-mgr-priority-class
  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: true
    #deviceFilter:
    config:
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
      # osdsPerDevice: "1" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
# nodes below will be used as storage resources.  Each node's 'name' field should match their 'kubernetes.io/hostname' label.
#    nodes:
#    - name: "172.17.4.201"
#      devices: # specific devices to use for storage can be specified for each node
#      - name: "sdb"
#      - name: "nvme01" # multiple osds can be created on high performance devices
#        config:
#          osdsPerDevice: "5"
#      - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
#      config: # configuration can be specified at the node level which overrides the cluster level config
#        storeType: filestore
#    - name: "172.17.4.301"
#      deviceFilter: "^sd."
  # The section for configuring management of daemon disruptions during upgrade or fencing.
  disruptionManagement:
    # If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
    # via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
    # block eviction of OSDs by default and unblock them safely when drains are detected.
    managePodBudgets: false
    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
    # default DOWN/OUT interval) when it is draining. This is only relevant when  `managePodBudgets` is `true`. The default value is `30` minutes.
    osdMaintenanceTimeout: 30
    # If true, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy.
    # Only available on OpenShift.
    manageMachineDisruptionBudgets: false
    # Namespace in which to watch for the MachineDisruptionBudgets.
    machineDisruptionBudgetNamespace: openshift-machine-api

  # healthChecks
  # Valid values for daemons are 'mon', 'osd', 'status'
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    # Change pod liveness probe, it works for all mon,mgr,osd daemons
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false

storageclass.yml:

---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
    # clusterID is the namespace where the rook cluster is running
    clusterID: rook-ceph
    # Ceph pool into which the RBD image shall be created
    pool: replicapool

    # RBD image format. Defaults to "2".
    imageFormat: "2"

    # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
    imageFeatures: layering

    # The secrets contain Ceph admin credentials.
    csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
    csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
    csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

    # Specify the filesystem type of the volume. If not specified, csi-provisioner
    # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
    # in hyperconverged settings where the volume is mounted on the same node as the osds.
    csi.storage.k8s.io/fstype: ext4

# Delete the rbd volume when a PVC is deleted
reclaimPolicy: Delete
solovyevt commented 3 years ago

Experiencing the same issue on a HEALTH_OK cluster with Rook 1.3.6 and CephFS with CSI. Only a minor subset of the nodes is affected; here are some details:

First observation

Running ps aux | grep ceph in the csi-cephfsplugin container on the affected host shows that there is a stuck rados process:

# ps aux | grep ceph
root          1  0.0  0.0 4903264 22632 ?       Ssl  Jul23   6:55 /usr/local/bin/cephcsi --nodeid=node0 --type=cephfs --endpoint=unix:///csi/csi.sock --v=3 --nodeserver=true --drivername=rook-ceph.cephfs.csi.ceph.com --metadatastorage=k8s_configmap --mountcachedir=/mount-cache-dir --pidlimit=-1 --metricsport=9091 --forcecephkernelclient=true --metricspath=/metrics --enablegrpcmetrics=true
root        105  0.0  0.4 1762496 276064 ?      Sl   Jul25   1:53 rados -m 10.43.89.46:6789,10.43.88.240:6789,10.43.243.181:6789 --id csi-cephfs-node --keyfile=/tmp/csi/keys/keyfile-744735131 -c /etc/ceph/ceph.conf -p rook-cephfs-metadata getomapval csi.volume.2be35885-cc30-11ea-8ba5-82c2a278b21c csi.volname /tmp/omap-get-616991294 --namespace=csi

netstat shows an established TCP connection to one of the mons on port 3300:

# netstat -atunp | grep -E "10.43.89.46|10.43.88.240|10.43.243.181"
tcp        0      0 10.46.207.124:51336     10.43.88.240:6789       ESTABLISHED -
tcp        0      0 10.46.207.124:35332     10.43.89.46:3300        ESTABLISHED 11680/rados

None of the above is present on the unaffected nodes.

Most likely this is not a network issue since rados df works well:

# rados -m 10.43.89.46:6789,10.43.88.240:6789,10.43.243.181:6789 --id csi-cephfs-node --keyfile=/tmp/csi/keys/keyfile-744735131 -c /etc/ceph/ceph.conf -p rook-cephfs-metadata df
POOL_NAME               USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS      RD WR_OPS      WR USED COMPR UNDER COMPR
rook-cephfs-metadata 1.4 MiB      62      0    186                  0       0        0    458 458 KiB    966 826 KiB        0 B         0 B

total_objects    139875
total_used       1.4 TiB
total_avail      7.9 TiB
total_space      9.3 TiB

Second observation

strace gives some more information on what happens with this hanging rados process:

# strace -s 99 -ffp 11680
strace: Process 11680 attached with 14 threads
[pid 11702] futex(0x5578943e3bdc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11701] futex(0x5578943e3a84, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11700] futex(0x5578943e37c4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11699] futex(0x5578943e32ec, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
[pid 11698] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 11697] futex(0x5578943e4f8c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11696] futex(0x5578943e4df4, FUTEX_WAIT_PRIVATE, 14501, NULL <unfinished ...>
[pid 11695] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 11690] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid 11689] futex(0x55789439c9ac, FUTEX_WAIT_PRIVATE, 546801, NULL <unfinished ...>
[pid 11684] epoll_wait(10,  <unfinished ...>
[pid 11683] epoll_wait(7,  <unfinished ...>
[pid 11682] epoll_wait(4,  <unfinished ...>
[pid 11680] futex(0x7ffc345c3bdc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11695] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 11695] clock_gettime(CLOCK_REALTIME, {tv_sec=1598358141, tv_nsec=833762945}) = 0
[pid 11695] futex(0x5578943ebbc0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11695] clock_gettime(CLOCK_REALTIME, {tv_sec=1598358141, tv_nsec=833877791}) = 0
[pid 11695] futex(0x5578943ebbec, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1092949, {tv_sec=1598358146, tv_nsec=833877791}, 0xffffffff <unfinished ...>
[pid 11690] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 11690] clock_gettime(CLOCK_REALTIME, {tv_sec=1598358142, tv_nsec=272963245}) = 0
[pid 11690] futex(0x5578943de290, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11690] clock_gettime(CLOCK_REALTIME, {tv_sec=1598358142, tv_nsec=273074344}) = 0
[pid 11690] futex(0x5578943de2bc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1056053, {tv_sec=1598358147, tv_nsec=273074344}, 0xffffffff <unfinished ...>
[pid 11698] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 11698] futex(0x55789439c9ac, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x55789439c928, 546802) = 1
[pid 11698] write(6, "c", 1)            = 1
[pid 11689] <... futex resumed> )       = 0
[pid 11682] <... epoll_wait resumed> [{EPOLLIN, {u32=5, u64=5}}], 5000, 30000) = 1
[pid 11698] futex(0x5578943e31b0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 11689] futex(0x55789439c928, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 11698] <... futex resumed> )       = 0
[pid 11689] <... futex resumed> )       = 0
[pid 11698] futex(0x5578943e320c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 528031, {tv_sec=1598358153, tv_nsec=205204617}, 0xffffffff <unfinished ...>
[pid 11689] futex(0x55789439c9ac, FUTEX_WAIT_PRIVATE, 546803, NULL <unfinished ...>
[pid 11682] read(5, "c", 256)           = 1
[pid 11682] read(5, 0x7f6c8b4549c0, 256) = -1 EAGAIN (Resource temporarily unavailable)
[pid 11682] sendmsg(15, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="`\255\335I1\236z\373\337R\340\212>\20\257\31\270\230\262\27\366\354\267Tn\363\234S\303\227\253x\267 \347\307^\304\367\345\234\20S\37\246\231\223^\6\232\262\364\1&Y\3641|\202\330\364\322\242@", iov_len=64}, {iov_base="\226}c\335\260$RQ\203&c\237\27\362\305c", iov_len=16}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 80
[pid 11682] epoll_wait(4, [{EPOLLIN, {u32=15, u64=15}}], 5000, 30000) = 1
[pid 11682] read(15, "\v\216\317`_\374\377\307\371\301p\

These messages are essential:

[pid 11698] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
...
[pid 11682] read(5, 0x7f6c8b4549c0, 256) = -1 EAGAIN (Resource temporarily unavailable)
solovyevt commented 3 years ago

Also, a disclaimer: be VERY careful when running what @tivizi has suggested; volumes might still be mapped on the node, and a forced rm might destroy their contents.

KaelBaldwin commented 3 years ago

I had what I think was the same issue as this. I'm not sure of the cause, but I deleted each pod of the csi-rbdplugin daemonset one at a time, waiting until the previous one had come back up. The first one deleted and restarted fine; the second one hung when being deleted. I drained the node that pod was running on, and during the drain everything started working again. I'm guessing a pod with a volume was stuck trying to attach, and once it was evicted everything stopped hanging.
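
The rolling restart described above looks roughly like this (a sketch; names are placeholders):

kubectl -n rook-ceph get pods -l app=csi-rbdplugin -o wide
kubectl -n rook-ceph delete pod <csi-rbdplugin-pod>   # wait for the replacement to be Running before deleting the next one
kubectl drain <node> --ignore-daemonsets              # only if a delete hangs, as described above
kubectl uncordon <node>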

SerialVelocity commented 3 years ago

For me, I hit this because two of my CephFS MDS daemons weren't registered properly:

# ceph mds metadata cephfs-storage0-a
Error ENOENT:
# ceph mds metadata cephfs-storage3-a
Error ENOENT:

My little script for checking, since I run multiple CephFS filesystems (replace $(seq 0 5) and cephfs-storage$m with your MDS names):

for m in $(seq 0 5); do for n in a b; do echo -n "cephfs-storage$m-$n: "; ceph mds metadata cephfs-storage$m-$n 2>&1 | grep -q ceph_release && echo "Success" || echo "Failed"; done; done

(Or this could have been unrelated, as I can still only get one of three CephFS PVCs to provision... Provisioning takes 2m10s and the context deadline timeout happens at 2m30s, so it could just be a slow operation...)

sfxworks commented 3 years ago

I found a rogue fsck on it... (rbd2 is the block device in question)

It appears under ceph-csi as far as the process tree goes...

Meanwhile, the csi-rbdplugin container on the node is logging:

E0907 16:04:06.639701    2604 utils.go:163] ID: 51 Req-ID: 0001-0009-rook-ceph-000000000000000b-c3b05f61-e51d-11ea-9678-0a9d76ac3816 GRPC error: rpc error: code = Internal desc = rbd image replicated-metadata-pool/csi-vol-c3b05f61-e51d-11ea-9678-0a9d76ac3816 is still being used

And I have no further leads.

Rook 1.3, K8s 1.19, Ubuntu 20.04.1 LTS

solovyevt commented 3 years ago

Apparently, this is most likely a network issue, probably related to network disruptions, power outages, or routing problems. Launching a new Ceph cluster with hostNetwork: true solved this issue, but at the price of manually migrating all existing data.

Do not attempt to enable hostNetwork: true on an existing cluster; it might lead to data loss and cluster disruption.
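
For reference, host networking on a CephCluster is set through the network provider (a sketch matching the commented-out provider in the cluster.yml above; per the warning, only use it when creating a new cluster):

  network:
    provider: host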

This issue should remain open.

ilhaan commented 3 years ago

Seeing this same issue with hostNetwork: true set in cluster-external.yaml and operator.yaml

Rook 1.4.6, k8s 1.16.8 and Ubuntu 18.04.4

mleklund commented 3 years ago

I think I am hitting this too.

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Warning  FailedScheduling        21m                  default-scheduler        persistentvolumeclaim "user-data-mysql-claim-qa" not found
  Warning  FailedScheduling        21m                  default-scheduler        0/6 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient cpu, 3 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               21m                  default-scheduler        Successfully assigned mysql/user-data-mysql-qa-0 to k8w01.hsv.dealnews.net
  Warning  FailedScheduling        21m                  default-scheduler        persistentvolumeclaim "user-data-mysql-claim-qa" not found
  Normal   SuccessfulAttachVolume  21m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-ff9f5ec0-0626-44b3-8b62-c3f12233179e"
  Warning  FailedMount             19m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[persistent-storage config-volume sql-initialize-volume tz-config default-token-ttvk8]: timed out waiting for the condition
  Warning  FailedMount             19m                  kubelet                  MountVolume.MountDevice failed for volume "pvc-ff9f5ec0-0626-44b3-8b62-c3f12233179e" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             14m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[tz-config default-token-ttvk8 persistent-storage config-volume sql-initialize-volume]: timed out waiting for the condition
  Warning  FailedMount             10m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[default-token-ttvk8 persistent-storage config-volume sql-initialize-volume tz-config]: timed out waiting for the condition
  Warning  FailedMount             9m1s (x12 over 19m)  kubelet                  MountVolume.MountDevice failed for volume "pvc-ff9f5ec0-0626-44b3-8b62-c3f12233179e" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-082430a5-298c-11eb-bb94-a6cc32098c46 already exists
  Warning  FailedMount             8m9s (x3 over 17m)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[sql-initialize-volume tz-config default-token-ttvk8 persistent-storage config-volume]: timed out waiting for the condition
  Normal   Pulled                  6m58s                kubelet                  Container image "percona/percona-xtradb-cluster:5.7" already present on machine
  Normal   Created                 6m58s                kubelet                  Created container mysql
  Normal   Started                 6m57s                kubelet                  Started container mysql

It eventually mounts but takes forever to get there.

hedasaurabh commented 3 years ago

I am hitting the same issue:

Events:
  Type     Reason       Age                    From                    Message
  ----     ------       ----                   ----                    -------
  Normal   Scheduled    <unknown>              default-scheduler       Successfully assigned default/jenkins-58b8d7456d-wrls5 to 10.128.229.32
  Warning  FailedMount  2m27s (x2 over 4m41s)  kubelet, 10.128.229.32  Unable to attach or mount volumes: unmounted volumes=[jenkins-home], unattached volumes=[jenkins-home docker-sock-volume jenkins-token-67t4h]: timed out waiting for the condition
  Warning  FailedMount  67s (x3 over 5m12s)    kubelet, 10.128.229.32  MountVolume.MountDevice failed for volume "pvc-de217c4d-6430-4286-a64b-acdd9dcccf0e" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-cd014afd-33d4-11eb-a55b-92eb9ddaf752 already exists
  Warning  FailedMount  10s                    kubelet, 10.128.229.32  Unable to attach or mount volumes: unmounted volumes=[jenkins-home], unattached volumes=[jenkins-token-67t4h jenkins-home docker-sock-volume]: timed out waiting for the condition
ilhaan commented 3 years ago

In case this helps someone else: the reason I saw FailedMount errors was that Rook was unable to mount rbd devices under /dev on the worker nodes where the pods were scheduled. I had to target specific worker nodes where /dev was writable by k8s to get past this issue.
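
One way to target specific nodes is a nodeSelector on the workload (a sketch; storage=rbd-ok is a hypothetical label you would first apply to the nodes where /dev is writable):

kubectl label node <node-name> storage=rbd-ok

and in the pod template spec:

  nodeSelector:
    storage: rbd-ok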

mleklund commented 3 years ago

My kubelets are running as root, and /dev has drwxr-xr-x. 17 root root 3700 Nov 25 22:34 dev, so this is not the issue in my case.

nirav-chotai commented 3 years ago

Hitting the same issue

Events:
  Type     Reason                Age                  From                                                                                                              Message
  ----     ------                ----                 ----                                                                                                              -------
  Warning  ProvisioningFailed    35s                  rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-598854d87f-jskc5_c3095dae-28b8-4585-9c42-cc756cfe832a  failed to provision volume with StorageClass "rook-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   ExternalProvisioning  10s (x14 over 3m5s)  persistentvolume-controller                                                                                       waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          4s (x7 over 3m5s)    rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-598854d87f-jskc5_c3095dae-28b8-4585-9c42-cc756cfe832a  External provisioner is provisioning volume for claim "dps-sample-microservice-onprem/cephfs-pvc"
  Warning  ProvisioningFailed    4s (x6 over 35s)     rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-598854d87f-jskc5_c3095dae-28b8-4585-9c42-cc756cfe832a  failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-5fd83ff9-835f-434c-9ccb-97d14298bfad already exists

Rook operator logs

$ kubectl -n rook-ceph logs -f rook-ceph-operator-86d785d4b8-8r7hk | grep cephfs
2020-12-07 08:00:03.661888 I | rookcmd: flag values: --add_dir_header=false, --alsologtostderr=false, --csi-cephfs-plugin-template-path=/etc/ceph-csi/cephfs/csi-cephfsplugin.yaml, --csi-cephfs-provisioner-dep-template-path=/etc/ceph-csi/cephfs/csi-cephfsplugin-provisioner-dep.yaml, --csi-cephfs-provisioner-sts-template-path=/etc/ceph-csi/cephfs/csi-cephfsplugin-provisioner-sts.yaml, --csi-rbd-plugin-template-path=/etc/ceph-csi/rbd/csi-rbdplugin.yaml, --csi-rbd-provisioner-dep-template-path=/etc/ceph-csi/rbd/csi-rbdplugin-provisioner-dep.yaml, --csi-rbd-provisioner-sts-template-path=/etc/ceph-csi/rbd/csi-rbdplugin-provisioner-sts.yaml, --enable-discovery-daemon=true, --enable-flex-driver=false, --enable-machine-disruption-budget=false, --help=false, --kubeconfig=, --log-flush-frequency=5s, --log-level=INFO, --log_backtrace_at=:0, --log_dir=, --log_file=, --log_file_max_size=1800, --logtostderr=true, --master=, --mon-healthcheck-interval=45s, --mon-out-timeout=10m0s, --operator-image=, --service-account=, --skip_headers=false, --skip_log_headers=false, --stderrthreshold=2, --v=0, --vmodule=
2020-12-07 08:00:19.722562 I | ceph-csi: CSIDriver CRD already had been registered for "rook-ceph.cephfs.csi.ceph.com"
2020-12-07 08:00:31.451312 I | cephclient: getting or creating ceph auth key "client.csi-cephfs-provisioner"
2020-12-07 08:00:31.757363 I | cephclient: getting or creating ceph auth key "client.csi-cephfs-node"
2020-12-07 08:01:12.060967 I | cephclient: getting or creating ceph auth key "client.csi-cephfs-provisioner"
2020-12-07 08:01:12.351015 I | cephclient: getting or creating ceph auth key "client.csi-cephfs-node"

csi-cephfsplugin-provisioner -c csi-provisioner logs

I1207 03:42:33.256896       1 csi-provisioner.go:107] Version: v1.6.0-0-g321fa5c1c
I1207 03:42:33.256938       1 csi-provisioner.go:121] Building kube configs for running in cluster...
I1207 03:42:33.265870       1 connection.go:153] Connecting to unix:///csi/csi-provisioner.sock
I1207 03:42:34.266614       1 common.go:111] Probing CSI driver for readiness
W1207 03:42:34.268207       1 metrics.go:142] metrics endpoint will not be started because `metrics-address` was not specified.
I1207 03:42:34.272734       1 leaderelection.go:242] attempting to acquire leader lease  rook-ceph/rook-ceph-cephfs-csi-ceph-com...
I1207 03:42:53.928225       1 leaderelection.go:252] successfully acquired lease rook-ceph/rook-ceph-cephfs-csi-ceph-com
I1207 03:42:54.028649       1 controller.go:799] Starting provisioner controller rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-598854d87f-jskc5_c3095dae-28b8-4585-9c42-cc756cfe832a!
I1207 03:42:54.028701       1 clone_controller.go:58] Starting CloningProtection controller
I1207 03:42:54.028728       1 volume_store.go:97] Starting save volume queue
I1207 03:42:54.028789       1 clone_controller.go:74] Started CloningProtection controller
I1207 03:42:54.129369       1 controller.go:848] Started provisioner controller rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-598854d87f-jskc5_c3095dae-28b8-4585-9c42-cc756cfe832a!
E1207 05:42:54.037049       1 shared_informer.go:662] unrecognized notification: <nil>
I1207 06:13:44.904433       1 controller.go:1284] provision "dps-sample-microservice-onprem/cephfs-pvc" class "rook-cephfs": started
I1207 06:13:44.908243       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"dps-sample-microservice-onprem", Name:"cephfs-pvc", UID:"4421d7c6-4cd5-4791-aabf-2f7c2ce4b80a", APIVersion:"v1", ResourceVersion:"241767410", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "dps-sample-microservice-onprem/cephfs-pvc"
W1207 06:16:14.912310       1 controller.go:916] Retrying syncing claim "4421d7c6-4cd5-4791-aabf-2f7c2ce4b80a", failure 0
E1207 06:16:14.912346       1 controller.go:939] error syncing claim "4421d7c6-4cd5-4791-aabf-2f7c2ce4b80a": failed to provision volume with StorageClass "rook-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I1207 06:16:14.912376       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"dps-sample-microservice-onprem", Name:"cephfs-pvc", UID:"4421d7c6-4cd5-4791-aabf-2f7c2ce4b80a", APIVersion:"v1", ResourceVersion:"241767410", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "rook-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I1207 06:16:14.912438       1 controller.go:1284] provision "dps-sample-microservice-onprem/cephfs-pvc" class "rook-cephfs": started
I1207 06:16:14.916133       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"dps-sample-microservice-onprem", Name:"cephfs-pvc", UID:"4421d7c6-4cd5-4791-aabf-2f7c2ce4b80a", APIVersion:"v1", ResourceVersion:"241767410", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "dps-sample-microservice-onprem/cephfs-pvc"
W1207 06:16:14.922680       1 controller.go:916] Retrying syncing claim "4421d7c6-4cd5-4791-aabf-2f7c2ce4b80a", failure 1
E1207 06:16:14.922699       1 controller.go:939] error syncing claim "4421d7c6-4cd5-4791-aabf-2f7c2ce4b80a": failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-5fd83ff9-835f-434c-9ccb-97d14298bfad already exists

csi-cephfsplugin-provisioner -c csi-cephfsplugin logs

$ kubectl -n rook-ceph logs -f csi-cephfsplugin-provisioner-598854d87f-jskc5 -c csi-cephfsplugin
W1207 03:42:33.556178       1 driver.go:157] EnableGRPCMetrics is deprecated
E1207 06:13:45.485756       1 omap.go:66] ID: 168 Req-ID: pvc-5fd83ff9-835f-434c-9ccb-97d14298bfad omap not found (pool="myfs-metadata", namespace="csi", name="csi.volumes.default"): rados: ret=2, No such file or directory
E1207 06:16:14.922539       1 controllerserver.go:149] ID: 171 Req-ID: pvc-5fd83ff9-835f-434c-9ccb-97d14298bfad an operation with the given Volume ID pvc-5fd83ff9-835f-434c-9ccb-97d14298bfad already exists
E1207 06:16:14.922571       1 utils.go:163] ID: 171 Req-ID: pvc-5fd83ff9-835f-434c-9ccb-97d14298bfad GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-5fd83ff9-835f-434c-9ccb-97d14298bfad already exists

Ceph Status

[root@rook-ceph-tools-b5765bff4-ljw7f /]# ceph status
  cluster:
    id:     c40d82d5-3193-457d-a628-a3db67839a37
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,d (age 4h)
    mgr: a(active, since 2h)
    mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 2w), 3 in (since 2w)

  task status:
    scrub status:
        mds.myfs-a: idle
        mds.myfs-b: idle

  data:
    pools:   4 pools, 97 pgs
    objects: 95.73k objects, 360 GiB
    usage:   1.0 TiB used, 37 TiB / 38 TiB avail
    pgs:     97 active+clean

  io:
    client:   853 B/s rd, 542 KiB/s wr, 1 op/s rd, 8 op/s wr

[root@rook-ceph-tools-b5765bff4-ljw7f /]#

PVC stuck in the pending state forever

NAME         STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cephfs-pvc   Pending                                      rook-cephfs    43m

Rook version: v1.4.3 Ceph version: v15.2.4-20200630

yleiymei commented 3 years ago

How can this be solved? I'm hitting the same issue... Here is the pod Events information:

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   SuccessfulAttachVolume  14m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-4e9b93e1-9b62-4eea-88cd-eb831d6a4b40"
  Warning  FailedMount             14m                  kubelet, 172.31.230.201  MountVolume.MountDevice failed for volume "pvc-4e9b93e1-9b62-4eea-88cd-eb831d6a4b40" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             114s (x13 over 14m)  kubelet, 172.31.230.201  MountVolume.MountDevice failed for volume "pvc-4e9b93e1-9b62-4eea-88cd-eb831d6a4b40" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-c7ab2c03-4ffd-11eb-9378-6a7c54c18a1a already exists
nirav-chotai commented 3 years ago

Hitting the same issue

Events:
  Type     Reason                Age                  From                                                                                                              Message
  ----     ------                ----                 ----                                                                                                              -------
  Warning  ProvisioningFailed    35s                  rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-598854d87f-jskc5_c3095dae-28b8-4585-9c42-cc756cfe832a  failed to provision volume with StorageClass "rook-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   ExternalProvisioning  10s (x14 over 3m5s)  persistentvolume-controller                                                                                       waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          4s (x7 over 3m5s)    rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-598854d87f-jskc5_c3095dae-28b8-4585-9c42-cc756cfe832a  External provisioner is provisioning volume for claim "dps-sample-microservice-onprem/cephfs-pvc"
  Warning  ProvisioningFailed    4s (x6 over 35s)     rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-598854d87f-jskc5_c3095dae-28b8-4585-9c42-cc756cfe832a  failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-5fd83ff9-835f-434c-9ccb-97d14298bfad already exists


My issue was resolved after fixing RBAC: the operator was not able to perform some operations in a different namespace due to missing permissions. I am using a single operator to manage three Ceph clusters in different namespaces.
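In case it helps others debug a similar permissions problem, here is a rough sketch of checking what the operator's service account may do in another namespace. The service account and namespace names below are the common Rook defaults, and the target namespace is a placeholder; adjust them for your deployment.

# Check a specific verb/resource for the operator service account in another namespace
kubectl auth can-i update cephclusters.ceph.rook.io \
  --as=system:serviceaccount:rook-ceph:rook-ceph-system \
  -n my-other-ceph-namespace

# Or list everything it is allowed to do there
kubectl auth can-i --list \
  --as=system:serviceaccount:rook-ceph:rook-ceph-system \
  -n my-other-ceph-namespace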

SerialVelocity commented 3 years ago

I think there are many things that can cause this error and we need documentation about how to debug and figure out what is causing the issue.

mleklund commented 3 years ago

I worked with someone on Slack, and we suspect the issue is kernel-related. I have not gotten to the point where I can test a kernel update due to software freezes.

SxunS commented 3 years ago

I think I resolved the problem. Just stop and disable firewalld:

systemctl stop firewalld
systemctl disable firewalld

Now everything works.

Hope this could be helpful for someone.
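If disabling the firewall entirely is not an option, an alternative sketch (not from this thread) is to open the Ceph ports in firewalld instead. Ceph mons listen on 3300 and 6789, and the other Ceph daemons use the 6800-7300 range.

firewall-cmd --permanent --add-port=3300/tcp
firewall-cmd --permanent --add-port=6789/tcp
firewall-cmd --permanent --add-port=6800-7300/tcp
firewall-cmd --reload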

constanine commented 3 years ago

I think I resolved the problem. Just stop and disable firewalld:

systemctl stop firewalld
systemctl disable firewalld

Now everything works.

Hope this could be helpful for someone.

Same team here; my partner means stopping and disabling the firewall on both sides, not only the Ceph server but also the Ceph client. Disabling the firewall on the Ceph server lets the RBD volume bind OK, but mounting the PVC will still fail with this issue if firewalld is running on the Ceph client.

benjaminwinokur commented 3 years ago

I experienced this issue when running on GKE Ubuntu node pools on a VPC. The system worked, but after we upgraded our node pool to a new API version we were unable to attach to our PVCs. Upgrading the rook/ceph image to v1.5.8 allowed us to reattach our pods to the PVCs. I don't know if this will fix the issue long-term or if it will break again. It's possible it would have worked if I had just patched all of the deployments to trigger redeployments of all of the runtime assets.

alexcpn commented 3 years ago

For me it was a k8s bug

CSI Provisioning logs

I0304 11:24:34.464623       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"moss-elastic", Name:"elasticsearch-data-moss-eck-es-master-0", UID:"2eee53a0-960f-4737-9c0f-f1737788f102", APIVersion:"v1", ResourceVersion:"16949247", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "elastic-block": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-2eee53a0-960f-4737-9c0f-f1737788f102 already exists
E0304 11:24:34.464657       1 controller.go:939] error syncing claim "2eee53a0-960f-4737-9c0f-f1737788f102": failed to provision volume with StorageClass "elastic-block": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-2eee53a0-960f-4737-9c0f-f1737788f102 already exists
I0304 11:24:43.867478       1 controller.go:1284] provision "moss-elastic/elasticsearch-data-moss-eck-es-data-node-0" class "elastic-block": started
I0304 11:24:44.258816       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"moss-elastic", Name:"elasticsearch-data-moss-eck-es-data-node-0", UID:"e8dbbaea-8954-440c-b642-8478e1da35cb", APIVersion:"v1", ResourceVersion:"17869656", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "moss-elastic/elasticsearch-data-moss-eck-es-data-node-0"
W0304 11:24:44.669429       1 controller.go:916] Retrying syncing claim "e8dbbaea-8954-440c-b642-8478e1da35cb", failure 6
E0304 11:24:44.669493       1 controller.go:939] error syncing claim "e8dbbaea-8954-440c-b642-8478e1da35cb": failed to provision volume with StorageClass "elastic-block": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-e8dbbaea-8954-440c-b642-8478e1da35cb already exists
I0304 11:24:44.669529       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"moss-elastic", Name:"elasticsearch-data-moss-eck-es-data-node-0", UID:"e8dbbaea-8954-440c-b642-8478e1da35cb", APIVersion:"v1", ResourceVersion:"17869656", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "elastic-block": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-e8dbbaea-8954-440c-b642-8478e1da35cb already exists

kubelet logs

 E0304 11:33:39.894289    7786 kubelet_volumes.go:154] orphaned pod "4d4ff9c6-a01c-44bc-a05a-e5ac10356ebd" found, but volume paths are still present on disk : There were a total of 4 errors similar to this. Turn up verbosity to see them.
 E0304 11:33:41.897568    7786 kubelet_volumes.go:154] orphaned pod "4d4ff9c6-a01c-44bc-a05a-e5ac10356ebd" found, but volume paths are still present on disk : There were a total of 4 errors similar to this. Turn up verbosity to see them.

Others

Before:
root@test-alex-job21-sleep-khh5c:/# ls /var/lib/kubelet/pods/10e84bbb-1595-42db-8ee0-1fff289ff064/volumes/kubernetes.io~csi/pvc-460b8fd4-5e20-45ee-8247-c87b2e642f95/
mount

After:
root@test-alex-job21-sleep-khh5c:/# ls /var/lib/kubelet/pods/10e84bbb-1595-42db-8ee0-1fff289ff064
ls: cannot access '/var/lib/kubelet/pods/10e84bbb-1595-42db-8ee0-1fff289ff064': No such file or directory

Bug solution - https://github.com/kubernetes/kubernetes/issues/60987#issuecomment-673915505
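For reference, a cautious sketch of the manual cleanup described in that issue, using the orphaned pod UID from the kubelet log above. This is only an illustration; confirm the pod really no longer exists anywhere before deleting anything on the node.

# On a machine with kubectl access: confirm no pod with this UID still exists
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}' | grep 4d4ff9c6-a01c-44bc-a05a-e5ac10356ebd || echo "UID not found - pod is gone"

# On the affected node: inspect, then remove the stale pod directory left under kubelet
ls /var/lib/kubelet/pods/4d4ff9c6-a01c-44bc-a05a-e5ac10356ebd/volumes/
sudo rm -rf /var/lib/kubelet/pods/4d4ff9c6-a01c-44bc-a05a-e5ac10356ebd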

However, in our system we could not do this, so we restarted all the k8s nodes, and then all Pending PVCs became Bound.

Events:
  Type     Reason                 Age                   From                                                                                                       Message
  ----     ------                 ----                  ----                                                                                                       -------
  Warning  ProvisioningFailed     35m                   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-569c75558-6jzh5_2bc1b406-1a0b-463d-bec7-1396003dd3e0  failed to provision volume with StorageClass "elastic-block": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   ExternalProvisioning   18m (x82 over 38m)    persistentvolume-controller                                                                                waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning           17m (x13 over 38m)    rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-569c75558-6jzh5_2bc1b406-1a0b-463d-bec7-1396003dd3e0  External provisioner is provisioning volume for claim "dm-elastic/elasticsearch-data-dm-eck-es-data-node-0"
  Warning  ProvisioningFailed     17m (x12 over 35m)    rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-569c75558-6jzh5_2bc1b406-1a0b-463d-bec7-1396003dd3e0  failed to provision volume with StorageClass "elastic-block": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-dd2837d4-57a7-4b35-9418-0aa0ce65150c already exists
  Warning  ProvisioningFailed     15m                   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-569c75558-6jzh5_4cb3cb98-3773-44a8-8d4f-2346242a0d41  failed to provision volume with StorageClass "elastic-block": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-dd2837d4-57a7-4b35-9418-0aa0ce65150c already exists
  Normal   Provisioning           15m (x2 over 15m)     rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-569c75558-6jzh5_4cb3cb98-3773-44a8-8d4f-2346242a0d41  External provisioner is provisioning volume for claim "dm-elastic/elasticsearch-data-dm-eck-es-data-node-0"
  Normal   Provisioning           5m9s                  rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-569c75558-6jzh5_00ac711b-04b6-49e9-98e5-b4fb4ec37435  External provisioner is provisioning volume for claim "dm-elastic/elasticsearch-data-dm-eck-es-data-node-0"
  Normal   ExternalProvisioning   4m5s (x3 over 4m35s)  persistentvolume-controller                                                                                waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Normal   ProvisioningSucceeded  4m5s                  rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-569c75558-6jzh5_00ac711b-04b6-49e9-98e5-b4fb4ec37435  Successfully provisioned volume pvc-dd2837d4-57a7-4b35-9418-0aa0ce65150c
ctrahey commented 3 years ago

I am experiencing this today, even after upgrading to 1.20.5 on all nodes in an attempt to fix it (they added some orphaned pod cleanup in 1.20.5).

See my comment on this Ceph issue for more details; I have ruled out (I think) networking issues.

matthew-mcdermott commented 3 years ago

@ctrahey Did you ever get this worked out? For anyone who comes here after upgrading to 1.20.5 and still can't get it working, here is what it was for me. After I did the upgrade, the orphaned pods were gone. I upgraded Rook to v1.6.0 and Ceph to v16.2.1. I also upgraded my CNI, which is Calico. The problem changed from "an operation with the given Volume ID already exists" to just timing out while trying to attach existing volumes. The csi-attacher container in the RBD provisioner and the CephFS provisioner both had errors saying they did not have permission to patch volumeattachments/status. Once I updated the permissions, everything came back up.
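For others hitting the same symptom, a hedged sketch of the kind of RBAC rule the csi-attacher needs; the ClusterRole name below is illustrative, so match it to whatever role your csi-attacher service account is actually bound to.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: csi-attacher-runner   # illustrative name
rules:
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments/status"]
    verbs: ["patch"]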

gemfield commented 3 years ago

Strange... I resolved this issue by restarting the MDS deployment:

kubectl -n rook-ceph rollout restart deployment rook-ceph-mds-myfs-a
...
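If there is more than one MDS deployment, a small sketch that restarts all of them by label (assuming Rook's default app=rook-ceph-mds label and the rook-ceph namespace):

kubectl -n rook-ceph rollout restart deployment -l app=rook-ceph-mds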
lirugo commented 2 years ago

and only the 10.244.X.X network (the pod network) works; maybe hostNetwork: true in the cluster config is not taking effect... (I had already applied it)

Hi @jeonggyuchoi, I have the identical issue. Could you please share how you solved it? Thanks in advance.

zimivanende commented 2 years ago

Any update on this issue?

oliv3340 commented 2 years ago

I had the same issue and I've fixed it by disabling the csi-plugin host network config in the operator's configmap (enabled by default). Then, recreate your pod/pvc from scratch and it should be fine.

CSI_ENABLE_HOST_NETWORK: "false"
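For anyone wanting to try this, a hedged sketch of applying the setting through the operator ConfigMap, assuming the default rook-ceph-operator-config ConfigMap in the rook-ceph namespace; the operator should reconcile the CSI pods after the change.

kubectl -n rook-ceph patch configmap rook-ceph-operator-config \
  --type merge \
  -p '{"data":{"CSI_ENABLE_HOST_NETWORK":"false"}}'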

Madhu-1 commented 2 years ago

I had the same issue and I've fixed it by disabling the csi-plugin host network config in the operator's configmap (enabled by default). Then, recreate your pod/pvc from scratch and it should be fine.

CSI_ENABLE_HOST_NETWORK: "false"

Running CSI daemonset pods on pod networking is not suggested, as it has other issues.

stephan2012 commented 2 years ago

Running CSI daemonset pods on pod networking is not suggested, as it has other issues.

Unfortunately, I faced the same issue on a brand-new cluster (Kubernetes v1.19.15, Rook v1.7.4, Ceph v16.2.6, the Ceph CSI versions bundled with Rook, Ubuntu 18.04.5, kernel 5.4.0-73-generic, Calico v3.20). Volumes weren’t provisioned until I disabled host networking, but I did not find any obvious connectivity issues.

aviorma commented 2 years ago

+1. Unfortunately, I faced the same issue on a brand-new cluster (Kubernetes v1.21.5, Rook v1.7.5, Ceph v16.2.5, bundled Ceph CSI). Disabling the host network on the CephCluster solves the issue. I tried default kubespray and RKE 1.3.1 Kubernetes installations.

network:
  hostNetwork: false
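For context, a minimal sketch of where that setting sits in the CephCluster spec; the resource name and namespace are the common defaults, and newer Rook releases express this through network.provider instead of the older hostNetwork field.

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  network:
    hostNetwork: false   # older field; newer releases use "provider" instead
  # ... the rest of the cluster spec is unchanged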
github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

yleiymei commented 2 years ago

This is an automatic vacation reply from QQ Mail. Your email has been received, and I will reply to you as soon as possible!

BlaineEXE commented 2 years ago

It seems that users are having to disable host networking to get CSI working. But as @Madhu-1 pointed out, "Running CSI daemonset pods on pod networking is not suggested, as it has other issues." @Madhu-1, would you mind linking to or clarifying the issue with using pod networking for users?

What are the risks of not using host networking, so users might better make their own decision about the trade-offs when using CNI overlay/pod networking?


For users: I strongly suspect this is a networking issue in most cases, and we would like to collect more information so that we can include helpful guidance in a Rook "common issues" document.

Firstly, it seems that firewalls may play a part for some users: https://github.com/rook/rook/issues/4896#issuecomment-756152009

I don't believe port conflicts are likely to cause this behavior, but I would encourage users to look into the possibility if it isn't a firewall issue. Ceph mons use ports 6789 and 3300.

@NicolaiSchmid reported here (https://github.com/rook/rook/issues/4896#issuecomment-600649666) that their breaking node was physically separate from their working nodes. For this case, I suspect the node may be unable to reach the Ceph mon Services running in Kubernetes (perhaps the firewall is configured differently on that node?). So, for all users experiencing the issue: on every node, check whether the node is able to access Kubernetes Service IPs. Does this fail on non-working nodes but succeed on working nodes?
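A quick, hedged way to test this from an affected node; the app=rook-ceph-mon label and rook-ceph namespace are Rook defaults, and <mon-cluster-ip> is a placeholder for the ClusterIP you find.

# Find a mon Service ClusterIP (run anywhere kubectl works)
kubectl -n rook-ceph get svc -l app=rook-ceph-mon

# From the affected node, check TCP reachability to the mon ports (msgr2: 3300, msgr1: 6789)
nc -zvw3 <mon-cluster-ip> 3300
nc -zvw3 <mon-cluster-ip> 6789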

Any information you are able to give around this issue will be helpful for updating the documentation. And thanks to everyone who has suggested fixes and shared details about the issue so far.

efenex commented 2 years ago

I appear to have this issue on a newly deployed cluster after wiping it and configuring it with hostNetwork: true. The Kubernetes cluster was created with kubespray. All 3 worker nodes appear to be able to reach Kubernetes Services, and volumes fail on each of them (a daemonset with a volume attached fails to deploy, and each volume remains Pending).

Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:34:54Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/arm64"}

Using the Rook Helm chart v1.8.3 with the rook/ceph image v1.8.3-31.g8fc67f7db

Let me know which debug logs or actions I can provide to help find out more about this issue.
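Not an official checklist, but a sketch of the logs usually requested earlier in this thread, assuming the default rook-ceph namespace and Rook's standard labels (use the RBD equivalents if you are on RBD rather than CephFS):

# CSI provisioner-side logs
kubectl -n rook-ceph logs deploy/csi-cephfsplugin-provisioner -c csi-cephfsplugin
kubectl -n rook-ceph logs deploy/csi-cephfsplugin-provisioner -c csi-provisioner

# Node-side plugin logs from the daemonset pod on the affected node
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin -o wide
kubectl -n rook-ceph logs <csi-cephfsplugin-pod-on-that-node> -c csi-cephfsplugin

# Operator logs
kubectl -n rook-ceph logs deploy/rook-ceph-operator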