rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

Bug: CSI subvolumegroup is not created by rook automatically #6183

Closed: psavva closed this issue 2 years ago

psavva commented 3 years ago

Deviation from expected behavior:

The CSI driver is not creating the subvolumegroup called csi.

Expected behavior: the CSI driver should create the subvolumegroup called csi.

How to reproduce it (minimal and precise):

Create a CephFilesystem with the following: https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/filesystem.yaml

Afterwards, create a storageclass "rook-cephfs" which makes use of the filesystem called "myfs" created above. https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/csi/cephfs/storageclass.yaml

Lastly, create the PVC and test deployment https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/csi/cephfs/kube-registry.yaml
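
For reference, a minimal sketch of these steps, assuming a local checkout of the rook repository at the matching branch (paths as in the links above):

git clone --single-branch --branch master https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f filesystem.yaml
kubectl create -f csi/cephfs/storageclass.yaml
kubectl create -f csi/cephfs/kube-registry.yaml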

File(s) to submit:

All files as per links above.

Environment:

humblec commented 3 years ago

By default, a subvolumegroup called csi is created by Ceph CSI: https://github.com/ceph/ceph-csi/blob/44da7ffb4e4d8861265f3ecd421cd06dfce9f34a/internal/cephfs/volume.go#L90

We recently also added support for multiple subvolume groups: https://github.com/ceph/ceph-csi/pull/1175/files

psavva commented 3 years ago

I will retest on Monday morning with the log outputs

alokhom commented 3 years ago

I am facing the same issue with the CephCluster image ceph/ceph:v15.2.4. I am consuming Ceph Octopus externally into OpenShift 4.4.20. While the CephBlock PVC works fine, the CephFS PVC hits this problem (setting hostNetwork to true/false didn't solve it):

failed to provision volume with StorageClass "rook-cephfs": rpc error: code = InvalidArgument desc = an error occurred while running (2855) ceph [-m 10.101.100.177:6789,10.101.100.175:6789,10.101.100.176:6789 --id csi-cephfs-provisioner --keyfile=***stripped*** -c /etc/ceph/ceph.conf fs get myfs --format=json]: exit status 2: Error ENOENT: filesystem 'myfs' not found
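
One quick check is whether the filesystem named in the StorageClass (fsName) exists on the Ceph side at all; a sketch, assuming the standard rook-ceph-tools toolbox deployment (adjust the namespace to your setup):

kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph fs ls
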
timhughes commented 3 years ago

I think I just ran into the same issue with git clone --single-branch --branch v1.4.4 https://github.com/rook/rook.git

The PVC events were:

Warning  ProvisioningFailed  25m  rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7468b6bf56-8np4b_b6b48e17-c648-4784-aa62-bef29106e9b2  failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Internal desc = an error (exit status 2) occurred while running ceph args: [fs subvolume create myfs csi-vol-2c5341c9-fe81-11ea-a0e2-6a1d3f228513 1073741824 --group_name csi --mode 777 -m 172.24.158.166:6789,172.24.189.217:6789,172.24.142.155:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped*** --pool_layout myfs-data0]
Warning  ProvisioningFailed  24m  rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7468b6bf56-8np4b_b6b48e17-c648-4784-aa62-bef29106e9b2  failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Internal desc = an error (exit status 2) occurred while running ceph args: [fs subvolume create myfs csi-vol-539bcd63-fe81-11ea-a0e2-6a1d3f228513 1073741824 --group_name csi --mode 777 -m 172.24.158.166:6789,172.24.189.217:6789,172.24.142.155:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped*** --pool_layout myfs-data0]

So I tracked down the csi-cephfsplugin-provisioner pod and had a look at the logs of csi-cephfsplugin:

E0924 16:28:10.055734       1 volume.go:179] ID: 489 Req-ID: pvc-4e13b636-aeb2-4831-9a07-8bb3fab3245b failed to create subvolume csi-vol-eec93592-fe82-11ea-a0e2-6a1d3f228513(an error (exit status 2) occurred while running ceph args: [fs subvolume create myfs csi-vol-eec93592-fe82-11ea-a0e2-6a1d3f228513 1073741824 --group_name csi --mode 777 -m 172.24.158.166:6789,172.24.189.217:6789,172.24.142.155:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped*** --pool_layout myfs-data0]) in fs myfs
E0924 16:28:10.055782       1 controllerserver.go:90] ID: 489 Req-ID: pvc-4e13b636-aeb2-4831-9a07-8bb3fab3245b failed to create volume pvc-4e13b636-aeb2-4831-9a07-8bb3fab3245b: an error (exit status 2) occurred while running ceph args: [fs subvolume create myfs csi-vol-eec93592-fe82-11ea-a0e2-6a1d3f228513 1073741824 --group_name csi --mode 777 -m 172.24.158.166:6789,172.24.189.217:6789,172.24.142.155:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped*** --pool_layout myfs-data0]
E0924 16:28:10.078985       1 utils.go:163] ID: 489 Req-ID: pvc-4e13b636-aeb2-4831-9a07-8bb3fab3245b GRPC error: rpc error: code = Internal desc = an error (exit status 2) occurred while running ceph args: [fs subvolume create myfs csi-vol-eec93592-fe82-11ea-a0e2-6a1d3f228513 1073741824 --group_name csi --mode 777 -m 172.24.158.166:6789,172.24.189.217:6789,172.24.142.155:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped*** --pool_layout myfs-data0]

That didn't help me much, but it looked like it was stuck somehow, so I killed the csi-cephfsplugin-provisioner pod, and when it started again everything was fixed.
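
For reference, that restart can be done by deleting the provisioner pod and letting its Deployment recreate it; a sketch, assuming the default app=csi-cephfsplugin-provisioner label and the rook-ceph namespace:

kubectl -n rook-ceph delete pod -l app=csi-cephfsplugin-provisioner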

jasonbrooks commented 3 years ago

I just hit this with Rook v1.4.5. I was pulling my hair out trying to figure out what was wrong, and the workaround in https://github.com/rook/rook/issues/4006#issuecomment-675964120 got me unstuck.

stephan2012 commented 3 years ago

Me, too. Setting: Kubernetes 1.15, Rook v1.5.1, Ceph v15.2.6-20201119, Ceph CSI v3.1.2.

The Ceph CSI provisioner logs show

W1202 15:52:55.186209       1 driver.go:157] EnableGRPCMetrics is deprecated
E1202 15:53:22.843769       1 volume.go:109] ID: 4 Req-ID: 0001-0009-rook-ceph-0000000000000001-72d97eab-49ab-11ea-8a6b-42ade5464898 failed to get subvolume info csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 in fs cephfs with Error: an error (exit status 2) and stdError (Error ENOENT: subvolume group 'csi' does not exist
) occurred while running ceph args: [fs subvolume info cephfs csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 --group_name csi -m 10.36.28.166:6789,10.36.27.73:6789,10.36.28.218:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-node --keyfile=***stripped***]. stdError: Error ENOENT: subvolume group 'csi' does not exist
E1202 15:53:22.843856       1 utils.go:163] ID: 4 Req-ID: 0001-0009-rook-ceph-0000000000000001-72d97eab-49ab-11ea-8a6b-42ade5464898 GRPC error: rpc error: code = Internal desc = volume not found
E1202 15:55:26.525559       1 volume.go:109] ID: 8 Req-ID: 0001-0009-rook-ceph-0000000000000001-72d97eab-49ab-11ea-8a6b-42ade5464898 failed to get subvolume info csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 in fs cephfs with Error: an error (exit status 2) and stdError (Error ENOENT: subvolume group 'csi' does not exist
) occurred while running ceph args: [fs subvolume info cephfs csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 --group_name csi -m 10.36.28.166:6789,10.36.27.73:6789,10.36.28.218:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-node --keyfile=***stripped***]. stdError: Error ENOENT: subvolume group 'csi' does not exist
E1202 15:55:26.525658       1 utils.go:163] ID: 8 Req-ID: 0001-0009-rook-ceph-0000000000000001-72d97eab-49ab-11ea-8a6b-42ade5464898 GRPC error: rpc error: code = Internal desc = volume not found
E1202 15:57:30.183963       1 volume.go:109] ID: 12 Req-ID: 0001-0009-rook-ceph-0000000000000001-72d97eab-49ab-11ea-8a6b-42ade5464898 failed to get subvolume info csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 in fs cephfs with Error: an error (exit status 2) and stdError (Error ENOENT: subvolume group 'csi' does not exist
) occurred while running ceph args: [fs subvolume info cephfs csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 --group_name csi -m 10.36.28.166:6789,10.36.27.73:6789,10.36.28.218:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-node --keyfile=***stripped***]. stdError: Error ENOENT: subvolume group 'csi' does not exist
E1202 15:57:30.184021       1 utils.go:163] ID: 12 Req-ID: 0001-0009-rook-ceph-0000000000000001-72d97eab-49ab-11ea-8a6b-42ade5464898 GRPC error: rpc error: code = Internal desc = volume not found
E1202 15:59:33.816252       1 volume.go:109] ID: 16 Req-ID: 0001-0009-rook-ceph-0000000000000001-72d97eab-49ab-11ea-8a6b-42ade5464898 failed to get subvolume info csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 in fs cephfs with Error: an error (exit status 2) and stdError (Error ENOENT: subvolume group 'csi' does not exist
) occurred while running ceph args: [fs subvolume info cephfs csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 --group_name csi -m 10.36.28.166:6789,10.36.27.73:6789,10.36.28.218:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-node --keyfile=***stripped***]. stdError: Error ENOENT: subvolume group 'csi' does not exist
E1202 15:59:33.816321       1 utils.go:163] ID: 16 Req-ID: 0001-0009-rook-ceph-0000000000000001-72d97eab-49ab-11ea-8a6b-42ade5464898 GRPC error: rpc error: code = Internal desc = volume not found
E1202 16:01:37.485937       1 volume.go:109] ID: 20 Req-ID: 0001-0009-rook-ceph-0000000000000001-72d97eab-49ab-11ea-8a6b-42ade5464898 failed to get subvolume info csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 in fs cephfs with Error: an error (exit status 2) and stdError (Error ENOENT: subvolume group 'csi' does not exist
) occurred while running ceph args: [fs subvolume info cephfs csi-vol-72d97eab-49ab-11ea-8a6b-42ade5464898 --group_name csi -m 10.36.28.166:6789,10.36.27.73:6789,10.36.28.218:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-node --keyfile=***stripped***]. stdError: Error ENOENT: subvolume group 'csi' does not exist
E1202 16:01:37.486001       1 utils.go:163] 

and I am wondering whether Rook is failing to set something up, possibly related to these errors:

2020-12-02 16:20:02.545859 I | ceph-spec: ceph-block-pool-controller: CephCluster "rook-ceph" found but skipping reconcile since ceph health is &{"HEALTH_ERR" map["error":{"Urgent" "failed to get status. . Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',): exit status 1"}] "2020-12-02T16:19:00Z" "2020-12-02T16:19:00Z" "HEALTH_OK" {%!q(uint64=762339917824) %!q(uint64=123686912000) %!q(uint64=638653005824) "2020-12-02T16:18:21Z"}}
2020-12-02 16:20:03.235925 I | ceph-spec: ceph-file-controller: CephCluster "rook-ceph" found but skipping reconcile since ceph health is &{"HEALTH_ERR" map["error":{"Urgent" "failed to get status. . Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',): exit status 1"}] "2020-12-02T16:19:00Z" "2020-12-02T16:19:00Z" "HEALTH_OK" {%!q(uint64=762339917824) %!q(uint64=123686912000) %!q(uint64=638653005824) "2020-12-02T16:18:21Z"}}

Even more interesting, the first of the two desired application Pods got the CephFS volume attached and the application started, while the other Pod is stuck in ContainerCreating:

0s          Warning   FailedMount               pod/my-application-5f4d4d8f6b-slhnn                       Unable to mount volumes for pod "my-application-5f4d4d8f6b-slhnn_default(0ce97f45-bdee-483f-a458-49fb240ced2a)": timeout expired waiting for volumes to attach or mount for pod "default"/"my-application-5f4d4d8f6b-slhnn". list of unmounted volumes=[vol1 default-token-m5pnx]. list of unattached volumes=[vol1 default-token-m5pnx]
0s          Warning   FailedMount               pod/my-application-5f4d4d8f6b-ztmz2                       MountVolume.MountDevice failed for volume "pvc-d5fc19a7-8312-4e3f-a870-204156ffa39c" : rpc error: code = Internal desc = volume not found

Other thought: Regression in Ceph CSI v3.1.2?

Of course, this issue occurs in a production cluster only, not in any dev or lab cluster …

Please let me know if you think that this is a different issue.

psavva commented 3 years ago

If you follow this: https://github.com/rook/rook/issues/4006#issuecomment-593879132

Does the issue go away?

stephan2012 commented 3 years ago

@psavva: No, unfortunately it did not help; I was still facing the problem after deleting the PVC, PV, and StorageClass.

Due to limited time on the production system and the small amount of data, I simply backed up the data (since a single Pod was working, I was able to access the data on the Pod's host), deleted the rook-cephfs StorageClass, deleted the Ceph filesystem, and re-installed everything.

Now, everything works again.

By the way, while reviewing my records, I found this in the logs after kubectl delete pvc:

I1204 09:14:06.372275       1 controller.go:1453] delete "pvc-d5fc19a7-8312-4e3f-a870-204156ffa39c": started
I1204 09:14:07.908046       1 controller.go:1468] delete "pvc-d5fc19a7-8312-4e3f-a870-204156ffa39c": volume deleted
I1204 09:14:07.916878       1 controller.go:1518] delete "pvc-d5fc19a7-8312-4e3f-a870-204156ffa39c": persistentvolume deleted
E1204 09:14:07.916906       1 controller.go:1521] couldn't create key for object pvc-d5fc19a7-8312-4e3f-a870-204156ffa39c: object has no meta: object does not implement the Object interfaces
I1204 09:14:07.916932       1 controller.go:1523] delete "pvc-d5fc19a7-8312-4e3f-a870-204156ffa39c": succeeded
Madhu-1 commented 3 years ago

E1204 09:14:07.916906 1 controller.go:1521] couldn't create key for object pvc-d5fc19a7-8312-4e3f-a870-204156ffa39c: object has no meta: object does not implement the Object interfaces

This error should not cause any issue; it's a warning. From the logs, it looks like the PV and the backend subvolume were deleted.

Madhu-1 commented 3 years ago

subvolume group 'csi' does not exist

@stephan2012 cephcsi will not delete the subvolumegroup once it's created. Not sure how the subvolume group got deleted. Can you check ceph fs subvolumegroup ls cephfs in the toolbox pod?

When the cephcsi pod starts, it creates the subvolumegroup the first time it is needed and keeps an in-memory flag so it does not try to create it again. If you hit a "subvolumegroup not found" error during PVC creation, restarting the provisioner pod helps (the restarted provisioner will create the subvolumegroup again).
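
For example, the check could look like this from the toolbox (a sketch, assuming the standard rook-ceph-tools deployment and a filesystem named cephfs); it should list an entry named csi:

kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph fs subvolumegroup ls cephfs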

stephan2012 commented 3 years ago

@Madhu-1 I will check, but can only do so next week.

geerlingguy commented 3 years ago

I'm hitting the same issue (I think... still digging into it) when following the example in the documentation—when I try getting a PVC, the provisioner just gets stuck in this loop:

W1215 05:22:04.418286       1 controller.go:943] Retrying syncing claim "283c0a58-1e41-4d23-b18a-58923f1c7566", failure 6
E1215 05:22:04.418323       1 controller.go:966] error syncing claim "283c0a58-1e41-4d23-b18a-58923f1c7566": failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-283c0a58-1e41-4d23-b18a-58923f1c7566 already exists
I1215 05:22:04.422917       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"drupal", Name:"drupal-files-pvc", UID:"283c0a58-1e41-4d23-b18a-58923f1c7566", APIVersion:"v1", ResourceVersion:"4570", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-283c0a58-1e41-4d23-b18a-58923f1c7566 already exists

I'll do some more debugging tomorrow and try to figure out what's going on. Using 1.5 branch latest, currently.

stephan2012 commented 3 years ago

@Madhu-1 It turned out that I cannot access the affected system anymore this year. So I cannot check ceph fs subvolumegroup ls cephfs for the moment.

Trackhe commented 3 years ago

I have the same problem. Calico 3.14 on arm64; I use kube-proxy in IPVS mode and dual-stack.

Muyan0828 commented 3 years ago

I think the root cause is that your filesystem was recreated. cephfs-csi stores a bool variable in memory for each cluster to mark whether a subvolumegroup has been created, and it identifies a cluster via the clusterID field in the StorageClass, which Rook sets to the namespace of the CephCluster. So when the CephFilesystem is rebuilt in the same namespace and the StorageClass is recreated, the variable for that cluster is already true in cephfs-csi, and there is no attempt to recreate the subvolumegroup.
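
As a quick way to see which cluster the driver keys that in-memory flag on, the clusterID can be read from the StorageClass (a sketch, assuming the example StorageClass name rook-cephfs); with the Rook examples it is the CephCluster namespace, rook-ceph by default:

kubectl get storageclass rook-cephfs -o jsonpath='{.parameters.clusterID}'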

psavva commented 3 years ago

Are you able to reproduce this consistently with logs? If so, please attach them here so the Rook team can have a look and produce a fix accordingly.

Madhu-1 commented 3 years ago

In production, we normally don't expect the admin to delete and recreate the filesystem with the same name. Keeping PVC creation performance in mind, the subvolumegroup is created only once in the cephcsi driver rather than being checked for every subvolume. If you delete and recreate the filesystem, you need to restart the CSI driver.
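
For example, after recreating the filesystem, the provisioner can be restarted with something like this (a sketch, assuming the default deployment name created by Rook):

kubectl -n rook-ceph rollout restart deployment csi-cephfsplugin-provisioner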

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

psavva commented 2 years ago

@Madhu-1 Could I please request that this case and its workaround be documented? It would keep this issue from resurfacing.

TaylorPzreal commented 2 years ago

I have the same error with ceph/ceph:v15.2.13 and Calico.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

prazumovsky commented 2 years ago

Please re-open this bug. We're facing the same issue when recreating CephFS. Recreating CephFS can be useful during initial Ceph cluster deployment and configuration when something goes wrong, and this issue makes that more of a headache.

In production, we normally don't expect the admin to delete and recreate the filesystem with the same name. Keeping PVC creation performance in mind, the subvolumegroup is created only once in the cephcsi driver rather than being checked for every subvolume. If you delete and recreate the filesystem, you need to restart the CSI driver.

Why does RBD CSI work differently, then? If I remove the Ceph cluster from the cloud, the CSI RBD provisioner and plugin are removed as well. Then, when I create a new Ceph cluster, there is no such error.

travisn commented 2 years ago

@prazumovsky To summarize, the issue is that the csi driver needs to be restarted if the filesystem is re-created, right? Is the request to document this?

Madhu-1 commented 2 years ago

Please re-open this bug. We're facing the same issue when recreating CephFS. Recreating CephFS can be useful during initial Ceph cluster deployment and configuration when something goes wrong, and this issue makes that more of a headache.

IMO, if the admin recreates the filesystem with the same name, I suggest we just document this. We don't want to check whether the subvolumegroup exists on every PVC creation, which would impact PVC creation performance.

In production, we normally don't expect the admin to delete and recreate the filesystem with the same name. Keeping PVC creation performance in mind, the subvolumegroup is created only once in the cephcsi driver rather than being checked for every subvolume. If you delete and recreate the filesystem, you need to restart the CSI driver.

Why does RBD CSI work differently, then? If I remove the Ceph cluster from the cloud, the CSI RBD provisioner and plugin are removed as well. Then, when I create a new Ceph cluster, there is no such error.

For RBD, we just create the RBD images. For CephFS it's a different case: we create a subvolumegroup, update the local cache to record that it exists (so there is no need to retry its creation), and then create the subvolumes.

mayank-reynencourt commented 2 years ago

Hi,

I'm also facing this same issue when I import an external Ceph cluster (16.2.6) into RKE2 (1.22.4) with Rook v1.8.1 on Ubuntu 20.04; ufw is also disabled on all RKE2 nodes and the external Ceph nodes.

root@ip-10-0-0-111:/home/ubuntu/rook/deploy/examples/csi/cephfs# kubectl -n rook-ceph logs csi-cephfsplugin-provisioner-874864dcb-rcfvl -c csi-cephfsplugin

E1230 10:46:29.027085       1 utils.go:185] ID: 61 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:33.036776       1 controllerserver.go:172] ID: 62 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:33.036828       1 utils.go:185] ID: 62 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:41.042477       1 controllerserver.go:172] ID: 63 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:41.042531       1 utils.go:185] ID: 63 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:57.052598       1 controllerserver.go:172] ID: 64 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:57.052647       1 utils.go:185] ID: 64 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:47:27.798255       1 controllerserver.go:172] ID: 66 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:47:27.798302       1 utils.go:185] ID: 66 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:47:29.059140       1 controllerserver.go:172] ID: 67 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:47:29.059188       1 utils.go:185] ID: 67 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:49:37.064327       1 controllerserver.go:172] ID: 70 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:49:37.064373       1 utils.go:185] ID: 70 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:50:04.949580       1 volume.go:163] ID: 54 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a failed to create subvolume group csi, for the vol csi-vol-62fa7f7f-695d-11ec-a830-826f0b6db424: rados: ret=-110, Connection timed out: "error calling ceph_mount"
E1230 10:50:04.949702       1 controllerserver.go:100] ID: 54 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a failed to create volume pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a: rados: ret=-110, Connection timed out: "error calling ceph_mount"
E1230 10:50:04.956644       1 utils.go:185] ID: 54 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Internal desc = rados: ret=-110, Connection timed out: "error calling ceph_mount"
E1230 10:53:20.792641       1 controllerserver.go:172] ID: 74 Req-ID: pvc-1b28e07a-0b3a-4b84-8398-7f6d3e65070a an operation with the given Volume ID pvc-1b28e07a-0b3a-4b84-8398-7f6d3e65070a already exists
E1230 10:53:20.792690       1 utils.go:185] ID: 74 Req-ID: pvc-1b28e07a-0b3a-4b84-8398-7f6d3e65070a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-1b28e07a-0b3a-4b84-8398-7f6d3e65070a already exists

I can provide more details if needed.

mayank-reynencourt commented 2 years ago

Hi,

Related to the above issue, my observation was as follows:

I can successfully create a PVC if I create the CephFS filesystem (volume) directly on the external Ceph cluster using the command below:

ceph fs volume create <FS_NAME>

but

When I use the filesystem.yaml below to do the same thing via Rook, I get the same error I described in this link: https://github.com/rook/rook/issues/6183#issuecomment-1002985100

#################################################################################################################
# Create a filesystem with settings with replication enabled for a production environment.
# A minimum of 3 OSDs on different nodes are required in this example.
#  kubectl create -f filesystem.yaml
#################################################################################################################

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph # namespace:cluster
spec:
  # The metadata pool spec. Must use replication.
  metadataPool:
    replicated:
      size: 3
      requireSafeReplicaSize: true
    parameters:
      # Inline compression mode for the data pool
      # Further reference: https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression
      compression_mode:
        none
        # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
      # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
      #target_size_ratio: ".5"
  # The list of data pool specs. Can use replication or erasure coding.
  dataPools:
    - name: replicated
      failureDomain: host
      replicated:
        size: 3
        # Disallow setting pool with replica 1, this could lead to data loss without recovery.
        # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
        requireSafeReplicaSize: true
      parameters:
        # Inline compression mode for the data pool
        # Further reference: https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression
        compression_mode:
          none
          # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
        # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
        #target_size_ratio: ".5"
  # Whether to preserve filesystem after CephFilesystem CRD deletion
  preserveFilesystemOnDelete: true
  # The metadata service (mds) configuration
  metadataServer:
    # The number of active MDS instances
    activeCount: 1
    # Whether each active MDS instance will have an active standby with a warm metadata cache for faster failover.
    # If false, standbys will be available, but will not have a warm cache.
    activeStandby: true
    # The affinity rules to apply to the mds deployment
    placement:
      #  nodeAffinity:
      #    requiredDuringSchedulingIgnoredDuringExecution:
      #      nodeSelectorTerms:
      #      - matchExpressions:
      #        - key: role
      #          operator: In
      #          values:
      #          - mds-node
      #  topologySpreadConstraints:
      #  tolerations:
      #  - key: mds-node
      #    operator: Exists
      #  podAffinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - rook-ceph-mds
            # topologyKey: kubernetes.io/hostname will place MDS across different hosts
            topologyKey: kubernetes.io/hostname
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - rook-ceph-mds
              # topologyKey: */zone can be used to spread MDS across different AZ
              # Use <topologyKey: failure-domain.beta.kubernetes.io/zone> in k8s cluster if your cluster is v1.16 or lower
              # Use <topologyKey: topology.kubernetes.io/zone>  in k8s cluster is v1.17 or upper
              topologyKey: topology.kubernetes.io/zone
    # A key/value list of annotations
    annotations:
    #  key: value
    # A key/value list of labels
    labels:
    #  key: value
    resources:
    # The requests and limits set here, allow the filesystem MDS Pod(s) to use half of one CPU core and 1 gigabyte of memory
    #  limits:
    #    cpu: "500m"
    #    memory: "1024Mi"
    #  requests:
    #    cpu: "500m"
    #    memory: "1024Mi"
    # priorityClassName: my-priority-class
  # Filesystem mirroring settings
  # mirroring:
    # enabled: true
    # list of Kubernetes Secrets containing the peer token
    # for more details see: https://docs.ceph.com/en/latest/dev/cephfs-mirroring/#bootstrap-peers
    # peers:
      #secretNames:
        #- secondary-cluster-peer
    # specify the schedule(s) on which snapshots should be taken
    # see the official syntax here https://docs.ceph.com/en/latest/cephfs/snap-schedule/#add-and-remove-schedules
    # snapshotSchedules:
    #   - path: /
    #     interval: 24h # daily snapshots
    #     startTime: 11:55
    # manage retention policies
    # see syntax duration here https://docs.ceph.com/en/latest/cephfs/snap-schedule/#add-and-remove-retention-policies
    # snapshotRetention:
    #   - path: /
    #     duration: "h 24"

So to solve this, I tried to execute the command (ceph fs volume create) inside a toolbox job using toolbox-job.yaml,

and my PVC got bound properly.

Is this the right way to solve it?

Also, when I look at the logs of the toolbox job, it shows something like this:

# kubectl -n rook-ceph-external logs rook-ceph-toolbox-job--1-sgk5x
Volume created successfully (no MDS daemons created)

I also wanted to ask: do we need a separate MDS to be assigned to every filesystem we create in Ceph?

mayank-reynencourt commented 2 years ago

IMO, if the admin recreates the filesystem with the same name, I suggest we just document this. We don't want to check whether the subvolumegroup exists on every PVC creation, which would impact PVC creation performance.

Hi @Madhu-1, what will be the solution for this? Do we need to remember not to create a filesystem with the same name using the CR,

or can we expect a fix for this?

Madhu-1 commented 2 years ago

@mayank-reynencourt if you are creating the filesystem using the Rook CRD, you should expect Rook to create the filesystem. Please check the Rook operator logs and the filesystem CR -oyaml output for more details.

Yes, if the filesystem is not created using the Rook CRD, the admin is expected to create it manually before creating the PVC.

mayank-reynencourt commented 2 years ago

@Madhu-1, thanks for your reply. I'm still facing an issue where Rook can create the CephFilesystem on the external Ceph cluster using the CR, but the PVC stays in Pending state (Error: Connection timed out: "error calling ceph_mount"):

root@ip-10-0-0-111:/home/ubuntu/rook/deploy/examples/csi/cephfs# kubectl -n rook-ceph logs csi-cephfsplugin-provisioner-874864dcb-rcfvl -c csi-cephfsplugin

E1230 10:46:29.027085       1 utils.go:185] ID: 61 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:33.036776       1 controllerserver.go:172] ID: 62 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:33.036828       1 utils.go:185] ID: 62 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:41.042477       1 controllerserver.go:172] ID: 63 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:41.042531       1 utils.go:185] ID: 63 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:57.052598       1 controllerserver.go:172] ID: 64 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:46:57.052647       1 utils.go:185] ID: 64 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:47:27.798255       1 controllerserver.go:172] ID: 66 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:47:27.798302       1 utils.go:185] ID: 66 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:47:29.059140       1 controllerserver.go:172] ID: 67 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:47:29.059188       1 utils.go:185] ID: 67 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:49:37.064327       1 controllerserver.go:172] ID: 70 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:49:37.064373       1 utils.go:185] ID: 70 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a already exists
E1230 10:50:04.949580       1 volume.go:163] ID: 54 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a failed to create subvolume group csi, for the vol csi-vol-62fa7f7f-695d-11ec-a830-826f0b6db424: rados: ret=-110, Connection timed out: "error calling ceph_mount"
E1230 10:50:04.949702       1 controllerserver.go:100] ID: 54 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a failed to create volume pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a: rados: ret=-110, Connection timed out: "error calling ceph_mount"
E1230 10:50:04.956644       1 utils.go:185] ID: 54 Req-ID: pvc-ca70d814-03de-45c7-b843-a90e4cb13a2a GRPC error: rpc error: code = Internal desc = rados: ret=-110, Connection timed out: "error calling ceph_mount"
E1230 10:53:20.792641       1 controllerserver.go:172] ID: 74 Req-ID: pvc-1b28e07a-0b3a-4b84-8398-7f6d3e65070a an operation with the given Volume ID pvc-1b28e07a-0b3a-4b84-8398-7f6d3e65070a already exists
E1230 10:53:20.792690       1 utils.go:185] ID: 74 Req-ID: pvc-1b28e07a-0b3a-4b84-8398-7f6d3e65070a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-1b28e07a-0b3a-4b84-8398-7f6d3e65070a already exists

Any help would be appreciated.

Madhu-1 commented 2 years ago

Can you paste the CephFilesystem CR -oyaml output and also the ceph fs ls output from the toolbox pod?

mayank-reynencourt commented 2 years ago

Hi @Madhu-1,

Please find below the ceph fs ls output from the toolbox container, and here is the link for the filesystem.yaml I deployed:

[rook@rook-ceph-tools-67d7dcc778-4qcrf /]$ ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
name: mayank, metadata pool: cephfs.mayank.meta, data pools: [cephfs.mayank.data ]
name: myfs, metadata pool: myfs-metadata, data pools: [myfs-replicated ]
#  kubectl describe cephfilesystem  -n rook-ceph-external
Name:         myfs
Namespace:    rook-ceph-external
Labels:       <none>
Annotations:  <none>
API Version:  ceph.rook.io/v1
Kind:         CephFilesystem
Metadata:
  Creation Timestamp:  2022-01-04T12:55:57Z
  Finalizers:
    cephfilesystem.ceph.rook.io
  Generation:  2
  Managed Fields:
    API Version:  ceph.rook.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:metadataPool:
          .:
          f:parameters:
            .:
            f:compression_mode:
          f:replicated:
            .:
            f:requireSafeReplicaSize:
            f:size:
        f:metadataServer:
          .:
          f:activeCount:
          f:activeStandby:
          f:placement:
            .:
            f:podAntiAffinity:
              .:
              f:preferredDuringSchedulingIgnoredDuringExecution:
              f:requiredDuringSchedulingIgnoredDuringExecution:
        f:preserveFilesystemOnDelete:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-01-04T12:55:57Z
    API Version:  ceph.rook.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"cephfilesystem.ceph.rook.io":
      f:spec:
        f:dataPools:
        f:metadataPool:
          f:erasureCoded:
            .:
            f:codingChunks:
            f:dataChunks:
          f:mirroring:
          f:quotas:
          f:statusCheck:
            .:
            f:mirror:
        f:metadataServer:
          f:resources:
        f:statusCheck:
          .:
          f:mirror:
    Manager:      rook
    Operation:    Update
    Time:         2022-01-04T12:55:57Z
    API Version:  ceph.rook.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:phase:
    Manager:         rook
    Operation:       Update
    Subresource:     status
    Time:            2022-01-04T12:56:16Z
  Resource Version:  30454
  UID:               6f805b0d-d924-496a-9355-17ea9fc761be
Spec:
  Data Pools:
    Erasure Coded:
      Coding Chunks:  0
      Data Chunks:    0
    Failure Domain:   host
    Mirroring:
    Name:  replicated
    Parameters:
      compression_mode:  none
    Quotas:
    Replicated:
      Require Safe Replica Size:  true
      Size:                       3
    Status Check:
      Mirror:
  Metadata Pool:
    Erasure Coded:
      Coding Chunks:  0
      Data Chunks:    0
    Mirroring:
    Parameters:
      compression_mode:  none
    Quotas:
    Replicated:
      Require Safe Replica Size:  true
      Size:                       3
    Status Check:
      Mirror:
  Metadata Server:
    Active Count:    1
    Active Standby:  true
    Placement:
      Pod Anti Affinity:
        Preferred During Scheduling Ignored During Execution:
          Pod Affinity Term:
            Label Selector:
              Match Expressions:
                Key:       app
                Operator:  In
                Values:
                  rook-ceph-mds
            Topology Key:  topology.kubernetes.io/zone
          Weight:          100
        Required During Scheduling Ignored During Execution:
          Label Selector:
            Match Expressions:
              Key:       app
              Operator:  In
              Values:
                rook-ceph-mds
          Topology Key:  kubernetes.io/hostname
    Resources:
  Preserve Filesystem On Delete:  true
  Status Check:
    Mirror:
Status:
  Phase:  Ready
Events:   <none>
#   kubectl get cephfilesystem  myfs -n rook-ceph-external -o yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ceph.rook.io/v1","kind":"CephFilesystem","metadata":{"annotations":{},"name":"myfs","namespace":"rook-ceph-external"},"spec":{"dataPools":[{"failureDomain":"host","name":"replicated","parameters":{"compression_mode":"none"},"replicated":{"requireSafeReplicaSize":true,"size":3}}],"metadataPool":{"parameters":{"compression_mode":"none"},"replicated":{"requireSafeReplicaSize":true,"size":3}},"metadataServer":{"activeCount":1,"activeStandby":true,"annotations":null,"labels":null,"placement":{"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"app","operator":"In","values":["rook-ceph-mds"]}]},"topologyKey":"topology.kubernetes.io/zone"},"weight":100}],"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchExpressions":[{"key":"app","operator":"In","values":["rook-ceph-mds"]}]},"topologyKey":"kubernetes.io/hostname"}]}},"resources":null},"preserveFilesystemOnDelete":true}}
  creationTimestamp: "2022-01-04T12:55:57Z"
  finalizers:
  - cephfilesystem.ceph.rook.io
  generation: 2
  name: myfs
  namespace: rook-ceph-external
  resourceVersion: "30454"
  uid: 6f805b0d-d924-496a-9355-17ea9fc761be
spec:
  dataPools:
  - erasureCoded:
      codingChunks: 0
      dataChunks: 0
    failureDomain: host
    mirroring: {}
    name: replicated
    parameters:
      compression_mode: none
    quotas: {}
    replicated:
      requireSafeReplicaSize: true
      size: 3
    statusCheck:
      mirror: {}
  metadataPool:
    erasureCoded:
      codingChunks: 0
      dataChunks: 0
    mirroring: {}
    parameters:
      compression_mode: none
    quotas: {}
    replicated:
      requireSafeReplicaSize: true
      size: 3
    statusCheck:
      mirror: {}
  metadataServer:
    activeCount: 1
    activeStandby: true
    placement:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - rook-ceph-mds
            topologyKey: topology.kubernetes.io/zone
          weight: 100
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - rook-ceph-mds
          topologyKey: kubernetes.io/hostname
    resources: {}
  preserveFilesystemOnDelete: true
  statusCheck:
    mirror: {}
status:
  phase: Ready
mayank-reynencourt commented 2 years ago

@Madhu-1, also, the command below hangs on the external Ceph cluster as well as from the toolbox:

ceph fs subvolumegroup ls myfs

I tried with CNI Calico and with the default CNI Canal; in both cases it's stuck.

I created 2 filesystems: 1) mayank (using the Ceph CLI: ceph fs create mayank) and 2) myfs (using the CR).

Below is the subvolumegroup output for both filesystems from the Ceph cluster:


[rook@rook-ceph-tools-67d7dcc778-4qcrf /]$ ceph fs subvolumegroup ls cephfs
[]

# filesystem created via toolbox-job using ceph cli
[root@ip-10-0-0-224 /]# ceph fs subvolumegroup ls mayank 
[
    {
        "name": "_deleting"
    },
    {
        "name": "csi"
    }
]

[root@ip-10-0-0-114 /]# ceph fs subvolumegroup ls myfs
Error ETIMEDOUT: error calling ceph_mount

maybe that will help

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

Zvezdoreel commented 1 month ago

So what should I do if I encounter this? Delete the filesystem and create a new one with a new name?

Madhu-1 commented 1 month ago

So what should I do if I encounter this? Delete the filesystem and create a new one with a new name?

You need to create the subvolumegroup after creating the filesystem.
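
For example, from the toolbox pod (a sketch, assuming the filesystem is named myfs and the default CSI group name csi):

ceph fs subvolumegroup create myfs csi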

@Zvezdoreel This bug is no longer valid with the latest Rook release, as this has been fixed.