rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

CSI volume cloning does not work between cephfs instances #13828

Open uhthomas opened 3 months ago

uhthomas commented 3 months ago

Is this a bug report or feature request?

Deviation from expected behavior:

CSI volume cloning does not work between cephfs instances. I have a PVC on one cephfs instance, and want to create a new one on another. I am using the dataSource field to do this.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-name
spec:
  accessModes:
  - ReadWriteMany
  dataSource:
    kind: PersistentVolumeClaim
    name: some-other-name
  resources:
    requests:
      storage: 4Ti
  storageClassName: rook-cephfs-nvme
  volumeMode: Filesystem
status:
  phase: Pending

It just won't provision.

Events:
  Type     Reason                Age                  From                                                                                                              Message
  ----     ------                ----                 ----                                                                                                              -------
  Normal   Provisioning          2m25s (x11 over 7m)  rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-864d998d45-cn4bp_8ad0e755-888a-445e-b4d3-e948de7c37c6  External provisioner is provisioning volume for claim "media/media-nvme"
  Warning  ProvisioningFailed    2m25s (x11 over 7m)  rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-864d998d45-cn4bp_8ad0e755-888a-445e-b4d3-e948de7c37c6  failed to provision volume with StorageClass "rook-cephfs-nvme": rpc error: code = Unknown desc = rados: ret=-22, Invalid argument: "invalid pool layout 'main-nvme-ec'--need a valid data pool"
  Normal   ExternalProvisioning  73s (x26 over 7m)    persistentvolume-controller                                                                                       Waiting for a volume to be created either by the external provisioner 'rook-ceph.cephfs.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.

There is nothing wrong with the cephfs or storageclass, as the PVC is provisioned just fine if I remove the dataSource field.

Expected behavior:

Rook/Ceph should support volume cloning to/from any storage class, in line with the documentation.

How to reproduce it (minimal and precise):

See above.

File(s) to submit:

Logs to submit:

Cluster Status to submit:

Environment:

Madhu-1 commented 3 months ago

> Invalid argument: "invalid pool layout 'main-nvme-ec'--need a valid data pool"

This says it's not a valid data pool. Can you please provide the StorageClass YAML and reproducer steps?

uhthomas commented 3 months ago

> Invalid argument: "invalid pool layout 'main-nvme-ec'--need a valid data pool"
>
> This says it's not a valid data pool. Can you please provide the StorageClass YAML and reproducer steps?

As mentioned in the issue, the StorageClass and CephFilesystem objects are valid and work fine. I can create PVCs with the storage class no problem. This error only happens when I add the dataSource field.

Madhu-1 commented 3 months ago

> Invalid argument: "invalid pool layout 'main-nvme-ec'--need a valid data pool"
>
> This says it's not a valid data pool. Can you please provide the StorageClass YAML and reproducer steps?
>
> As mentioned in the issue, the StorageClass and CephFilesystem objects are valid and work fine. I can create PVCs with the storage class no problem. This error only happens when I add the dataSource field.

Are you trying to clone to the same StorageClass, or was the PVC created with one StorageClass and you are trying to clone to a new one?

It would be great if you could provide these details and the SC YAMLs (if they are different).

uhthomas commented 3 months ago

> Invalid argument: "invalid pool layout 'main-nvme-ec'--need a valid data pool"
>
> This says it's not a valid data pool. Can you please provide the StorageClass YAML and reproducer steps?
>
> As mentioned in the issue, the StorageClass and CephFilesystem objects are valid and work fine. I can create PVCs with the storage class no problem. This error only happens when I add the dataSource field.
>
> Are you trying to clone to the same StorageClass, or was the PVC created with one StorageClass and you are trying to clone to a new one?
>
> It would be great if you could provide these details and the SC YAMLs (if they are different).

It's a different CephFS StorageClass. There's not much special about them, other than perhaps that they are erasure coded.

uhthomas commented 3 months ago

The original StorageClass and CephFilesystem are here:

https://github.com/uhthomas/automata/blob/main/k8s/amour/rook_ceph/storage_class_list.cue

https://github.com/uhthomas/automata/blob/main/k8s/amour/rook_ceph/ceph_filesystem_list.cue

The new one is basically identical except with nvme as the device class.
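For illustration, the cross-filesystem setup looks roughly like this. The manifest below is a hypothetical sketch, not taken from the linked repository: `fsName`, `clusterID`, and the secret references are assumptions based on typical Rook CephFS StorageClasses, with `pool` pointing at the erasure-coded data pool named in the error message.

```yaml
# Hypothetical sketch of the second (nvme) CephFS StorageClass.
# fsName, pool, clusterID and secret names are illustrative assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs-nvme
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: main-nvme          # a different filesystem from the source PVC's
  pool: main-nvme-ec         # erasure-coded data pool from the error message
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
```

Cloning from a PVC whose StorageClass points at a different `fsName` is what triggers the failure above.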

Madhu-1 commented 3 months ago

AFAIK CephFS doesn't support cloning across filesystems; cloning is only supported when the source and destination are in the same filesystem. Is this something Ceph supports?

uhthomas commented 3 months ago

> AFAIK CephFS doesn't support cloning across filesystems; cloning is only supported when the source and destination are in the same filesystem. Is this something Ceph supports?

It works with RBD, and the Kubernetes docs say this should be fine.

[Screenshot: Kubernetes documentation on volume cloning]

Madhu-1 commented 3 months ago

It might be supported in Kubernetes across StorageClasses, but the storage parameters may differ between StorageClasses, and this needs to be supported by Ceph as well. https://docs.ceph.com/en/quincy/cephfs/fs-volumes/ doesn't talk about cloning across filesystems. If you are cloning within the same filesystem but to a different pool, please provide the StorageClass YAML output so that I can check whether what you are trying is valid from the Ceph-CSI point of view.

uhthomas commented 3 months ago

I am using different Ceph filesystems.

It's a real shame if this is unsupported. Could we understand why, or whether it's possible to support? The error is very misleading, and the behaviour is unexpected given that this works fine for RBD.

Madhu-1 commented 3 months ago

> I am using different Ceph filesystems.
>
> It's a real shame if this is unsupported. Could we understand why, or whether it's possible to support? The error is very misleading, and the behaviour is unexpected given that this works fine for RBD.

Support needs to be added to Ceph first, if possible; you can open a Ceph tracker issue for any feature request against Ceph. Please open an issue with Ceph-CSI, and we can work on improving the error message.

RBD supports clones across pools, which is why this is supported with Ceph-CSI as well; whatever Ceph supports can be supported with Ceph-CSI.
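By contrast, a clone that stays within one filesystem (source PVC and clone using the same StorageClass) is the supported case. A hedged sketch, with the PVC names as illustrative placeholders:

```yaml
# Hypothetical: source and clone share a StorageClass, so both
# subvolumes live in the same CephFS filesystem.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-clone
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: rook-cephfs-nvme  # same class as the source PVC
  dataSource:
    kind: PersistentVolumeClaim
    name: some-source-pvc             # source PVC in the same StorageClass
  resources:
    requests:
      storage: 4Ti
```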

github-actions[bot] commented 4 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.