kubevirt / containerized-data-importer

Data Import Service for kubernetes, designed with kubevirt in mind.
Apache License 2.0
416 stars 260 forks source link

Datavolume is failed occasionally when clone a datavolume from a pvc for concurrency #3259

Closed mengyu1987 closed 4 months ago

mengyu1987 commented 5 months ago

What happened: A clear and concise description of what the bug is.

  1. clone a dv from a pvc of another namespace

clone

  1. clone 20 dvs for concurrency clone-batch

  2. 2 dvs are failed failed

What you expected to happen: A clear and concise description of what you expected to happen.

all dvs are success.

How to reproduce it (as minimally and precisely as possible): Steps to reproduce the behavior.

Additional context: Add any other context about the problem here.

Environment:

awels commented 4 months ago

@mhenriks Any thoughts on this? Looks like something is going wrong with the namespace transfer since we are seeing the lost claim?

mhenriks commented 4 months ago

@mengyu1987 what version of CDI is this?

EDIT I see 1.58

mhenriks commented 4 months ago

@awels lost claims are expected but should be deleted

awels commented 4 months ago

The description says it is CDI 1.58.1

mengyu1987 commented 4 months ago

I have the same issue using version 1.59.0 of CDI. Please note that when the source PVC and cloned DV are in the same namespace, the issue does not occur.

mhenriks commented 4 months ago

@mengyu1987 I was unable to reproduce with kubevirtci with: KUBEVIRT_PROVIDER=k8s-1.28 KUBEVIRT_STORAGE=rook-ceph-default KUBEVIRT_NUM_NODES=2

Can you post the following: Lost PVC/DV yamls Events on lost PVC/DV StorageClass yaml

Any relevant errors in cdi-deployment log?

mengyu1987 commented 4 months ago

@mhenriks Thanks for your reply. I just can reproduce when the count of cloned dvs is more than 20 for concurrency. My k8s cluster with:
KUBEVIRT_PROVIDER=k8s-1.27.6 KUBEVIRT_STORAGE=rook-ceph.rbd.csi KUBEVIRT_NUM_NODES=3

the yaml follows:

failed DV yaml:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
annotations:
cdi.kubevirt.io/cloneType: csi-clone
cdi.kubevirt.io/storage.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE3MTYxODQ5OTQsImlhdCI6MTcxNjE4NDY5NCwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJpbWctMzlmNXkydG0iLCJuYW1lc3BhY2UiOiJkZWZhdWx0IiwibmJmIjoxNzE2MTg0Njk0LCJvcGVyYXRpb24iOiJDbG9uZSIsInBhcmFtcyI6eyJ0YXJnZXROYW1lIjoiYXppbXYiLCJ0YXJnZXROYW1lc3BhY2UiOiJ2bS00ZmMwYWU3MyJ9LCJyZXNvdXJjZSI6eyJncm91cCI6IiIsInJlc291cmNlIjoicGVyc2lzdGVudHZvbHVtZWNsYWltcyIsInZlcnNpb24iOiJ2MSJ9fQ.N1rHKXOtPyY9qjk6kpawAUihTbBFnDNqUBvYucw5R7RNaWDh-mwbCX9AUgKW84l5-36avAUEesawGAHnU-XyAqT-p69OCxlsOlRMfDN8wHNJe3T05kiaDH2R-T_shTPRWpzypoUO4L9OsoRhrQuZ2sW9PruyOavhhRLTgyixIw7GFzbeiZzAPi_rudMZAEVv0KOK3v1A951v6n13MJkHkANh5spO7qYJIAPeg3JLzW8m7bBviLNGBR9wkMepN1VuIxpobJxty0aUKluCgTqRAHJccY9q9-m0yvPY0jLG2JT3A1foiuwdiT8MJBycxfep3zwwvfsiud9HaSuQWDDrUA
cdi.kubevirt.io/storage.extended.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjIwMzE1NDQ2OTUsImlhdCI6MTcxNjE4NDY5NSwiaXNzIjoiY2RpLWRlcGxveW1lbnQiLCJuYW1lIjoiaW1nLTM5ZjV5MnRtIiwibmFtZXNwYWNlIjoiZGVmYXVsdCIsIm5iZiI6MTcxNjE4NDY5NSwib3BlcmF0aW9uIjoiQ2xvbmUiLCJwYXJhbXMiOnsidGFyZ2V0TmFtZSI6ImF6aW12IiwidGFyZ2V0TmFtZXNwYWNlIjoidm0tNGZjMGFlNzMiLCJ1aWQiOiJhNWE1NzJiNS1lNWQ1LTQ4MDktYTQzYy0wYzZlODQ0MTE4YjkifSwicmVzb3VyY2UiOnsiZ3JvdXAiOiIiLCJyZXNvdXJjZSI6InBlcnNpc3RlbnR2b2x1bWVjbGFpbXMiLCJ2ZXJzaW9uIjoidjEifX0.g5NNJ8K9oS1EJAWDQ2Kj4zeQxWIwPluI7pv_G5vDRbDS6LsoUQDoBKGS_VlqHi6kw96GQnov5wgemRWN6onUJWuKrCmXSVhz9Mk9hf9Y-AwEwfvNBcyKZ5OPiVHbKRhA2fslIwW4G_ZT9xCpzDiTjU5LZ3vi-MFhWMKP230xDij7SkJCA7wp6sQT9tM1oewsbPS9Tmf6wX86a_QzGVvldSu7qg_5o4LN0QVV68veoD1xTWDJO0gtwcOBkyL6y1pNR26KVDIGi2p3R7Q8mByoQBpASyfqxnyq12RxSADqm_iRmwBI5Y-EY3Y9UMfFxc40-kh1gjRpKsGmytphoc2BZg
cdi.kubevirt.io/storage.usePopulator: "true"
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"azimv","namespace":"vm-4fc0ae73"},"spec":{"pvc":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":"ceph-block","volumeMode":"Block"},"source":{"pvc":{"name":"img-39f5y2tm","namespace":"default"}}}}
creationTimestamp: "2024-05-20T05:58:14Z"
generation: 1
name: azimv
namespace: vm-4fc0ae73
resourceVersion: "4788828"
uid: a5a572b5-e5d5-4809-a43c-0c6e844118b9
spec:
pvc:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
storageClassName: ceph-block
volumeMode: Block
source:
pvc:
name: img-39f5y2tm
namespace: default
status:
claimName: azimv
conditions:
- lastHeartbeatTime: "2024-05-20T05:58:25Z"
lastTransitionTime: "2024-05-20T05:58:25Z"
message: Claim Lost
reason: ClaimLost
status: "False"
type: Bound
- lastHeartbeatTime: "2024-05-20T05:58:25Z"
lastTransitionTime: "2024-05-20T05:58:25Z"
status: "False"
type: Ready
- lastHeartbeatTime: "2024-05-20T05:58:20Z"
lastTransitionTime: "2024-05-20T05:58:20Z"
message: Clone Complete
reason: Completed
status: "False"
type: Running
phase: Failed
progress: 100.0%

Lost PVC yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
cdi.kubevirt.io/clonePhase: Succeeded
cdi.kubevirt.io/cloneType: csi-clone
cdi.kubevirt.io/dataSourceNamespace: default
cdi.kubevirt.io/storage.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE3MTYxODQ5OTQsImlhdCI6MTcxNjE4NDY5NCwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJpbWctMzlmNXkydG0iLCJuYW1lc3BhY2UiOiJkZWZhdWx0IiwibmJmIjoxNzE2MTg0Njk0LCJvcGVyYXRpb24iOiJDbG9uZSIsInBhcmFtcyI6eyJ0YXJnZXROYW1lIjoiYXppbXYiLCJ0YXJnZXROYW1lc3BhY2UiOiJ2bS00ZmMwYWU3MyJ9LCJyZXNvdXJjZSI6eyJncm91cCI6IiIsInJlc291cmNlIjoicGVyc2lzdGVudHZvbHVtZWNsYWltcyIsInZlcnNpb24iOiJ2MSJ9fQ.N1rHKXOtPyY9qjk6kpawAUihTbBFnDNqUBvYucw5R7RNaWDh-mwbCX9AUgKW84l5-36avAUEesawGAHnU-XyAqT-p69OCxlsOlRMfDN8wHNJe3T05kiaDH2R-T_shTPRWpzypoUO4L9OsoRhrQuZ2sW9PruyOavhhRLTgyixIw7GFzbeiZzAPi_rudMZAEVv0KOK3v1A951v6n13MJkHkANh5spO7qYJIAPeg3JLzW8m7bBviLNGBR9wkMepN1VuIxpobJxty0aUKluCgTqRAHJccY9q9-m0yvPY0jLG2JT3A1foiuwdiT8MJBycxfep3zwwvfsiud9HaSuQWDDrUA
cdi.kubevirt.io/storage.condition.running: "false"
cdi.kubevirt.io/storage.condition.running.message: Clone Complete
cdi.kubevirt.io/storage.condition.running.reason: Completed
cdi.kubevirt.io/storage.contentType: kubevirt
cdi.kubevirt.io/storage.extended.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjIwMzE1NDQ2OTYsImlhdCI6MTcxNjE4NDY5NiwiaXNzIjoiY2RpLWRlcGxveW1lbnQiLCJuYW1lIjoiaW1nLTM5ZjV5MnRtIiwibmFtZXNwYWNlIjoiZGVmYXVsdCIsIm5iZiI6MTcxNjE4NDY5Niwib3BlcmF0aW9uIjoiQ2xvbmUiLCJwYXJhbXMiOnsidGFyZ2V0TmFtZSI6ImF6aW12IiwidGFyZ2V0TmFtZXNwYWNlIjoidm0tNGZjMGFlNzMiLCJ1aWQiOiI2MjA5ZjRmNS03MzU0LTRmM2EtYjlmNC1iZjQ5NzY2M2FmMDYifSwicmVzb3VyY2UiOnsiZ3JvdXAiOiIiLCJyZXNvdXJjZSI6InBlcnNpc3RlbnR2b2x1bWVjbGFpbXMiLCJ2ZXJzaW9uIjoidjEifX0.bJv06O4R8N2pnYBWIUrU2ViKN5KxguIENjcsTyVILEZXQxdJwzecl5H6LfDehWvQpdMXI2LkXi-QhRpAJhPCYlD3nbnzjIsziThybi8qQZn20VZKaQPWiriVg1y3tqGAe42jvPGhksdtTjEMIqpiJYFGz9w4TFcblWH7-0MDuwr6TQIDVYx7pP86LuKNCqO4EscPI9nQ1ILN5HPUDX0-HuTQDQ0Frp4jHuXNTkR3BLtP1A_fObRAUFffOfZ-6qlAGfZ9GbmlULfLN98lZ-WNa3HTV3OgolgI32ohFDDk1MVl9hHDtULIcgNSMgwDKxsAT0jtP16HuvII7_jyBam7DQ
cdi.kubevirt.io/storage.pod.restarts: "0"
cdi.kubevirt.io/storage.populator.progress: 100.0%
cdi.kubevirt.io/storage.preallocation.requested: "false"
cdi.kubevirt.io/storage.usePopulator: "true"
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"azimv","namespace":"vm-4fc0ae73"},"spec":{"pvc":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":"ceph-block","volumeMode":"Block"},"source":{"pvc":{"name":"img-39f5y2tm","namespace":"default"}}}}
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
creationTimestamp: "2024-05-20T05:58:16Z"
finalizers:
- kubernetes.io/pvc-protection
labels:
app: containerized-data-importer
app.kubernetes.io/component: storage
app.kubernetes.io/managed-by: cdi-controller
name: azimv
namespace: vm-4fc0ae73
ownerReferences:
- apiVersion: cdi.kubevirt.io/v1beta1
blockOwnerDeletion: true
controller: true
kind: DataVolume
name: azimv
uid: a5a572b5-e5d5-4809-a43c-0c6e844118b9
resourceVersion: "4788825"
uid: 6209f4f5-7354-4f3a-b9f4-bf497663af06
spec:
accessModes:
- ReadWriteOnce
dataSource:
apiGroup: cdi.kubevirt.io
kind: VolumeCloneSource
name: volume-clone-source-a5a572b5-e5d5-4809-a43c-0c6e844118b9
dataSourceRef:
apiGroup: cdi.kubevirt.io
kind: VolumeCloneSource
name: volume-clone-source-a5a572b5-e5d5-4809-a43c-0c6e844118b9
resources:
requests:
storage: 20Gi
storageClassName: ceph-block
volumeMode: Block
volumeName: pvc-48e24d59-725f-41fc-8cca-f406710ff7bf
status:
phase: Lost

Events on lost DV:

Events:
Type     Reason              Age   From                             Message
----     ------              ----  ----                             -------
Normal   CloneScheduled      23m   datavolume-pvc-clone-controller  Cloning from default/img-39f5y2tm into vm-4fc0ae73/azimv scheduled
Normal   Pending             23m   datavolume-pvc-clone-controller  PVC azimv Pending
Warning  Pending             23m   datavolume-pvc-clone-controller  Clone Pending
Normal   CSICloneInProgress  23m   datavolume-pvc-clone-controller  CSI Volume clone in progress (for pvc default/img-39f5y2tm)
Warning  Error               22m   datavolume-pvc-clone-controller  actual PVC size missing
Normal   Bound               22m   datavolume-pvc-clone-controller  PVC azimv Bound
Normal   CloneSucceeded      22m   datavolume-pvc-clone-controller  Successfully cloned from default/img-39f5y2tm into vm-4fc0ae73/azimv
Warning  ErrClaimLost        22m   datavolume-pvc-clone-controller  PVC azimv lost
Normal   ClaimLost           22m   datavolume-pvc-clone-controller  Claim Lost

Events on lost PVC:

Events:
Type     Reason                Age                From                                                                                                        Message
----     ------                ----               ----                                                                                                        -------
Normal   ExternalProvisioning  22m (x7 over 22m)  persistentvolume-controller                                                                                 waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
Normal   Provisioning          22m (x6 over 22m)  rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-76f966fdd8-gh558_ae0d33a0-c292-4c40-bf36-49e2fd466e8b  External provisioner is provisioning volume for claim "vm-4fc0ae73/azimv"
Normal   Provisioning          22m (x6 over 22m)  external-provisioner                                                                                        Assuming an external populator will provision the volume
Warning  ClaimMisbound         22m                persistentvolume-controller                                                                                 Two claims are bound to the same volume, this one is bound incorrectly

StorageClass yaml:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
meta.helm.sh/release-name: os-rook-ceph-cluster
meta.helm.sh/release-namespace: default
storageclass.kubernetes.io/is-default-class: "false"
creationTimestamp: "2024-05-15T12:01:27Z"
labels:
app.kubernetes.io/managed-by: Helm
name: ceph-block
resourceVersion: "2289"
uid: dd454972-29e1-4932-bbc1-789ef28e5c6e
parameters:
clusterID: rook-ceph
csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
csi.storage.k8s.io/fstype: ext4
csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
imageFeatures: layering
imageFormat: "2"
pool: ceph-blockpool
provisioner: rook-ceph.rbd.csi.ceph.com
reclaimPolicy: Retain
volumeBindingMode: Immediate

PV yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
annotations:
pv.kubernetes.io/provisioned-by: rook-ceph.rbd.csi.ceph.com
volume.kubernetes.io/provisioner-deletion-secret-name: rook-csi-rbd-provisioner
volume.kubernetes.io/provisioner-deletion-secret-namespace: rook-ceph
creationTimestamp: "2024-05-20T05:58:20Z"
finalizers:
- kubernetes.io/pv-protection
name: pvc-48e24d59-725f-41fc-8cca-f406710ff7bf
resourceVersion: "4788637"
uid: b576d6e4-4a1e-43fe-895c-60f0aacecc4e
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 20Gi
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06
namespace: default
resourceVersion: "4788443"
uid: 48e24d59-725f-41fc-8cca-f406710ff7bf
csi:
controllerExpandSecretRef:
name: rook-csi-rbd-provisioner
namespace: rook-ceph
driver: rook-ceph.rbd.csi.ceph.com
nodeStageSecretRef:
name: rook-csi-rbd-node
namespace: rook-ceph
volumeAttributes:
clusterID: rook-ceph
imageFeatures: layering
imageFormat: "2"
imageName: csi-vol-a68f3967-ec00-42ae-91d4-161c9ed87dae
journalPool: ceph-blockpool
pool: ceph-blockpool
storage.kubernetes.io/csiProvisionerIdentity: 1715856786508-2790-rook-ceph.rbd.csi.ceph.com
volumeHandle: 0001-0009-rook-ceph-0000000000000001-a68f3967-ec00-42ae-91d4-161c9ed87dae
persistentVolumeReclaimPolicy: Retain
storageClassName: ceph-block
volumeMode: Block
status:
phase: Released

some relevant infos in cdi-deployment log:


{"level":"error","ts":"2024-05-20T05:58:16Z","logger":"controller.datavolume-pvc-clone-controller","msg":"unable to update datavolume status","name":"azimv","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).emitEvent\n\tpkg/controller/datavolume/controller-base.go:1012\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).updateStatus\n\tpkg/controller/datavolume/controller-base.go:909\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).reconcile\n\tpkg/controller/datavolume/controller-base.go:434\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*PvcCloneReconciler).Reconcile\n\tpkg/controller/datavolume/pvc-clone-controller.go:179\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2024-05-20T05:58:16Z","msg":"Reconciler error","controller":"datavolume-pvc-clone-controller","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"b4f03528-12c1-42dc-b99c-9082ccfc80e5","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2024-05-20T05:58:16Z","logger":"controller.datavolume-pvc-clone-controller","msg":"unable to update datavolume status","name":"azimv","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).emitEvent\n\tpkg/controller/datavolume/controller-base.go:1012\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).updateDataVolumeStatusPhaseWithEvent\n\tpkg/controller/datavolume/controller-base.go:828\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).updateDataVolumeStatusPhaseSync\n\tpkg/controller/datavolume/controller-base.go:804\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).updateStatus\n\tpkg/controller/datavolume/controller-base.go:846\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).reconcile\n\tpkg/controller/datavolume/controller-base.go:434\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*PvcCloneReconciler).Reconcile\n\tpkg/controller/datavolume/pvc-clone-controller.go:179\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2024-05-20T05:58:16Z","msg":"Reconciler error","controller":"datavolume-pvc-clone-controller","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"4ca351d4-54ca-430e-bd84-a0268dad08e7","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}

{"level":"error","ts":"2024-05-20T05:58:16Z","msg":"Reconciler error","controller":"clone-populator","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"945ab067-aa17-4596-a8d5-2f5a782adeff","error":"Operation cannot be fulfilled on persistentvolumeclaims \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.clone-controller","msg":"Should not reconcile this PVC","PVC":"default/tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06","checkPVC(AnnCloneRequest)":false,"NOT has annotation(AnnCloneOf)":true,"isBound":false,"has finalizer?":false}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.clone-controller","msg":"Should not reconcile this PVC","PVC":"default/tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06","checkPVC(AnnCloneRequest)":false,"NOT has annotation(AnnCloneOf)":true,"isBound":true,"has finalizer?":false}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.upload-controller","msg":"not doing anything with PVC","PVC":"default/tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06","isUpload":false,"isCloneTarget":false,"isBound":true,"podSucceededFromPVC":false,"deletionTimeStamp set?":true}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.clone-populator","msg":"removing finalizer","PVC":"vm-4fc0ae73/azimv"} {"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"events","msg":"Successfully cloned from default/img-39f5y2tm into vm-4fc0ae73/azimv","type":"Normal","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788596"},"reason":"CloneSucceeded"}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.import-controller","msg":"PVC not bound, skipping pvc","PVC":"default/tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06","Phase":"Lost"}

{"level":"error","ts":"2024-05-20T05:58:20Z","msg":"Reconciler error","controller":"clone-populator","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"830576eb-a951-4fe6-a647-cd9ca8b8daf4","error":"actual PVC size missing","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"events","msg":"actual PVC size missing","type":"Warning","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788588"},"reason":"Error"} {"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.clone-controller","msg":"Should not reconcile this PVC","PVC":"vm-4fc0ae73/azimv","checkPVC(AnnCloneRequest)":false,"NOT has annotation(AnnCloneOf)":true,"isBound":true,"has finalizer?":false} {"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.datavolume-pvc-clone-controller","msg":"Unknown populator phase","phase":"Error"}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"events","msg":"Successfully cloned from default/img-39f5y2tm into vm-4fc0ae73/azimv","type":"Normal","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788596"},"reason":"CloneSucceeded"}

{"level":"error","ts":"2024-05-20T05:58:21Z","logger":"controller.datavolume-pvc-clone-controller","msg":"Unable to sync update dv meta","name":"azimv","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/pkg/controller/datavolume.(ReconcilerBase).syncUpdate\n\tpkg/controller/datavolume/controller-base.go:540\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(ReconcilerBase).syncCommon\n\tpkg/controller/datavolume/controller-base.go:449\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(PvcCloneReconciler).syncClone\n\tpkg/controller/datavolume/pvc-clone-controller.go:247\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(PvcCloneReconciler).sync\n\tpkg/controller/datavolume/pvc-clone-controller.go:239\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(ReconcilerBase).reconcile\n\tpkg/controller/datavolume/controller-base.go:433\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(PvcCloneReconciler).Reconcile\n\tpkg/controller/datavolume/pvc-clone-controller.go:179\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Reconcile\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"} {"level":"error","ts":"2024-05-20T05:58:21Z","msg":"Reconciler error","controller":"datavolume-pvc-clone-controller","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"ea1b450d-b15f-4e83-8223-92a4ad1a04a0","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}

{"level":"debug","ts":"2024-05-20T05:58:25Z","logger":"controller.clone-controller","msg":"PVC annotation not found, skipping pvc","PVC":"vm-4fc0ae73/azimv","annotation":"k8s.io/CloneRequest"}

{"level":"debug","ts":"2024-05-20T05:58:25Z","logger":"controller.clone-controller","msg":"PVC not bound, skipping pvc","PVC":"vm-4fc0ae73/azimv","Phase":"Lost"}

{"level":"debug","ts":"2024-05-20T05:58:25Z","logger":"events","msg":"PVC azimv lost","type":"Warning","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788828"},"reason":"ErrClaimLost"} {"level":"debug","ts":"2024-05-20T05:58:25Z","logger":"events","msg":"Claim Lost","type":"Normal","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788828"},"reason":"ClaimLost"}


> And my test shell:

! /bin/bash

count=0 while [ $count -lt 20 ] do count=expr $count + 1 length=5 random_string=$(cat /dev/urandom | tr -dc 'a-z0-9' | fold -w $length | head -n 1) echo $random_string

sed "s/DV-NAME/$random_string/g" /root/my/batch-dv.yaml > /root/my/batch-dv-1.yaml

kubectl apply -f /root/my/batch-dv-1.yaml

done


> my test dv yaml:

apiVersion: cdi.kubevirt.io/v1beta1 kind: DataVolume metadata: name: DV-NAME namespace: vm-4fc0ae73 spec: pvc: accessModes:

mhenriks commented 4 months ago

@mengyu1987 looks like you are not using the ceph provisioner included in kubevirtci

I could not reproduce with the following params

KUBEVIRT_PROVIDER=k8s-1.27.6
KUBEVIRT_STORAGE=rook-ceph-default
KUBEVIRT_NUM_NODES=3

KUBEVIRT_STORAGE=rook-ceph-default will install rook/ceph provisioner

I noticed your StorageClass has Retain rather than Delete so I tried testing with that setting and still could not reproduce with rook-ceph-default.

I suggest you run with rook-ceph-default. It that works then there may be an issue with your provisioner config

mengyu1987 commented 4 months ago

@mengyu1987 looks like you are not using the ceph provisioner included in kubevirtci

I could not reproduce with the following params

KUBEVIRT_PROVIDER=k8s-1.27.6
KUBEVIRT_STORAGE=rook-ceph-default
KUBEVIRT_NUM_NODES=3

KUBEVIRT_STORAGE=rook-ceph-default will install rook/ceph provisioner

I noticed your StorageClass has Retain rather than Delete so I tried testing with that setting and still could not reproduce with rook-ceph-default.

I suggest you run with rook-ceph-default. It that works then there may be an issue with your provisioner config

thanks @mhenriks. I run with rook-ceph-default,It does not reproduce. It is resolved.

mengyu1987 commented 4 months ago

@mhenriks It is reproduced once more with rook-ceph-default. There are two StorageClasses:

[root@node152 ~]# kubectl get sc
NAME                   PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-block (default)   rook-ceph.rbd.csi.ceph.com      Retain          Immediate           true                   7d15h
ceph-fs                rook-ceph.cephfs.csi.ceph.com   Retain          Immediate           true                   7d15h

And ceph-block-default yaml:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: os-rook-ceph-cluster
    meta.helm.sh/release-namespace: default
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2024-05-15T12:01:27Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: ceph-block
  resourceVersion: "8172170"
  uid: dd454972-29e1-4932-bbc1-789ef28e5c6e
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  imageFeatures: layering
  imageFormat: "2"
  pool: ceph-blockpool
provisioner: rook-ceph.rbd.csi.ceph.com
reclaimPolicy: Retain
volumeBindingMode: Immediate

I dont know why the dv is reconciled to failed after it is succeeded. succeeded-failed The dv is succeeded in cdi-deployment log: succeeded-log Then reconsile, the dv is failed. reconsile-log

mhenriks commented 4 months ago

@mengyu1987 this setting is not compatible with rook-ceph-default

pool: ceph-blockpool

✗ k get  cephblockpools -n rook-ceph
NAME          PHASE
replicapool   Ready

there is no blockpool named ceph-blockpool

aglitke commented 4 months ago

This seems related to your custom implementation of rook-ceph and not an issue with CDI. Therefore I am going to close the issue.