kubevirt / containerized-data-importer

Data Import Service for kubernetes, designed with kubevirt in mind.
Apache License 2.0

CDI importer pod shows nbdkit error, not able to create data-volume #3262

Open lingesh00 opened 3 months ago

lingesh00 commented 3 months ago

Scenario:

While creating a DataVolume with the following spec, the importer pod restarts continuously and shows: "Unable to process data: Unable to convert source data to target format: qemu-img: Could not open ''nbd+unix:///?socket=/var/run/nbdkit.sock'': Could not initialize refcount handling: Input/output error"

Steps to Reproduce the Problem

1) DV Spec:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: test-dv10
  namespace: vm-registry
spec:
  pvc:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 80Gi
    storageClassName: rook-ceph-block
    volumeMode: Block
  source:
    http:
      url: http://minio-service.kubevm.svc.cluster.local:9000/images/image_1FDhkrqe.img

2) CDI YAML:

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  annotations:
    cdi.kubevirt.io/configAuthority: ""
    meta.helm.sh/release-name: edgepass01
    meta.helm.sh/release-namespace: kubevm
  creationTimestamp: "2024-04-04T13:04:01Z"
  finalizers:
  - operator.cdi.kubevirt.io
  generation: 8
  labels:
    app.kubernetes.io/instance: edgepass01
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: cdi
    app.kubernetes.io/version: 0.1.0
    helm.sh/chart: cdi-0.1.0
  name: edgepass01-cdi-cdi
  resourceVersion: "451808347"
  selfLink: /apis/cdi.kubevirt.io/v1beta1/cdis/edgepass01-cdi-cdi
  uid: e2adfafb-d217-42e4-b00c-8be924159189
spec:
  config:
    podResourceRequirements:
      limits:
        cpu: 750m
        memory: 6G
      requests:
        cpu: 100m
        memory: 60M
  imagePullPolicy: IfNotPresent
  infra:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
    - key: CriticalAddonsOnly
      operator: Exists
  workload:
    nodeSelector:
      kubernetes.io/os: linux

Additional Info:

Logs of the importer pods: (screenshots attached)

K8s, KubeVirt, and CDI versions: (screenshot attached)

Environment:

awels commented 3 months ago

Does MinIO support HTTP byte ranges? If not, nbdkit will fail. Can you try the following experiment: gzip the image and put it in the bucket. This bypasses nbdkit and attempts an import using scratch space, which does not use byte ranges.
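(A quick way to verify byte-range support, sketched with curl against the URL from the DV spec above and assuming the bucket allows anonymous reads; a 206 Partial Content response means ranges are honored, a plain 200 OK means they are not:)

# GET the object with a Range header and inspect only the response headers
curl -s -o /dev/null -D - -H "Range: bytes=0-1023" \
  http://minio-service.kubevm.svc.cluster.local:9000/images/image_1FDhkrqe.img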

lingesh00 commented 3 months ago

Hi @awels, thanks. I'm able to create the DV with a gzipped image, though even with gzip the DV creation takes too long. I also want to create DVs from ISO images. May I know how to fix this issue?

MinIO YAML:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: edgepass01
    meta.helm.sh/release-namespace: kubevm
  creationTimestamp: "2024-04-04T13:04:01Z"
  generation: 10
  labels:
    app: minio
    app.kubernetes.io/instance: edgepass01
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: minio
    app.kubernetes.io/version: 0.1.0
    helm.sh/chart: minio-0.1.0
  name: minio
  namespace: kubevm
  resourceVersion: "480072773"
  selfLink: /apis/apps/v1/namespaces/kubevm/statefulsets/minio
  uid: 1da418c1-2d3b-4f63-8d87-6d8e3e728f53
spec:
  podManagementPolicy: OrderedReady
  replicas: 4
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: minio
  serviceName: minio
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: minio
    spec:
      containers:
      - args:
        - server
        - http://minio-{0...3}.minio.kubevm.svc.cluster.local/data1
        - http://minio-{0...3}.minio.kubevm.svc.cluster.local/data2
        - http://minio-{0...3}.minio.kubevm.svc.cluster.local/data3
        - http://minio-{0...3}.minio.kubevm.svc.cluster.local/data4
        - --console-address
        - :9001
        env:
        - name: MINIO_ROOT_USER
          valueFrom:
            secretKeyRef:
              key: MINIO_ROOT_USER
              name: edgepass01-minio-secret
        - name: MINIO_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              key: MINIO_ROOT_PASSWORD
              name: edgepass01-minio-secret
        image: minio/minio@sha256:b05b6e8f65d818137f561bce9bb25edf701df5446e504f2321da7e8be24a034c
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /minio/health/live
            port: http
            scheme: HTTP
          initialDelaySeconds: 120
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: minio
        ports:
        - containerPort: 9000
          name: http
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data1
          name: data1
        - mountPath: /data2
          name: data2
        - mountPath: /data3
          name: data3
        - mountPath: /data4
          name: data4
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data1
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 80Gi
      storageClassName: rook-ceph-block
      volumeMode: Filesystem
    status:
      phase: Pending
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data2
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 80Gi
      storageClassName: rook-ceph-block
      volumeMode: Filesystem
    status:
      phase: Pending
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data3
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 80Gi
      storageClassName: rook-ceph-block
      volumeMode: Filesystem
    status:
      phase: Pending
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data4
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 80Gi
      storageClassName: rook-ceph-block
      volumeMode: Filesystem
    status:
      phase: Pending
awels commented 3 months ago

So all KubeVirt disks stored in PVCs are RAW disks, and an ISO is already a raw image. I am not sure I entirely understand your question. Do you want to mount an ISO disk as a CD-ROM in a VM? The YAML here is the MinIO server StatefulSet, and it seems to be Pending; describing it will probably tell you why it is Pending.
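(A minimal sketch of that check; the namespace and resource names below are inferred from the StatefulSet above and may need adjusting:)

# Show the events explaining why the MinIO volumes/pods are Pending
kubectl -n kubevm describe pvc data1-minio-0
kubectl -n kubevm describe pod minio-0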

lingesh00 commented 2 months ago

Hi @awels, yes, I'm mounting an ISO disk as a CD-ROM in a VM. The MinIO issue (the nbdkit error) was fixed by updating to the latest MinIO version, but I'm still facing slowness in creating the DV.

awels commented 2 months ago

When you say slowness, do you mean when importing the image into the PVC? How long does it take? We had an issue in older versions of CDI where certain storage would take a really long time, like 10 minutes for an image that took only about 15 seconds to download with curl. This had to do with the sync mode we used when writing to the storage. But slowness can be caused by a lot of things.

To eliminate (or implicate) CDI, can you start a pod in your cluster that has curl, then exec into that pod and curl the image file and see how long that takes?
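(A sketch of that experiment; the pod image is just one convenient image that ships curl, and the URL is the one from the original DV spec:)

# Run a throwaway pod with curl and time the download of the same image
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s -o /dev/null -w "downloaded in %{time_total}s\n" \
  http://minio-service.kubevm.svc.cluster.local:9000/images/image_1FDhkrqe.img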

lingesh00 commented 2 months ago

The importer pod takes a long time to import the image into the PVC (i.e. the data volume for the VM): around 40 minutes for a 3 GB image.

I tried to curl the same image from inside a pod; it takes only 18 seconds. (screenshot attached)

awels commented 2 months ago

You are probably running into https://github.com/kubevirt/containerized-data-importer/issues/2809, which is fixed in newer versions of CDI. Can you try the workaround explained there and see if that makes it any faster?

lingesh00 commented 2 months ago

Hi @awels, I have tried with the v1.57 CDI image; it is showing a different error now. (screenshot attached)

awels commented 2 months ago

So I am assuming your CRI is containerd. With the new versions of CDI we no longer run as root for security reasons, so you need to configure your CRI to change the ownership of the block device accordingly when the pod starts. See this blog post on how to change the CRI settings to make it work.
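(For containerd, the relevant setting is device_ownership_from_security_context in the CRI plugin configuration; a minimal sketch assuming the default config path — see the blog post for the authoritative steps:)

# On each worker node, let containerd set block device ownership based on the
# pod's securityContext, so the non-root importer can write to block-mode PVCs.
# In /etc/containerd/config.toml, under the CRI plugin section, set:
#
#   [plugins."io.containerd.grpc.v1.cri"]
#     device_ownership_from_security_context = true
#
# then restart containerd:
sudo systemctl restart containerd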

lingesh00 commented 2 months ago

Hi @awels, thanks. By updating the CRI permission I was able to create the DV without any error, and the slowness in DV creation is also reduced. I have one more doubt: in our production environment there are nearly 100 VMs running with DVs created using the v1.49.0 CDI image. If I update the image to v1.57.0, will it affect any old DVs? Also, is any additional CRD included in the updated version that needs to be configured?

lingesh00 commented 2 months ago

Hi @awels, previously I did a fresh installation of CDI with v1.57.0. But now I did an upgrade from v1.49.0 to v1.57.0, and after I updated the CDI image the DV is stuck at "PVC Bound" status and the importer pods are not coming up, even though the status of the PVC is Bound.

(screenshot attached)

YAML of the CDI operator:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    meta.helm.sh/release-name: edgepass01
    meta.helm.sh/release-namespace: kubevm
  creationTimestamp: "2024-04-04T13:04:01Z"
  generation: 6
  labels:
    app.kubernetes.io/instance: edgepass01
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: cdi-operator
    app.kubernetes.io/version: 0.1.0
    helm.sh/chart: cdi-operator-0.1.0
    name: cdi-operator
    operator.cdi.kubevirt.io: ""
    prometheus.cdi.kubevirt.io: "true"
  name: edgepass01-cdi-operator-operator
  namespace: kubevm
  resourceVersion: "515413777"
  selfLink: /apis/apps/v1/namespaces/kubevm/deployments/edgepass01-cdi-operator-operator
  uid: e15d487c-e751-4532-b840-f8c1cf01e60e
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: edgepass01
      app.kubernetes.io/name: cdi-operator
      name: cdi-operator
      operator.cdi.kubevirt.io: ""
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: edgepass01
        app.kubernetes.io/name: cdi-operator
        name: cdi-operator
        operator.cdi.kubevirt.io: ""
        prometheus.cdi.kubevirt.io: "true"
    spec:
      containers:
      - env:
        - name: DEPLOY_CLUSTER_RESOURCES
          value: "true"
        - name: OPERATOR_VERSION
          value: v1.57.0
        - name: CONTROLLER_IMAGE
          value: quay.io/kubevirt/cdi-controller:v1.57.0
        - name: IMPORTER_IMAGE
          value: quay.io/kubevirt/cdi-importer:v1.57.0
        - name: CLONER_IMAGE
          value: quay.io/kubevirt/cdi-cloner:v1.57.0
        - name: APISERVER_IMAGE
          value: quay.io/kubevirt/cdi-apiserver:v1.57.0
        - name: UPLOAD_SERVER_IMAGE
          value: quay.io/kubevirt/cdi-uploadserver:v1.57.0
        - name: UPLOAD_PROXY_IMAGE
          value: quay.io/kubevirt/cdi-uploadproxy:v1.57.0
        - name: VERBOSITY
          value: "1"
        - name: PULL_POLICY
          value: IfNotPresent
        - name: MONITORING_NAMESPACE
        - name: KUBERNETES_CLUSTER_DOMAIN
          value: cluster.local
        image: quay.io/kubevirt/cdi-operator:v1.57.0
        imagePullPolicy: IfNotPresent
        name: cdi-operator
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        resources:
          requests:
            cpu: 50m
            memory: 150Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
      serviceAccount: edgepass01-cdi-operator-operator
      serviceAccountName: edgepass01-cdi-operator-operator
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists

DV YAML:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: test-cdi02
  namespace: test
spec:
  pvc:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 90Gi
    storageClassName: rook-ceph-block
    volumeMode: Block
  source:
    http:
      url: http://minio-service.kubevm.svc.cluster.local:9000/images/image_M1aMgs2N.img

Also, I tried to downgrade the version from v1.57.0 to v1.49.0 (the previous version), but it is not getting downgraded. Logs of the operator pod: (screenshot attached)
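(Those logs can also be pulled directly; a sketch assuming the Deployment name and namespace from the YAML above:)

# Fetch the CDI operator logs
kubectl -n kubevm logs deployment/edgepass01-cdi-operator-operator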

lingesh00 commented 2 months ago

Also @awels, may I know whether the feature gate option mentioned in https://github.com/kubevirt/containerized-data-importer/issues/3146#issuecomment-2014942919 works on Kubernetes v1.22.9? I'm getting a PVC annotation error while creating the DV.

(screenshot attached)
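(For reference, CDI feature gates are configured on the CDI custom resource under spec.config.featureGates; a minimal sketch using the CDI CR name from above — the gate name shown is only an illustration and may not be the one discussed in #3146:)

# Add a feature gate to the CDI CR (gate name is illustrative)
kubectl patch cdi edgepass01-cdi-cdi --type merge \
  -p '{"spec":{"config":{"featureGates":["HonorWaitForFirstConsumer"]}}}'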