kubevirt / containerized-data-importer

Data Import Service for kubernetes, designed with kubevirt in mind.
Apache License 2.0

Consider changing import strategy to work better with persistent volume quotas #3511


jrcichra commented 2 weeks ago

Is your feature request related to a problem? Please describe: In multi-tenant clusters with storage quotas, importing a virtual machine adds friction to the user experience because of the temporary persistent storage the import requires. When importing a virtual machine on v1.60.3 with volume mode Block (so that filesystem overhead doesn't factor into the equation), the process temporarily consumes 3x the storage requested on the dataVolume before the import completes.

For example, with the given quota in an otherwise empty namespace:

NAME                 AGE     REQUEST                                                                                                                                                 LIMIT
default-zero-quota   2m1s    count/poddisruptionbudgets.policy: 0/0, count/volumesnapshots.snapshot.storage.k8s.io: 0/0, requests.nvidia.com/gpu: 0/0, services.loadbalancers: 0/0
quota                2s      count/jobs.batch: 0/100, persistentvolumeclaims: 0/5, pods: 0/64, requests.cpu: 0/5, requests.memory: 0/5Gi, requests.storage: 0/100Gi                  limits.cpu: 0/10, limits.memory: 0/10Gi
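For reference, the quota object above corresponds to a ResourceQuota roughly like the following. This is a sketch reconstructed from the kubectl output; only the quota object is shown, and the assumption that all limits live in a single object is mine:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
spec:
  hard:
    count/jobs.batch: "100"
    persistentvolumeclaims: "5"
    pods: "64"
    requests.cpu: "5"
    requests.memory: 5Gi
    requests.storage: 100Gi
    limits.cpu: "10"
    limits.memory: 10Gi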

And the following YAML to import a 20Gi Debian Cloud VM:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-debian-image-download
spec:
  egress:
    - ports:
        - port: 443
  podSelector:
    matchLabels:
      app: containerized-data-importer
  policyTypes:
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-debian-vm-internet
spec:
  egress:
    - ports:
        - port: 80
  podSelector:
    matchLabels:
      kubevirt.io/vm: debian
  policyTypes:
    - Egress
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: debian
  name: debian
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: debian
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: debian-disk
          - disk:
              bus: virtio
            name: cloudinit-disk
        resources:
          limits:
            cpu: 4000m
            memory: 4Gi
      volumes:
      - dataVolume:
          name: debian-disk
        name: debian-disk
      - name: cloudinit-disk
        cloudInitNoCloud:
          userData: |-
            #cloud-config
            chpasswd:
              list: |
                debian:debian
              expire: False
  dataVolumeTemplates:
  - metadata:
      name: debian-disk
    spec:
      storage:
        volumeMode: Block
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
      source:
        http:
          url: https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2

The following quota is used during the import:

NAME                 AGE     REQUEST                                                                                                                                                 LIMIT
default-zero-quota   6m11s   count/poddisruptionbudgets.policy: 0/0, count/volumesnapshots.snapshot.storage.k8s.io: 0/0, requests.nvidia.com/gpu: 0/0, services.loadbalancers: 0/0
quota                4m12s   count/jobs.batch: 0/100, persistentvolumeclaims: 3/5, pods: 1/64, requests.cpu: 100m/5, requests.memory: 60M/5Gi, requests.storage: 64424509440/100Gi   limits.cpu: 1/10, limits.memory: 500Mi/10Gi

This is because three PVCs are created during the import process; 3 x 20Gi = 64424509440 bytes, which is exactly the requests.storage figure above:

NAME                                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     VOLUMEATTRIBUTESCLASS   AGE
debian-disk                                          Pending                                                                        rook-ceph-nvme   <unset>                 61s
prime-fe3207de-d67c-4c42-a95b-d7c59c8b2401           Bound     pvc-747076a3-afb4-4793-be1c-a1751ce99f03   20Gi       RWO            rook-ceph-nvme   <unset>                 61s
prime-fe3207de-d67c-4c42-a95b-d7c59c8b2401-scratch   Bound     pvc-f9552aec-4435-440a-bf7d-76f77af30b4e   20Gi       RWO            rook-ceph-nvme   <unset>                 48s

Would it be possible to reduce this down to 1 or 2 volumes?

Describe the solution you'd like: Some ideas for how this could be done:

  1. Give the option for scratch space to live on an emptyDir (see the sketch after this list). I read through the scratch space doc and I understand the reasoning behind it, but it would be nice to have the option to use emptyDir and accept the risk of running out of space in exchange for avoiding a PVC allocation. Our storage class is backed by Rook RBD, so it's a ReadWriteOnce-only setup.
  2. Reduce the number of active PVCs to 2. Can we wait to provision debian-disk until prime-fe3207de-d67c-4c42-a95b-d7c59c8b2401 has been destroyed? Even without #1, that would reduce the number of concurrent volumes to 2, so users would only need to ask for 2x the quota instead of 3x.
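
To make idea 1 concrete, this is what the importer pod's scratch volume could look like if it were backed by emptyDir instead of a PVC. This is a hand-written sketch, not something CDI emits today; the pod name, image, and sizeLimit are all placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: importer-sketch            # hypothetical; not generated by CDI today
spec:
  containers:
  - name: importer
    image: example.com/importer    # placeholder image
    volumeMounts:
    - name: cdi-scratch
      mountPath: /scratch          # scratch space on node-local storage
  volumes:
  - name: cdi-scratch
    emptyDir:
      sizeLimit: 20Gi              # best-effort cap; the pod is evicted if exceeded

Because emptyDir lives on node-local storage, it counts against the pod's ephemeral-storage rather than requests.storage, which is exactly why it would not consume persistent volume quota; the trade-off is the risk of eviction if the node runs out of space.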

Regardless of the implementation, the ideal end-state would be to have some path where users can ask for 20Gi of storage quota and do a CDI import that only requires 20Gi of storage.

Describe alternatives you've considered: Right now users have to request 3x the amount of storage they actually need (e.g., 60Gi of requests.storage to import a 20Gi disk), and sometimes a bit more than 3x if they are using Filesystem volume mode with the default overhead.

aglitke commented 4 days ago

Another option could be to provision in a CDI-owned "system" populator namespace where a higher, shared storage quota could be set. As a final step, the prepared PV would be bound to the target PVC in the target namespace. Would something like this solve the problem you are seeing? There are some potential security issues with this approach, which is why we didn't implement it this way from the beginning.
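
Mechanically, the "bound to the target PVC" step could build on the fact that PersistentVolumes are cluster-scoped and are reserved for a specific claim via spec.claimRef, so a volume populated in the CDI namespace can be handed to a PVC in the tenant namespace. A rough sketch of a prepared PV reserved for the tenant's claim, where the PV name, namespace, and CSI attributes are all hypothetical:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-prepared                        # hypothetical volume populated in the CDI namespace
spec:
  capacity:
    storage: 20Gi
  accessModes: ["ReadWriteOnce"]
  volumeMode: Block
  persistentVolumeReclaimPolicy: Retain    # survives deletion of the populator's PVC
  storageClassName: rook-ceph-nvme
  claimRef:                                # reserves the PV for one specific claim
    namespace: tenant-a                    # the tenant (target) namespace, assumed
    name: debian-disk                      # the target PVC from the dataVolumeTemplate
  csi:
    driver: rook-ceph.rbd.csi.ceph.com     # assumed CSI driver name
    volumeHandle: example-volume-handle    # placeholder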

jrcichra commented 2 days ago

I believe so. In that case we'd have some generic storage quota set aside for these operations; a quota within the cdi namespace would be fine. If we ran out of disk quota in cdi, it would just queue up the CDI imports.

That method would also be a better user experience.