vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0
8.4k stars 1.37k forks source link

v1.14.0 Backup error when emptyDir pod volumes are attached #7929

Open phoenix-bjoern opened 3 days ago

phoenix-bjoern commented 3 days ago

What steps did you take and what happened: Backing up a StatefulSet with an emptyDir attached reports an error during backup:

message: /fail to get PVC for pod integration/redis-master-0 error: /volume integration/redis-tmp-conf has no PVC associated with it
message: /fail to get PVC for pod integration/redis-master-0 error: /volume integration/tmp has no PVC associated with it
message: /fail to get PVC for pod integration/redis-master-0 error: /volume integration/redis-data has no PVC associated with it
name: /redis-master-0 message: /Error backing up item error: /volume integration/redis-tmp-conf has no PVC associated with it
name: /redis-master-0 message: /Error backing up item error: /volume integration/tmp has no PVC associated with it
name: /redis-master-0 message: /Error backing up item error: /volume integration/redis-data has no PVC associated with it

This is the (shortend) resource YAML:

apiVersion: v1
kind: Pod
metadata:
  ...
  name: redis-master-0
  namespace: integration
spec:
  containers:
    - args:
        - '-c'
        - /opt/bitnami/scripts/start-scripts/start-master.sh
      command:
        - /bin/bash
      env:
        - name: BITNAMI_DEBUG
          value: 'false'
        - name: REDIS_REPLICATION_MODE
          value: master
        - name: ALLOW_EMPTY_PASSWORD
          value: 'yes'
        - name: REDIS_TLS_ENABLED
          value: 'no'
        - name: REDIS_PORT
          value: '6379'
      image: docker.io/bitnami/redis:6.2.7-debian-11-r11
      imagePullPolicy: IfNotPresent
      ...
      volumeMounts:
        - mountPath: /opt/bitnami/scripts/start-scripts
          name: start-scripts
        - mountPath: /health
          name: health
        - mountPath: /data
          name: redis-data
        - mountPath: /opt/bitnami/redis/mounted-etc
          name: config
        - mountPath: /opt/bitnami/redis/etc/
          name: redis-tmp-conf
        - mountPath: /tmp
          name: tmp
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-lpnb6
          readOnly: true
  ...
  volumes:
    - configMap:
        defaultMode: 493
        name: redis-scripts
      name: start-scripts
    - configMap:
        defaultMode: 493
        name: redis-health
      name: health
    - configMap:
        defaultMode: 420
        name: redis-configuration
      name: config
    - emptyDir: {}
      name: redis-tmp-conf
    - emptyDir: {}
      name: tmp
    - emptyDir: {}
      name: redis-data

What did you expect to happen:

The code to extend the volume policies seems to expect a PVC: https://github.com/vmware-tanzu/velero/blob/c827fd0c6b55aa0068c6d99024c6ca5eca78ffa3/pkg/util/kube/pvc_pv.go#L392

Reporting a missing PVC for an emptyDir doesn't make sense. Instead emptyDirs should be skipped by the code.

The following information will help us better understand what's going on:

The new Volume Policy example provides a snippet for emptyDirs: https://velero.io/docs/main/resource-filtering/#yaml-template

So adding this to a Volume Policy resolves the issue:

- conditions:
    volumeTypes:
      - emptyDir
      - configmap
  action:
    type: skip

Maybe it is intended to make this mandatory. But I can not imagine a situation where it makes sense to actually backup an emptyDir. And adding "ConfigMap" is non-sense, as ConfigMaps are resources which are backed up by Velero anyway (if not excluded explicitly).

If the core team decides that every user should add this condition to their VolumePolicies, I'm fine with closing this issue. But maybe this mandatory change should be mentioned in the release notes then. However, I would propose to not throw an error for emptyDir volumes as such scratch volumes are never be expected to be backed up.

blackpiglet commented 3 days ago

What's the version of Velero you are using?

phoenix-bjoern commented 3 days ago

@blackpiglet Velero v1.14.0 as mentioned in the title ;-)

phoenix-bjoern commented 3 days ago

I apologize, I was wrong in regards to the volume policy (maybe I was misguided by the documentation here):

The problem is actually in the shouldIncludeVolumeInBackup where volume types like hostPath, Secrets and ConfigMaps get excluded, but not emptyDir: https://github.com/vmware-tanzu/velero/blame/c827fd0c6b55aa0068c6d99024c6ca5eca78ffa3/internal/volumehelper/volume_policy_helper.go#L214 The volume policy for the types does not apply as the util method GetPVCForPodVolume instantly returns an error ( https://github.com/vmware-tanzu/velero/blame/c827fd0c6b55aa0068c6d99024c6ca5eca78ffa3/internal/volumehelper/volume_policy_helper.go#L146) and the method exits.

So, IMHO this is a bug as an emptyDir will never have a PVC. The bug can be resolved by extending shouldIncludeVolumeInBackup and add smth like this:

    // cannot backup emptyDir volumes as they are not mounted into /var/lib/kubelet/pods
    // and therefore not accessible to the node agent daemon set.
    if vol.EmptyDir != nil {
        includeVolumeInBackup = false
    }
reitermarkus commented 20 hours ago

Also seeing the same errors on 0.14.0 when using a resource policy.