openshift / openshift-velero-plugin

General Velero plugin for backup and restore of openshift workloads.
Apache License 2.0

Restore skipped if ReplicaSet owned by a deployment #170

Closed jxl4650152 closed 1 year ago

jxl4650152 commented 1 year ago

I have one deployment with three pods in an OCP cluster. After restoring from the backup, six pods were created: three owned by a ReplicaSet and three with no owner.

According to the docs, restoring a ReplicaSet is skipped if it is owned by a Deployment. In this situation, Velero restores all three pods from the backup plus the Deployment, which then creates a ReplicaSet with 3 replicas. Is this the expected behavior, or have I missed some configuration?

sseago commented 1 year ago

It should also skip the pods that have an owner, unless you're using Restic (and the pod has PVCs) or you have post-restore hooks on the pod. It sounds like there may be a bug in the restore plugins, though, as we shouldn't see the pod duplication. The first thing is to determine whether you're using Restic and/or restore hooks. Then it might be helpful to see the deployment/replicaset/pod yaml prior to backup and post-restore, as well as the backup/restore yaml -- this would help us figure out whether we're dealing with any edge cases here.
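The skip-or-restore decision described here can be sketched roughly as follows. This is a simplified illustration, not the plugin's actual code: the annotation names are Velero's standard Restic opt-in and restore-hook annotations, the `_default_volumes_to_restic` flag is a stand-in for the backup-level `defaultVolumesToRestic: true` setting, and the real plugin may check additional conditions.

```python
def should_restore_pod(pod: dict) -> bool:
    """Sketch of the pod-restore decision described above.

    A pod owned by another object (e.g. a ReplicaSet) is normally skipped,
    because the restored owner will recreate it -- unless Restic must restore
    the pod's volume data, or a restore hook must run in the pod.
    """
    metadata = pod.get("metadata", {})
    annotations = metadata.get("annotations", {})

    has_owner = bool(metadata.get("ownerReferences"))

    # Restic volumes: either opted in per pod via the backup-volumes
    # annotation, or all volumes by default when the backup sets
    # defaultVolumesToRestic: true (signalled here by a hypothetical
    # flag for illustration only).
    uses_restic = bool(annotations.get("backup.velero.io/backup-volumes")) \
        or pod.get("_default_volumes_to_restic", False)

    # Velero restore hooks are declared with annotations in these families.
    has_restore_hook = any(
        key.startswith(("post.hook.restore.velero.io/",
                        "init.hook.restore.velero.io/"))
        for key in annotations
    )

    if not has_owner:
        return True  # standalone pod: always restore
    return uses_restic or has_restore_hook
```

Under this logic the reporter's pods are restored twice: the Deployment is restored and recreates them via a new ReplicaSet, while the backed-up pods are also restored directly because Restic needs them present to restore the PV data.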

jxl4650152 commented 1 year ago

Thanks for the reply. As you said, Restic is used and the pod has PVCs. I tried skipping the pods during restore, but in that case the content of the PV is not restored.

jxl4650152 commented 1 year ago

We're using restic without restore hooks.

Backup:

apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.23.5+012e945
    velero.io/source-cluster-k8s-major-version: '1'
    velero.io/source-cluster-k8s-minor-version: '23'
  resourceVersion: '1137186589'
  name: backup2
  namespace: openshift-adp
  labels:
    velero.io/storage-location: velero-sample-1
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToRestic: true
  includedNamespaces:
    - marketplace
    - eco-common
  storageLocation: velero-sample-1
  ttl: 240h0m0s

Restore:

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restorecommon3
  namespace: openshift-adp
spec:
  backupName: backup2
  excludedResources:
    - ep
    - nodes
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    - resticrepositories.velero.io
  includeClusterResources: true
  includedNamespaces:
    - eco-common

Deployment after restore:

kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
    deployment.kubernetes.io/revision: '1'
    meta.helm.sh/release-name: eco-helm-repo
    meta.helm.sh/release-namespace: eco-common
    openshift.io/backup-registry-hostname: 'image-registry.openshift-image-registry.svc:5000'
    openshift.io/backup-server-version: '1.23'
    openshift.io/restore-registry-hostname: 'image-registry.openshift-image-registry.svc:5000'
    openshift.io/restore-server-version: '1.23'
  resourceVersion: '1173068951'
  name: eco-helm-repo-chartmuseum
  uid: ec31aeec-211d-4dd3-903a-b53def38c5e7
  creationTimestamp: '2022-11-23T01:01:55Z'
  generation: 1
  managedFields:
    - manager: velero-server
      operation: Update
      apiVersion: apps/v1
      time: '2022-11-23T01:01:55Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:meta.helm.sh/release-name': {}
            'f:meta.helm.sh/release-namespace': {}
            'f:openshift.io/backup-registry-hostname': {}
            'f:openshift.io/backup-server-version': {}
            'f:openshift.io/restore-registry-hostname': {}
            'f:openshift.io/restore-server-version': {}
          'f:labels':
            .: {}
            'f:app': {}
            'f:app.kubernetes.io/managed-by': {}
            'f:chart': {}
            'f:heritage': {}
            'f:release': {}
            'f:velero.io/backup-name': {}
            'f:velero.io/restore-name': {}
        'f:spec':
          'f:progressDeadlineSeconds': {}
          'f:replicas': {}
          'f:revisionHistoryLimit': {}
          'f:selector': {}
          'f:strategy':
            'f:rollingUpdate':
              .: {}
              'f:maxSurge': {}
              'f:maxUnavailable': {}
            'f:type': {}
          'f:template':
            'f:metadata':
              'f:labels':
                .: {}
                'f:app': {}
                'f:release': {}
                'f:resource-name': {}
              'f:name': {}
            'f:spec':
              'f:volumes':
                .: {}
                'k:{"name":"storage-volume"}':
                  .: {}
                  'f:name': {}
                  'f:persistentVolumeClaim':
                    .: {}
                    'f:claimName': {}
              'f:containers':
                'k:{"name":"chartmuseum"}':
                  'f:image': {}
                  'f:volumeMounts':
                    .: {}
                    'k:{"mountPath":"/storage"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
                  'f:terminationMessagePolicy': {}
                  .: {}
                  'f:resources': {}
                  'f:args': {}
                  'f:livenessProbe':
                    .: {}
                    'f:failureThreshold': {}
                    'f:httpGet':
                      .: {}
                      'f:path': {}
                      'f:port': {}
                      'f:scheme': {}
                    'f:initialDelaySeconds': {}
                    'f:periodSeconds': {}
                    'f:successThreshold': {}
                    'f:timeoutSeconds': {}
                  'f:env':
                    'k:{"name":"BASIC_AUTH_USER"}':
                      .: {}
                      'f:name': {}
                      'f:valueFrom':
                        .: {}
                        'f:secretKeyRef': {}
                    'k:{"name":"LOG_JSON"}':
                      .: {}
                      'f:name': {}
                      'f:value': {}
                    'k:{"name":"BASIC_AUTH_PASS"}':
                      .: {}
                      'f:name': {}
                      'f:valueFrom':
                        .: {}
                        'f:secretKeyRef': {}
                    'k:{"name":"TZ"}':
                      .: {}
                      'f:name': {}
                      'f:value': {}
                    'k:{"name":"STORAGE"}':
                      .: {}
                      'f:name': {}
                      'f:value': {}
                    .: {}
                    'k:{"name":"CHART_POST_FORM_FIELD_NAME"}':
                      .: {}
                      'f:name': {}
                      'f:value': {}
                    'k:{"name":"PROV_POST_FORM_FIELD_NAME"}':
                      .: {}
                      'f:name': {}
                      'f:value': {}
                    'k:{"name":"DISABLE_METRICS"}':
                      .: {}
                      'f:name': {}
                      'f:value': {}
                  'f:readinessProbe':
                    .: {}
                    'f:failureThreshold': {}
                    'f:httpGet':
                      .: {}
                      'f:path': {}
                      'f:port': {}
                      'f:scheme': {}
                    'f:initialDelaySeconds': {}
                    'f:periodSeconds': {}
                    'f:successThreshold': {}
                    'f:timeoutSeconds': {}
                  'f:securityContext': {}
                  'f:terminationMessagePath': {}
                  'f:imagePullPolicy': {}
                  'f:ports':
                    .: {}
                    'k:{"containerPort":8080,"protocol":"TCP"}':
                      .: {}
                      'f:containerPort': {}
                      'f:name': {}
                      'f:protocol': {}
                  'f:name': {}
              'f:dnsPolicy': {}
              'f:tolerations': {}
              'f:restartPolicy': {}
              'f:schedulerName': {}
              'f:terminationGracePeriodSeconds': {}
              'f:securityContext':
                .: {}
                'f:fsGroup': {}
              'f:affinity':
                .: {}
                'f:nodeAffinity':
                  .: {}
                  'f:requiredDuringSchedulingIgnoredDuringExecution': {}
                'f:podAntiAffinity':
                  .: {}
                  'f:preferredDuringSchedulingIgnoredDuringExecution': {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-11-23T01:02:05Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:deployment.kubernetes.io/revision': {}
        'f:status':
          'f:availableReplicas': {}
          'f:conditions':
            .: {}
            'k:{"type":"Available"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Progressing"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
          'f:observedGeneration': {}
          'f:readyReplicas': {}
          'f:replicas': {}
          'f:updatedReplicas': {}
      subresource: status
  namespace: eco-common
  labels:
    app: chartmuseum
    app.kubernetes.io/managed-by: Helm
    chart: chartmuseum-2.14.0
    heritage: Helm
    release: eco-helm-repo
    velero.io/backup-name: backup2
    velero.io/restore-name: restorecommon3
spec:
  replicas: 3
  selector:
    matchLabels:
      app: chartmuseum
      release: eco-helm-repo
  template:
    metadata:
      name: eco-helm-repo-chartmuseum
      creationTimestamp: null
      labels:
        app: chartmuseum
        release: eco-helm-repo
        resource-name: eco-helm-repo-chartmuseum
    spec:
      restartPolicy: Always
      schedulerName: default-scheduler
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/ecocloud
                    operator: In
                    values:
                      - 'true'
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: chartmuseum
                    release: eco-helm-repo
                topologyKey: failure-domain.beta.kubernetes.io/zone
      terminationGracePeriodSeconds: 30
      securityContext:
        fsGroup: 1000
      containers:
        - resources: {}
          readinessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: chartmuseum
          livenessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: CHART_POST_FORM_FIELD_NAME
              value: chart
            - name: DISABLE_METRICS
              value: 'true'
            - name: LOG_JSON
              value: 'true'
            - name: PROV_POST_FORM_FIELD_NAME
              value: prov
            - name: STORAGE
              value: local
            - name: TZ
              value: CST-8
            - name: BASIC_AUTH_PASS
              valueFrom:
                secretKeyRef:
                  name: eco-helm-repo-chartmuseum
                  key: BASIC_AUTH_PASS
            - name: BASIC_AUTH_USER
              valueFrom:
                secretKeyRef:
                  name: eco-helm-repo-chartmuseum
                  key: BASIC_AUTH_USER
          securityContext: {}
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: storage-volume
              mountPath: /storage
          terminationMessagePolicy: File
          image: >-
            image-registry.openshift-image-registry.svc:5000/eco-common/chartmuseum:v0.12.0
          args:
            - '--port=8080'
            - '--storage-local-rootdir=/storage'
      volumes:
        - name: storage-volume
          persistentVolumeClaim:
            claimName: eco-helm-repo-chartmuseum

kaovilai commented 1 year ago

> I have tried to skip the pods when restoring but in that case, the content of the PV is not restored.

That is expected. Restic requires a pod mounting the PVC to be in the restore for PV data to be restored.
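Given that the backed-up pods must be restored for Restic to work, the stray ownerless copies can at least be identified mechanically afterwards: pods recreated by the Deployment's ReplicaSet carry `ownerReferences`, while pods restored verbatim from the backup do not, but do carry the `velero.io/restore-name` label that Velero applies to restored resources (visible on the Deployment above). A hedged sketch; verify the results against your cluster before deleting anything:

```python
def find_ownerless_restored_pods(pods: list[dict], restore_name: str) -> list[str]:
    """Return names of pods that Velero restored directly (no owner).

    A pod with no ownerReferences but carrying the velero.io/restore-name
    label for the given restore is a candidate leftover once the
    Deployment's own ReplicaSet pods are running.
    """
    leftovers = []
    for pod in pods:
        metadata = pod.get("metadata", {})
        if metadata.get("ownerReferences"):
            continue  # owned by a ReplicaSet: managed, keep it
        labels = metadata.get("labels", {})
        if labels.get("velero.io/restore-name") == restore_name:
            leftovers.append(metadata.get("name", ""))
    return leftovers
```

This only lists candidates; whether deleting them is safe depends on the Restic restore having completed, since those are the pods whose volumes Restic populated.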

openshift-bot commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 1 year ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale