vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

velero backup es error: Warning: at least one source file could not be read\n: exit status 3 #7363

Closed junjie9021 closed 7 months ago

junjie9021 commented 7 months ago

What steps did you take and what happened: I was backing up Elasticsearch and had stopped writing data to it. The workload contains 2 PVs.


What did you expect to happen: Normally, the backup status should be Completed, because disk I/O had been frozen.

The following information will help us better understand what's going on: velero install cmd

velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.7.0 --bucket xxxxxx-velero --secret-file ./credentials-velero --backup-location-config region=ap-southeast-1 --snapshot-location-config region=ap-southeast-1  --use-restic 

velero backup cmd

velero backup create football-pre-middleware --include-namespaces=football-pre-middleware --default-volumes-to-restic --snapshot-volumes=true

velero backup describe football-pre-middleware

Name:         football-pre-middleware
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.22.17-eks-8cb36c9
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=22+

Phase:  PartiallyFailed (run `velero backup logs football-pre-middleware` for more information)

Errors:    1
Warnings:  0

Namespaces:
  Included:  football-pre-middleware
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  true

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2024-01-29 06:21:41 +0000 UTC
Completed:  2024-01-29 06:28:59 +0000 UTC

Expiration:  2024-02-28 06:21:41 +0000 UTC

Total items to be backed up:  153
Items backed up:              153

Velero-Native Snapshots: <none included>

Restic Backups (specify --details for more information):
  Completed:  17
  Failed:     1

One of the PodVolumeBackups failed:


kubectl get podvolumebackups -n velero football-pre-middleware-qdqjd -oyaml

apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: elasticsearch-master-elasticsearch-master-1
  creationTimestamp: "2024-01-29T06:22:21Z"
  generateName: football-pre-middleware-
  generation: 5
  labels:
    velero.io/backup-name: football-pre-middleware
    velero.io/backup-uid: f0214890-468f-4a80-a4e2-db0577ed575c
    velero.io/pvc-uid: fa1b5a5d-c3b4-4d86-89b3-2d20be0302a4
  name: football-pre-middleware-qdqjd
  namespace: velero
  ownerReferences:
  - apiVersion: velero.io/v1
    controller: true
    kind: Backup
    name: football-pre-middleware
    uid: f0214890-468f-4a80-a4e2-db0577ed575c
  resourceVersion: "176663175"
  uid: 20ecbd0a-cd72-43b2-8509-57bed8fadcc9
spec:
  backupStorageLocation: default
  node: ip-10-50-100-162.ap-southeast-1.compute.internal
  pod:
    kind: Pod
    name: elasticsearch-master-1
    namespace: football-pre-middleware
    uid: 4486dbe8-f0cf-49c3-aa94-8adca8f834cb
  repoIdentifier: s3:s3-ap-southeast-1.amazonaws.com/xxxxxx-velero/restic/football-pre-middleware
  tags:
    backup: football-pre-middleware
    backup-uid: f0214890-468f-4a80-a4e2-db0577ed575c
    ns: football-pre-middleware
    pod: elasticsearch-master-1
    pod-uid: 4486dbe8-f0cf-49c3-aa94-8adca8f834cb
    pvc-uid: fa1b5a5d-c3b4-4d86-89b3-2d20be0302a4
    volume: elasticsearch-master
  volume: elasticsearch-master
status:
  completionTimestamp: "2024-01-29T06:22:51Z"
  message: |-
    running Restic backup, stderr={"message_type":"error","error":{"Op":"lstat","Path":"nodes/0/indices/_nNdZVaQT_23vEY764XoJg/0/index/_ahj96.fdm","Err":2},"during":"archival","item":"/host_pods/4486dbe8-f0cf-49c3-aa94-8adca8f834cb/volumes/kubernetes.io~aws-ebs/pvc-fa1b5a5d-c3b4-4d86-89b3-2d20be0302a4/nodes/0/indices/_nNdZVaQT_23vEY764XoJg/0/index/_ahj96.fdm"}
    {"message_type":"error","error":{"Op":"lstat","Path":"nodes/0/indices/_nNdZVaQT_23vEY764XoJg/0/index/_ahj96.fdt","Err":2},"during":"archival","item":"/host_pods/4486dbe8-f0cf-49c3-aa94-8adca8f834cb/volumes/kubernetes.io~aws-ebs/pvc-fa1b5a5d-c3b4-4d86-89b3-2d20be0302a4/nodes/0/indices/_nNdZVaQT_23vEY764XoJg/0/index/_ahj96.fdt"}
    {"message_type":"error","error":{"Op":"lstat","Path":"nodes/0/indices/_nNdZVaQT_23vEY764XoJg/0/index/_ahj96_Lucene85FieldsIndex-doc_ids_917es.tmp","Err":2},"during":"archival","item":"/host_pods/4486dbe8-f0cf-49c3-aa94-8adca8f834cb/volumes/kubernetes.io~aws-ebs/pvc-fa1b5a5d-c3b4-4d86-89b3-2d20be0302a4/nodes/0/indices/_nNdZVaQT_23vEY764XoJg/0/index/_ahj96_Lucene85FieldsIndex-doc_ids_917es.tmp"}
    {"message_type":"error","error":{"Op":"lstat","Path":"nodes/0/indices/_nNdZVaQT_23vEY764XoJg/0/index/_ahj96_Lucene85FieldsIndexfile_pointers_917et.tmp","Err":2},"during":"archival","item":"/host_pods/4486dbe8-f0cf-49c3-aa94-8adca8f834cb/volumes/kubernetes.io~aws-ebs/pvc-fa1b5a5d-c3b4-4d86-89b3-2d20be0302a4/nodes/0/indices/_nNdZVaQT_23vEY764XoJg/0/index/_ahj96_Lucene85FieldsIndexfile_pointers_917et.tmp"}
    Warning: at least one source file could not be read
    : exit status 3
  phase: Failed
  progress:
    bytesDone: 532135617
    totalBytes: 803027853
  startTimestamp: "2024-01-29T06:22:21Z"
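The `"Err":2` in each `lstat` failure above is `ENOENT` (No such file or directory): the segment files existed when restic scanned the directory but were deleted (by a Lucene segment merge) before restic read them. A minimal shell sketch of that race, using a hypothetical temp directory and file name:

```shell
# Simulate the scan-then-read race that restic hits on a live volume.
demo=$(mktemp -d)
touch "$demo/_ahj96.fdm"        # segment file exists at scan time
scanned="$demo/_ahj96.fdm"      # backup tool records the path during its scan
rm "$scanned"                   # a Lucene merge deletes the segment mid-backup
stat "$scanned" \
  || echo "read phase fails with errno 2 (No such file or directory)"
```

restic reports each such file as an error and exits with status 3, which Velero surfaces as a failed PodVolumeBackup.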



Lyndon-Li commented 7 months ago

Velero file-system backup (formerly restic backup) is not consistent; therefore, if the volume data changes frequently, some files may fail to back up, or the backed-up file data may be inconsistent. To solve the problem, please upgrade Velero to 1.12 or higher and use the CSI snapshot data movement backup method.
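A sketch of that upgrade path, adapted from the install command in this issue (the plugin version and bucket name are illustrative; adjust them to your environment):

```shell
# 1. Reinstall/upgrade to Velero 1.12+ with the node agent and CSI support enabled.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket xxxxxx-velero \
  --secret-file ./credentials-velero \
  --backup-location-config region=ap-southeast-1 \
  --use-node-agent \
  --features=EnableCSI

# 2. Back up with CSI snapshot data movement: Velero first takes a
#    crash-consistent CSI volume snapshot, then moves the snapshot data
#    to object storage, so live file churn in the pod no longer matters.
velero backup create football-pre-middleware \
  --include-namespaces=football-pre-middleware \
  --snapshot-move-data
```

Because the data is read from a point-in-time snapshot rather than the live filesystem, the "at least one source file could not be read" class of error goes away.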

danfengliu commented 7 months ago

@junjie9021 Are you sure you had frozen disk I/O on the PV? Did you monitor the disk I/O, or what steps did you take to stop writing data to that PV?

junjie9021 commented 7 months ago

> @junjie9021 Are you sure you had frozen disk I/O on the PV? Did you monitor the disk I/O, or what steps did you take to stop writing data to that PV?

I only stopped the write service (Logstash); reads were still happening normally.
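Stopping the external writer alone does not stop Elasticsearch's own background segment merges, which delete Lucene files like the `.fdm`/`.fdt` ones in the error above. One way to quiesce the volume for the duration of a file-system backup is Velero's pod backup hooks; the annotation keys below are the real Velero hook API, while the `fsfreeze` command and mount path are illustrative assumptions (freezing requires a privileged container and should be tested carefully first):

```shell
# Hypothetical example: freeze the data filesystem before the pod's volumes
# are backed up, and thaw it afterwards, via Velero backup hook annotations.
kubectl annotate pod/elasticsearch-master-1 -n football-pre-middleware \
  pre.hook.backup.velero.io/command='["/sbin/fsfreeze", "--freeze", "/usr/share/elasticsearch/data"]' \
  post.hook.backup.velero.io/command='["/sbin/fsfreeze", "--unfreeze", "/usr/share/elasticsearch/data"]'
```

With the filesystem frozen, no segment files can appear or disappear between restic's scan and read phases.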

junjie9021 commented 7 months ago

Then I tried again; that backup completed with no errors, and a cross-cluster restore also worked normally.