time="2023-03-01T10:50:01Z" level=info msg="Backup starting" backup=velero/harbor-20230301184815 controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:191" name=harbor-20230301184815-chhfg namespace=velero time="2023-03-01T10:50:01Z" level=info msg="Looking for most recent completed pod volume backup for this PVC" backup=velero/harbor-20230301184815 controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:340" name=harbor-20230301184815-chhfg namespace=velero pvcUID=2dcb3d8b-2023-4600-8b74-2942ddd04259 time="2023-03-01T10:50:01Z" level=info msg="No completed pod volume backup found for PVC" backup=velero/harbor-20230301184815 controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:370" name=harbor-20230301184815-chhfg namespace=velero pvcUID=2dcb3d8b-2023-4600-8b74-2942ddd04259 time="2023-03-01T10:50:01Z" level=info msg="No parent snapshot found for PVC, not using --parent flag for this backup" backup=velero/harbor-20230301184815 controller=pod-volume-backup logSource="pkg/controller/pod_volume_backup_controller.go:277" name=harbor-20230301184815-chhfg namespace=velero
Looks like the stdout from the Restic command has been truncated somehow:
{\"message_type\":\"status\",\"seconds_elapsed\":1856,\"percent_done\":0.03797256558287685,\"total_files\":28583,\"files_done\":1129,\"total_bytes\":742600171549,\"bytes_done\":28198433716,\"current_files\":[\"/docker/registry/v2/blobs/sha256/02/02f8a685e66f6cbd60f0ff2b11
@Lyndon-Li I am backing up an 800 GB Harbor registry; the error above was caused by a restic timeout. The problem was resolved after I increased the restic timeout, but the restore job has still not completed; it is in progress, at one point showing 100% and now 98%.
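For anyone hitting the same thing, one way to raise that timeout is via the Velero server's `--restic-timeout` flag (a minimal sketch assuming a default Velero 1.7 install; the 12h value is only illustrative):

```
# Add the --restic-timeout flag to the Velero server arguments
# (pick a value that fits your data size; 12h here is illustrative)
kubectl -n velero edit deployment/velero
# then, under spec.template.spec.containers[0].args, add:
#   - --restic-timeout=12h
```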
For large data sizes, a file system backup (formerly known as a restic backup) takes some time and some resources, especially memory.
If there is not enough memory, the restic pod will be killed by Kubernetes due to OOM.
I am not sure whether this happened in your environment; if it did, the problem below may be caused by that kill:
level=error msg="error getting restic backup progress"
In other words, the restic process was killed first, so Velero got an incomplete output.
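One way to check for an OOM kill, and to raise the restic memory limit if needed, is with standard kubectl (a sketch; the pod name placeholder and the 2Gi limit are illustrative):

```
# Check whether any restic pod container was OOM-killed
kubectl -n velero get pods
kubectl -n velero describe pod <restic-pod-name> | grep -A5 "Last State"

# If the Last State reason is OOMKilled, raise the restic daemonset's memory
# limit (containers merge on name, so this patch only touches the restic container)
kubectl -n velero patch daemonset restic --patch \
  '{"spec":{"template":{"spec":{"containers":[{"name":"restic","resources":{"limits":{"memory":"2Gi"}}}]}}}}'
```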
@lklkxcxc Could you confirm whether an OOM kill ever happened in the environment? Meanwhile, I see you are using Velero 1.7; I suggest you upgrade to 1.10, which ships the Kopia path, and the Kopia path performs better when backing up large data sizes. For how to use the Kopia path for file system backup in 1.10, refer to this doc.
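For reference, in 1.10 the uploader is selected at install time (a sketch per the 1.10 file system backup docs; the storage provider/bucket flags from your existing install are omitted here):

```
# Velero 1.10: --use-node-agent replaces --use-restic, and --uploader-type
# selects the data mover for file system backup (kopia or restic)
velero install \
    --use-node-agent \
    --uploader-type=kopia
```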
I did not find that an OOM kill ever happened in the environment, thanks!
What steps did you take and what happened:
The volume backup hung and the "Items backed up" count stopped making progress.
Checking the restic log reported the error shown above:
What did you expect to happen:
I expected the backup job to complete.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use
velero debug --backup <backupname> --restore <restorename>
to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help.
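For example, with the backup name taken from the logs above (no restore is involved here, so only --backup is passed):

```
velero debug --backup harbor-20230301184815
```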
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero:
velero backup describe:
Phase:  InProgress

Errors:    0
Warnings:  0

Namespaces:
  Included:  *
  Excluded:

Resources:
  Included:        *
  Excluded:
  Cluster-scoped:  auto

Label selector:

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:

Backup Format Version:  1.1.0

Started:    2023-03-01 18:48:15 +0800 CST
Completed:  <n/a>

Expiration:  2023-03-31 18:48:15 +0800 CST

Estimated total items to be backed up:  1669
Items backed up so far:                 82

Velero-Native Snapshots:

Restic Backups (specify --details for more information):
  Completed:    3
  In Progress:  1