Open ffzzhong opened 2 months ago
I checked the debug bundle you provided. In the restore venti-airflow-20240420-2-20240421135403,
the only error was:
time="2024-04-21T05:55:07Z" level=error msg="error patch for managed fields default/venti-airflow-postgresql-0: Timeout: request did not complete within requested timeout - context deadline exceeded" logSource="pkg/restore/restore.go:1731" restore=velero/venti-airflow-20240420-2-20240421135403
Could you check whether all the resources were restored as expected and the data was populated in this particular restore?
Could you also check whether the other restores that failed to complete show the same error?
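One rough way to run that check across several restores is to pipe each restore's log through a small filter that counts timeout errors per logSource. This is only a sketch, not an official Velero tool: the function name count_timeouts is made up for illustration, and it assumes nothing beyond grep, sed, sort and uniq.

```shell
# count_timeouts (hypothetical helper): reads Velero log lines on stdin and
# prints how many "context deadline exceeded" errors came from each logSource.
count_timeouts() {
  grep 'level=error' \
    | grep 'context deadline exceeded' \
    | sed -n 's/.*logSource="\([^"]*\)".*/\1/p' \
    | sort | uniq -c
}
```

For example, velero restore logs <restorename> | count_timeouts would show whether the timeouts come from pkg/restore/restore.go, pkg/controller/restore_controller.go, or both.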
@reasonerjt
This issue happens randomly and kinda suddenly. When I raised this issue, restores indeed didn't work, no matter how small the backup was or how long I waited. The only error I see in the restore log is the one saying timeout - context deadline exceeded" logSource="pkg/controller/restore_controller.go:567
But some time after I raised the issue, while I kept trying to restore, all of a sudden everything worked as normal again, and I was able to restore even the biggest backup (around 200G).
Then today things went wrong again: same issue, a timeout, for any restore.
For the questions you're asking:
velero get backup: I see the status becomes PartiallyFailed very soon, and the data in the PV never gets populated. When this issue happens, even for a very small backup, I always get the same error: timeout - context deadline exceeded" logSource="pkg/controller/restore_controller.go:567
In Velero v1.12.2, the code is https://github.com/vmware-tanzu/velero/blob/v1.12.2/pkg/controller/restore_controller.go#L565-L569. It seems to be trying to collect some info from the namespace and failing? It's logged as an error, not a warning,
but I do see the resources created in the desired namespace, so what makes it an error?
Is it because my k8s API server is somehow slow? As I mentioned, our cluster is a hybrid cluster, part of the API calls go to the machines on the cloud, but
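A rough way to test the slow-API-server theory is to time a handful of identical requests and look at the spread. The sketch below is not part of Velero: time_calls is a made-up helper, and it assumes GNU date (for the %N nanosecond format) and a shell with 64-bit arithmetic.

```shell
# time_calls N CMD... (hypothetical helper): runs CMD N times and prints the
# elapsed wall-clock time of each run in milliseconds, one value per line.
# Assumes GNU date, which supports %N (nanoseconds).
time_calls() {
  n=$1
  shift
  i=0
  while [ "$i" -lt "$n" ]; do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
    i=$((i + 1))
  done
}
```

For example, time_calls 5 kubectl get --raw /readyz prints five durations. Occasional values in the tens of seconds would be consistent with the request timeouts in the restore log, though this only measures from the client side.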
What steps did you take and what happened:
This issue happens kinda suddenly.
Velero 1.11.0 and AWS provider plugin v1.7.1, with the uploader type left at the default, restic.
The restore status is PartiallyFailed with 1 outstanding error:
level=error msg="Namespace default, resource restore error: Timeout: request did not complete within requested timeout - context deadline exceeded" logSource="pkg/controller/restore_controller.go
In the restore pod, the initContainer restore-wait always says Not found: /restores/data/.velero/xxxx-xxxx-xxxx-xxxx
what I tried:
Upgraded Velero to 1.12.2 and the AWS provider plugin to 1.8.2 accordingly; the restore failed.
So it seems the backup is OK but the restore indeed has some issue, but I'm not sure what's the root cause of requested timeout and not found: /restore/data. I checked my cluster API server response time and it's actually acceptable. I also don't understand why even restoring a previously successful backup still leads to a failure.
What did you expect to happen: the restore should work, or at least should work for the small backups (they worked before).
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use
velero debug --backup <backupname> --restore <restorename>
to generate the support bundle, and attach to this issue, more options please refer to velero debug --help
Yes, I was using Velero 1.11.0 initially, then upgraded to 1.12.2 as an attempted workaround, but still no luck. I will attach the debug info from 1.12.2: bundle-2024-04-21-14-35-13.tar.gz
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add:
Environment:
Velero version (use velero version): tried 1.11.0, 1.12.2
Velero features (use velero client config get features): features:
Kubernetes version (use kubectl version): v1.25.6
OS (e.g. from /etc/os-release): ubuntu 22.04
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.