vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

resource restore error: error restoring targetgroupbindings.elbv2.k8s.aws #7387

Open sureshbanyal opened 7 months ago

sureshbanyal commented 7 months ago

What steps did you take and what happened: I installed Velero v1.13 using Helm, with --use-node-agent enabled, in my EKS cluster; my AWS plugin is v1.9.0. I took a backup of one of the namespaces in the cluster, deleted the namespace, and tried to restore it with Velero. After the restore, my target group rules did not become healthy and my domain shows a 502 Bad Gateway error. Please help me resolve this issue.

What did you expect to happen: Velero should successfully restore my namespace and everything inside it, including custom resources, PVCs, and Pods.

The following information will help us better understand what's going on: bundle-2024-02-05-16-18-34.zip


Lyndon-Li commented 7 months ago

From the log, the Velero server restarted after the restore, so the server log doesn't include the cause of the PartiallyFailed restore. It also looks like client access relies on the root ns resources that are being restored, so the client cannot connect to the cluster to create a download request for the restore log and the restore describe output.

@sureshbanyal Please run the restore and collect the log bundle again, so that we can see why the restore ends up PartiallyFailed.

sureshbanyal commented 7 months ago

Thanks for responding, @Lyndon-Li. Here is another restore bundle; I face the same issue for this namespace as well: bundle-2024-02-06-06-07-44.zip

Please let me know if you need any more information to resolve the issue. Thanks for your help.

Lyndon-Li commented 7 months ago

Here is the log related to the error. It looks like the resource targetgroupbindings.elbv2.k8s.aws has a corresponding admission webhook, and it is a known issue that Velero restore doesn't support webhooks.

error restoring targetgroupbindings.elbv2.k8s.aws/sureshveltest-ns/k8s-sureshve-sureshve-274d827351: admission webhook "vtargetgroupbinding.elbv2.k8s.aws" denied the request: unable to get target group IP address type: TargetGroupNotFound: One or more target groups not found
  status code: 400, request id: 01eb6077-4c2d-4915-8e25-2277b88e81a6

sureshbanyal commented 7 months ago

Thank you so much for confirming that Velero restore doesn't support webhooks. How can we resolve this? Is there any way to bring my application back up after restoring with Velero?

sureshbanyal commented 7 months ago

@Lyndon-Li Could you please provide the issue number for the mentioned topic? It would help us track and address it efficiently. Thanks!

Lyndon-Li commented 7 months ago

@sureshbanyal See this known issue/limitation

snandam commented 4 months ago

I'm running into the same issue. Appreciate any help with this.

blackpiglet commented 3 months ago

Try disabling the webhook, then run the restore.

douglasqsantos commented 2 months ago

@snandam you can just add --exclude-resources targetgroupbindings to your restore command line.

e.g.

velero restore create --from-backup nginx-backup --exclude-resources targetgroupbindings
Restore request "nginx-backup-20240622000132" submitted successfully.
Run `velero restore describe nginx-backup-20240622000132` or `velero restore logs nginx-backup-20240622000132` for more details

The output will look like this:

velero restore describe nginx-backup-20240622000132
Name:         nginx-backup-20240622000132
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Phase:                       Completed
Total items to be restored:  11
Items restored:              11

Started:    2024-06-22 00:01:32 -0300 -03
Completed:  2024-06-22 00:01:34 -0300 -03

Warnings:
  Velero:     <none>
  Cluster:  could not restore, CustomResourceDefinition "targetgroupbindings.elbv2.k8s.aws" already exists. Warning: the in-cluster version is different than the backed-up version
  Namespaces:
    nginx-example:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version

Backup:  nginx-backup

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        targetgroupbindings, nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Or label selector:  <none>

Restore PVs:  auto

CSI Snapshot Restores: <none included>

Existing Resource Policy:   <none>
ItemOperationTimeout:       4h0m0s

Preserve Service NodePorts:  auto

Uploader config:

HooksAttempted:   0
HooksFailed:      0
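
The restore above completes, but because TargetGroupBindings were excluded it is worth checking afterwards that they exist again in the namespace. The following is a minimal sketch, not a confirmed procedure. It assumes that the AWS Load Balancer Controller recreates the bindings from the restored Ingress and Service objects, which this thread does not verify. It reuses the backup name `nginx-backup` and namespace `nginx-example` from the example; the `run` helper and the `DRY_RUN` variable are hypothetical conveniences added for illustration.

```shell
#!/usr/bin/env bash
# Sketch: restore while skipping TargetGroupBindings, then list what exists.
# Assumption (not confirmed in this thread): the AWS Load Balancer Controller
# recreates the bindings from the restored Ingress/Service objects.
set -euo pipefail

BACKUP_NAME="${BACKUP_NAME:-nginx-backup}"   # backup name from the example above
NAMESPACE="${NAMESPACE:-nginx-example}"      # namespace from the example above
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restore_and_verify() {
  # Same restore as in the comment above, waiting for it to finish.
  run velero restore create --from-backup "$BACKUP_NAME" \
    --exclude-resources targetgroupbindings --wait
  # List the bindings; an empty result means nothing has recreated them yet.
  run kubectl get targetgroupbindings.elbv2.k8s.aws -n "$NAMESPACE"
  # List the objects the controller would derive the bindings from.
  run kubectl get ingress,service -n "$NAMESPACE"
}

restore_and_verify
```

With the default `DRY_RUN=1` the script only prints the three commands, so it can be reviewed before running it against a cluster with `DRY_RUN=0`. If the bindings do not reappear, the controller's own logs are the next place to look.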