vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Fail to run E2E test with Velero v1.8 on GKE environment #4505

Closed blackpiglet closed 1 year ago

blackpiglet commented 2 years ago

What steps did you take and what happened:

CREDS_FILE=/root/credentials-velero \
BSL_BUCKET=jxun \
CLOUD_PROVIDER=gcp \
VELERO_CLI=/usr/local/bin/velero \
VELERO_VERSION=main \
VELERO_IMAGE=velero/velero:main \
VELERO_NAMESPACE=velero \
REGISTRY_CREDENTIAL_FILE=/root/docker-config.json \
ADDITIONAL_OBJECT_STORE_PROVIDER=aws \
ADDITIONAL_CREDS_FILE=/root/aws_credentials \
ADDITIONAL_BSL_BUCKET=mqiu-bucket \
GINKGO_SKIP=Restic \
nohup make test-e2e &

The test run failed, with 4 cases not passing.

What did you expect to happen: All tests pass, with no failed cases.

The following information will help us better understand what's going on:

E2E test log: https://gist.github.com/blackpiglet/5d21e0cde12ddb294d7bae8eb6e7bf7f


blackpiglet commented 2 years ago

There are several issues:

  1. Some resources created during the tests are not cleaned up when the tests finish, e.g. some namespaces, backups and restores in S3.
  2. The [Snapshot] test case is run twice in a row. The second run fails.
  3. The [Upgrade][Snapshot] test case fails because the namespace upgrade-workload cannot be created. This may be related to the earlier [Snapshot] failure.
  4. [Backups][Deletion] fails to find the created "velero" namespace.
  5. [ResourceFiltering][IncludeNamespaces] fails to restore a previously created backup.
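
To check for leftover namespaces such as upgrade-workload between runs, something like the following might help. It is only a sketch: `leftover_namespaces` and the `EXPECTED_NS` allow-list are hypothetical names made up for this example, and the allow-list contains only the standard Kubernetes namespaces, so a GKE cluster will likely need more entries. The function reads the output of `kubectl get namespaces -o name` on stdin.

```shell
#!/bin/sh
# Hypothetical allow-list of namespaces that are expected to exist.
# Assumption: only the standard Kubernetes namespaces; extend for your cluster.
EXPECTED_NS="default kube-system kube-public kube-node-lease"

# leftover_namespaces: reads lines like "namespace/foo" on stdin and prints
# every namespace name that is not in EXPECTED_NS.
leftover_namespaces() {
  sed 's|^namespace/||' | while IFS= read -r ns; do
    case " $EXPECTED_NS " in
      *" $ns "*) ;;                      # expected namespace, skip it
      *)
        if [ -n "$ns" ]; then            # ignore empty lines
          printf '%s\n' "$ns"
        fi
        ;;
    esac
  done
}
```

Usage, assuming kubectl points at the test cluster: `kubectl get namespaces -o name | leftover_namespaces`. Anything printed is a candidate for manual cleanup before the next run.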

Another issue found during the E2E test: the [Basic] test case occasionally fails on the namespace annotation case. The error says the created backup cannot be found on S3. After clearing the backup directory in the S3 bucket used by the BSL, the test case passes again.

After getting help from Danfeng, I removed the [Backups][Deletion] test case, because it is not applicable to the GCP environment, and updated the out-of-date AWS secret info. After that, all cases should pass.

blackpiglet commented 2 years ago

The [Basic] namespace-with-annotation case failure is reproduced. Log: https://gist.github.com/blackpiglet/58dad656717365e364fed7863d639fd3

In the log, the error is a failure to find the created backup. However, after the test completed, I installed Velero and retrieved the backup list, and the missing backup is in the list.

velero install --provider gcp --plugins velero/velero-plugin-for-gcp:main --image velero/velero:main --bucket jxun --secret-file /root/credentials-velero

velero backup get

NAME                                                              STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
backup-0c6157ab-9c92-4608-ace2-087f904dfa55                       Completed   0        0          2022-01-08 05:19:03 +0000 UTC   29d       default
backup-31396c5a-3d4b-4c88-945c-a5fd4adb3142                       Completed   0        0          2022-01-08 03:35:02 +0000 UTC   29d       default
backup-34c60abb-9af4-4565-8d03-64969e9b7e4c                       Completed   0        0          2022-01-08 07:22:55 +0000 UTC   29d       default
backup-395e9781-109e-4e9c-9cad-1f7b9a58419d                       Completed   0        0          2022-01-08 04:04:36 +0000 UTC   29d       default
backup-4c990499-9462-446a-8993-da8a87b74d08                       Completed   0        0          2022-01-08 03:56:57 +0000 UTC   29d       default
backup-5f3eaff7-1663-44f4-9835-d441617eba64                       Completed   0        0          2022-01-08 03:47:33 +0000 UTC   29d       default
backup-7d81948b-ef6f-4e03-a73e-527bf783ded7                       Completed   0        0          2022-01-08 09:01:11 +0000 UTC   29d       default
backup-7f64e720-74dc-45ce-8f49-40242736bc1b                       Completed   0        0          2022-01-08 08:59:26 +0000 UTC   29d       default
backup-8665c4fd-91a6-405f-87f4-4979ec5c05a9                       Completed   0        0          2022-01-08 03:24:01 +0000 UTC   29d       default
backup-95940c73-23f0-49a4-8edc-1db37ccfe14a                       Completed   0        0          2022-01-08 07:47:09 +0000 UTC   29d       default
backup-9c089e89-3c1d-4a9a-90e7-8204b54ed070                       Completed   0        0          2022-01-08 07:33:53 +0000 UTC   29d       default
backup-default-4c3cd21f-35f3-4ee4-8178-54e549b3cd3a               Completed   0        0          2022-01-08 08:49:42 +0000 UTC   29d       default
backup-default-dcd8efc5-2dd8-4b25-8271-f1256d3a0e6f               Completed   0        0          2022-01-08 04:09:07 +0000 UTC   29d       default
backup-eaa9e696-ea8e-4722-835c-a84a54a635ee                       Completed   0        0          2022-01-08 07:58:17 +0000 UTC   29d       default
backup-eafc0a66-a840-475b-8692-1bc9ee333be9                       Completed   0        0          2022-01-08 08:44:11 +0000 UTC   29d       default
backup-exclude-from-backup-3c457d06-41fb-44b7-a796-ad4941941b96   Completed   0        0          2022-01-08 03:36:45 +0000 UTC   29d       default
backup-exclude-from-backup-9d521361-f6e8-41a7-9f86-95fc2e6167f5   Completed   0        0          2022-01-08 07:29:04 +0000 UTC   29d       default
backup-exclude-namespaces-12fcdfe9-210a-4546-a7ec-082b8de995ee    Completed   0        0          2022-01-08 07:27:42 +0000 UTC   29d       default
backup-exclude-namespaces-db56065f-1b13-45d6-ac4c-28ca8c3ddc09    Completed   0        0          2022-01-08 05:09:32 +0000 UTC   29d       default
backup-exclude-resources-0bd2e398-e464-445f-93d4-8696eaf26688     Completed   0        0          2022-01-08 07:43:33 +0000 UTC   29d       default
backup-exclude-resources-72a65f33-26b3-45ec-8f9b-a06549ea20f4     Completed   0        0          2022-01-08 03:53:42 +0000 UTC   29d       default
backup-exclude-resources-aa066fcd-20a5-4a47-87b9-ce771a96e708     Completed   0        0          2022-01-08 03:40:10 +0000 UTC   29d       default
backup-exclude-resources-e924f92e-3f0d-460c-8b6e-8e03be480e3c     Completed   0        0          2022-01-08 09:03:05 +0000 UTC   29d       default
backup-f001f6d5-997a-4966-bbdc-af8919975e4f                       Completed   0        0          2022-01-08 04:01:46 +0000 UTC   29d       default
backup-include-namespaces-26662f2c-5577-43a3-bb82-2875b3ced336    Completed   0        0          2022-01-08 07:14:15 +0000 UTC   29d       default
backup-include-namespaces-82f997d7-9be4-4307-b043-4df2a84df71b    Completed   0        0          2022-01-08 03:52:19 +0000 UTC   29d       default
backup-include-resources-56cda735-a41a-4d6a-a585-0a85117d6ee4     Completed   0        0          2022-01-08 09:06:29 +0000 UTC   29d       default
backup-include-resources-b9f1f601-1f4c-44d7-9c16-9f40e06be2c5     Completed   0        0          2022-01-08 03:58:46 +0000 UTC   29d       default
backup-label-selector-8e2cf192-32af-4113-8b5f-dd7bfa76127e        Completed   0        0          2022-01-08 07:24:23 +0000 UTC   29d       default            resourcefiltering=true
backup-label-selector-c150f1e2-2740-4e35-a5af-ebcf97c6150f        Completed   0        0          2022-01-08 03:43:20 +0000 UTC   29d       default            resourcefiltering=true
backup-namespace-annotations441a3c11-0ed3-49e8-b898-1e1d48a8086e  Completed   0        0          2022-01-08 09:00:57 +0000 UTC   29d       default
backup-namespace-annotations8c78e6d5-e281-4a84-9f2d-63a8c5eca2e3  Completed   0        0          2022-01-08 03:34:47 +0000 UTC   29d       default
backup-rbac56cb9157-2639-49e3-aa1c-c1e0045be61c                   Completed   0        0          2022-01-08 09:01:57 +0000 UTC   29d       default
backup-rbac5b3e6af4-9664-4fbd-ac23-3444dd960ac0                   Completed   0        0          2022-01-08 03:35:46 +0000 UTC   29d       default
backup-rockbands-591240ea-e5cd-411d-94b2-1c3bdce31720-0           Completed   0        0          2022-01-08 07:15:48 +0000 UTC   29d       default
backup-rockbands-591240ea-e5cd-411d-94b2-1c3bdce31720-1           Completed   0        0          2022-01-08 07:16:59 +0000 UTC   29d       default
backup-rockbands-591240ea-e5cd-411d-94b2-1c3bdce31720-2           Completed   0        0          2022-01-08 07:18:24 +0000 UTC   29d       default
backup-rockbands-591240ea-e5cd-411d-94b2-1c3bdce31720-3           Completed   0        0          2022-01-08 07:19:35 +0000 UTC   29d       default
backup-rockbands-591240ea-e5cd-411d-94b2-1c3bdce31720-4           Completed   0        0          2022-01-08 07:21:00 +0000 UTC   29d       default
backup-rockbands-e14e3083-78ea-422b-8355-1da3d15501b2-0           Completed   0        0          2022-01-08 03:27:34 +0000 UTC   29d       default
backup-rockbands-e14e3083-78ea-422b-8355-1da3d15501b2-1           Completed   0        0          2022-01-08 03:28:46 +0000 UTC   29d       default
backup-rockbands-e14e3083-78ea-422b-8355-1da3d15501b2-2           Completed   0        0          2022-01-08 03:30:11 +0000 UTC   29d       default
backup-rockbands-e14e3083-78ea-422b-8355-1da3d15501b2-3           Completed   0        0          2022-01-08 03:31:23 +0000 UTC   29d       default
backup-rockbands-e14e3083-78ea-422b-8355-1da3d15501b2-4           Completed   0        0          2022-01-08 03:32:47 +0000 UTC   29d       default
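
With a list this long, checking by eye whether the backup named in the failing test log is present is error-prone. A small helper along these lines could do the lookup instead. This is a sketch only: `backup_in_list` is a hypothetical name, not part of Velero or its E2E suite, and it assumes the tabular `velero backup get` output shown above, with the backup name in the first column and a single header row.

```shell
#!/bin/sh
# backup_in_list NAME
# Reads `velero backup get` table output on stdin and exits 0 if NAME
# appears in the first column of a data row, 1 otherwise.
# Assumption: the first line of input is the header row and is skipped.
backup_in_list() {
  awk -v name="$1" '
    NR > 1 && $1 == name { found = 1 }   # exact match on the NAME column
    END { exit !found }                  # exit status 0 only when found
  '
}
```

Usage, with the backup name copied from the test log: `velero backup get | backup_in_list <backup-name> && echo present`.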

reasonerjt commented 2 years ago

I don't quite understand why this error happens. Let's see if it can be reproduced.

blackpiglet commented 2 years ago

Then I ran the tests two more times. Both runs had unexpected results.

  1. The [Snapshot] case failed. Log: https://gist.github.com/blackpiglet/bcca463c014a574055d5fe4f83b04d34
  2. All cases were run twice before the suite ended. The [Basic] and [Snapshot] cases failed in the second run: https://gist.github.com/blackpiglet/ef83259fd2d09cc9429ee5fa2010cef3