vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Velero backups and maintenance jobs are failing due to active indexes "logger name="[index-blob-manager]" sublevel=error" #8469

Open kkavin opened 1 day ago

kkavin commented 1 day ago

What steps did you take and what happened: We installed Velero version 1.14.0. A few backups succeeded initially, but after a few days we started facing issues: the Velero pod restarts frequently with the error below and keeps restarting.

We tried reinstalling Velero, but the issue persists.

"time="2024-11-30T07:44:23Z" level=warning msg="active indexes [xr0_7_6b451ba1676853b054d654945c4dc313-sa2a25f6e3408dec5-c1 xr8_15_c98e297efe6656d29b1445b9b2c50c77-s35ffedab2d2740df-c1 xr16_23_5593a3523b7ad2f088bd2d63898871b0-s25f2e987014a3db8-c1 xr24_31_b91b484ebb9347da800bdcd999c4a164-s913ff0b4246609a8-c1 xr32_39_d26b900b73417c156a07383d92b1703d-s5d983913d7607ac3-c1 xr40_47_df3ac0f426b61822cf418a1cb4630bc1-s1616139ab5f8080a-c1 xr48_55_30c8353b7fadeb9e186317d253004c69-s648992548c90b115-c1 xr56_63_653de373a82c1b9f62c99c0551ac1b2d-s69142ccec46a6aae-c1 xr64_71_2a1cbeaa64b7c246489f337dd1093fa3-sb3a9c84a1aca4b4d-c1 xr72_79_4f4d20f63413d4c6e7795d115806f5d6-se0daa36cd2506608-c1 xr80_87_fbbe510445b61a131a974009599ce44b-sd768227d3173f00d-c1 xs88_c96080dbfa121038a2a00cdc4ba09b9f-s44dd9fd141cd25ed-c1 xs89_f6d1ce4ec56b77462613739ac17a950a-s81fd250e0c1db6a9-c1 xs90_0864a650dc2c67f4269d3209869637ae-s89980c56a835fe28-c1 xs91 ..... deletion watermark 2024-08-08 03:35:02 +0000 UTC" logModule=kopia/kopia/format logSource="pkg/kopia/kopia_log.go:101" logger name="[index-blob-manager]" sublevel=error"

backup velero velero.log

Environment:

Lyndon-Li commented 5 hours ago

level=warning msg="active indexes

These warning logs are expected and are not the cause of the restart.

The errors in the first screenshot are backup errors; they won't cause the Velero server pod to restart.

There are no errors in the attached velero.log either.
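
To separate the expected kopia warnings from real server errors when reading a log, something like the sketch below may help. It assumes the logrus-style key=value format seen in the quoted line, and that genuine problems are logged with level=error, level=fatal, or level=panic. The function name velero_error_lines is made up for illustration. The pattern requires whitespace (or line start) before "level=", so that "sublevel=error" on a warning line is not mistaken for an error.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines whose top-level log level is
# error, fatal, or panic. The "(^|[[:space:]])" prefix prevents a match
# on "sublevel=error", which appears on the expected warning lines.
velero_error_lines() {
    grep -E '(^|[[:space:]])level=(error|fatal|panic)'
}
```

For example, the previous container's log could be piped through it: kubectl logs -n velero <velero server pod name> --previous | velero_error_lines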

Lyndon-Li commented 5 hours ago

We need the following additional information to troubleshoot further:

  1. Run velero debug to collect the full velero bundle
  2. When the restart happens, run kubectl logs -n velero <velero server pod name> --previous, and collect the output
  3. Before and after the restart happens, run kubectl describe pod -n velero <velero server pod name> and collect the output
  4. Find the failed maintenance job pods under the velero namespace and run kubectl describe pod -n velero <maintenance job pod name> and kubectl logs -n velero <maintenance job pod name>
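
The steps above could be wrapped in a small script along these lines. This is only a sketch: the function names, the NAMESPACE and DRY_RUN variables, and the dry-run default are assumptions for illustration, while the commands themselves are the ones listed above. With DRY_RUN=1 the commands are printed rather than executed, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of a diagnostics collector for the steps listed above.
# Assumptions: Velero runs in the "velero" namespace, and the caller
# supplies the pod names (look them up with kubectl get pods -n velero).
NAMESPACE="${NAMESPACE:-velero}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1-3: debug bundle, previous container log, pod description.
collect_server_diagnostics() {
    pod="$1"
    run velero debug
    run kubectl logs -n "$NAMESPACE" "$pod" --previous
    run kubectl describe pod -n "$NAMESPACE" "$pod"
}

# Step 4: description and log of a failed maintenance job pod.
collect_maintenance_diagnostics() {
    pod="$1"
    run kubectl describe pod -n "$NAMESPACE" "$pod"
    run kubectl logs -n "$NAMESPACE" "$pod"
}
```

To actually collect the output, set DRY_RUN=0 and redirect each function's output to a file, for example collect_server_diagnostics <velero server pod name> > velero_server.txt 2>&1.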

kkavin commented 56 minutes ago

Hi @Lyndon-Li, please find the requested logs below.

bundle-2024-12-02-05-05-34~.zip velero_describe.txt velero_log.log velero_maintenance.log velero_maintenance.txt velero_previous.log