Open vladislav-curvetech opened 4 months ago
Did you find the issue?
Check for any pod restarts. IIUC, these metrics are counters that are incremented as new backups/restores are processed. Velero does not list the backups that already existed before it started in order to compute attempt/failure totals.
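To make the effect of a restart concrete, here is a minimal sketch of how a monitoring side can total a counter that restarts from zero. It is not Velero or Prometheus code; `counter_increase` and the sample values are made up for illustration. The idea of treating any drop in value as a reset is, as far as I know, the same one Prometheus applies in functions like `increase()`.

```python
def counter_increase(samples):
    """Sum the increase across scraped counter samples, tolerating resets."""
    total = 0.0
    prev = None
    for value in samples:
        if prev is not None:
            if value >= prev:
                # Normal case: the counter only moved forward.
                total += value - prev
            else:
                # The counter dropped, so the process restarted and began
                # counting again from zero; the new value is the increase.
                total += value
        prev = value
    return total


# Hypothetical scrapes of a failure counter; the pod restarts after the
# third sample, so the raw value falls back to 0.
scrapes = [0, 1, 2, 0, 0, 1]
print(counter_increase(scrapes))  # 3.0
```

Reading the raw value right after a restart shows 0, while the reset-aware total over the same window still shows the three failures. That is why alerting on the increase over a time window tends to be more robust than alerting on the raw counter value.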
Thank you, Kaovilai, for your participation. Yes, the pod sometimes restarts for an unknown reason. Before the restart, I see just one warning:
level=warning msg="active indexes ....blabla.....12b-c1] deletion watermark 2024-08-10 20:30:46 +0000 UTC" logModule=kopia/kopia/format logSource="pkg/kopia/kopia_log.go:101" logger name="[index-blob-manager]" sublevel=error.
Did I understand correctly that if the pod restarts, the metrics important to me are reset?
yes
One reason is that Velero syncs backups from object storage (possibly created by a different cluster) into the cluster.
Many of those will have a status of Completed.
If the metrics counted completed backups present in the cluster, they would overcount what this cluster has actually completed.
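A toy illustration of the overcounting concern, assuming made-up backup records: the `Backup` class and its `processed_here` flag are hypothetical and are not fields of the Velero API. They only mark which backups this Velero instance ran itself versus which ones were synced in from object storage.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    phase: str
    processed_here: bool  # hypothetical flag, not a real Velero field


# Made-up inventory: two backups were synced from object storage and
# one was actually run by this cluster's Velero instance.
backups = [
    Backup("daily-a", "Completed", processed_here=False),
    Backup("daily-b", "Completed", processed_here=False),
    Backup("daily-c", "Completed", processed_here=True),
]

# Counting every Completed backup object visible in the cluster.
in_cluster = sum(1 for b in backups if b.phase == "Completed")

# Counting only what this instance processed, which is what an
# incremental counter effectively reports.
processed = sum(
    1 for b in backups if b.phase == "Completed" and b.processed_here
)

print(in_cluster, processed)  # 3 1
```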
I am experiencing issues with Velero where most of the metrics are always zero, and basic Prometheus metrics are not functioning correctly. This issue significantly affects our ability to monitor the backup status and reliability.
A few problematic metrics:
These metrics are crucial for us to monitor the health and status of our backup operations, but they consistently report zero values, which is not accurate.
Expected Behavior: The above metrics should provide accurate and non-zero values reflecting the actual state of Velero backups.
Environment:
Velero version: 1.13.0
Kubernetes version: 1.28
Cloud provider: AWS EKS
Additional Context: Any insights or solutions to this issue would be greatly appreciated as these metrics are critical for our backup monitoring and alerting.
Thank you for your assistance!