vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Add information about failed backups to status of schedule object #8045

Open dennisGrandt opened 3 months ago

dennisGrandt commented 3 months ago

Describe the problem/challenge you have

We use ArgoCD to manage Velero schedules on our k8s clusters.
With the property useOwnerReferencesInBackup: true enabled, ArgoCD discovers the backup and dataImport objects.
We've written custom health checks for both objects to determine the status of the backups.

But ArgoCD only changes the Application state to degraded when an object that is managed by source code degrades.
The discovered objects have no effect on the application state.

Describe the solution you'd like

The custom resource: schedules.velero.io has only the following status fields:

It would be very useful to have an additional field: lastBackupState, maybe with the following contents:

successful
failed
none
running

With a custom health check on the schedule, we could then monitor the state of the backups it triggers.
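A minimal sketch of what such a health check could decide, assuming the proposed field existed. Note that `lastBackupState` is hypothetical (it is the field requested in this issue, not something Velero provides), and the state names are only the example values proposed in this issue. Argo CD custom health checks are written in Lua; Python is used here purely to illustrate the mapping onto Argo CD's health statuses, and the function name is made up for the example.

```python
# Hypothetical sketch: `status.lastBackupState` does not exist in Velero today.
# The state names are the example values proposed in this issue.
HEALTH_BY_STATE = {
    "successful": ("Healthy", "Last scheduled backup succeeded"),
    "running": ("Progressing", "Scheduled backup is in progress"),
    "none": ("Progressing", "No backup has been triggered yet"),
    "failed": ("Degraded", "Last scheduled backup failed"),
}


def schedule_health(schedule: dict) -> dict:
    """Map a Schedule object (as a dict) to an Argo CD style health result."""
    status = schedule.get("status") or {}
    state = status.get("lastBackupState")
    if state is None:
        return {"status": "Unknown", "message": "status.lastBackupState is not set"}
    # Anything outside the known list is reported as Unknown rather than guessed.
    health, message = HEALTH_BY_STATE.get(
        state, ("Unknown", f"Unrecognized state: {state}")
    )
    return {"status": health, "message": message}
```

With a check like this on the schedule itself, a failed backup would degrade the Application, because the schedule is the object managed from source code.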


sseago commented 3 months ago

> It would be very useful to have an additional field: lastBackupState, maybe with the following contents:
>
> successful
> failed
> none
> running

Note that if we add this, we'll need the list to match the same status.phase values supported by backups.

dennisGrandt commented 3 months ago

> Note that if we add this, we'll need the list to match the same status.phase values supported by backups.

For sure, the status phase must match. The list was just an example.
But that would be very useful.

That would then be the list of phases defined in the custom resource definition for backups?

blackpiglet commented 3 months ago

First, the last backup status can be found with the kubectl CLI. Second, the schedule should be independent of backups, so adding the triggered backup status to its schedule may not be the correct behavior.
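As a rough illustration of the kubectl route, the sketch below picks the newest backup created by a given schedule and returns its `status.phase`. It assumes the input is the parsed JSON from something like `kubectl get backups.velero.io -n velero -o json` (the `velero` namespace is an assumption), and that backups created by a schedule carry the `velero.io/schedule-name` label. The function names are made up for the example.

```python
from datetime import datetime
from typing import Optional

# Label that Velero is assumed to set on backups created from a schedule.
SCHEDULE_LABEL = "velero.io/schedule-name"


def _created(backup: dict) -> datetime:
    # Kubernetes timestamps look like "2024-07-01T02:00:00Z".
    ts = backup["metadata"]["creationTimestamp"]
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def latest_backup_phase(backup_list: dict, schedule_name: str) -> Optional[str]:
    """Return status.phase of the newest backup of the schedule, or None."""
    owned = [
        b
        for b in backup_list.get("items", [])
        if (b.get("metadata", {}).get("labels") or {}).get(SCHEDULE_LABEL)
        == schedule_name
    ]
    if not owned:
        return None
    newest = max(owned, key=_created)
    # A backup that has not been processed yet may have no status at all.
    return (newest.get("status") or {}).get("phase")
```

This works from outside the cluster state that ArgoCD evaluates, so it does not by itself change the health of the schedule object.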

dennisGrandt commented 3 months ago

> First, the last backup status can be found with the kubectl CLI.

That does not sound like a production-ready approach.

> Second, the schedule should be independent of backups, so adding the triggered backup status to its schedule may not be the correct behavior.

I am not sure how the schedule object can be independent of the backup objects if it is used to schedule and configure/customize the backups. Velero also adds owner references to the backup objects, pointing to the schedule object.

There is already the field lastBackup in the status of the schedule object. So Velero already records when the last backup was created; why not also add the status of that backup?

blackpiglet commented 3 months ago

By saying the schedule is independent of backups, I mean that the schedule is currently used to trigger backup creation periodically, but it can be used to trigger anything periodically.

The LastBackup in the Schedule is the timestamp at which the last backup was triggered, not information about the triggered backup itself.