[Open] GeiserX opened this issue 1 year ago
Hi @DrumSergio,
Do you mean that backups in the "PartiallyFailed" phase stay in the backup list forever without manual intervention? Would the following two configs help in your situation? `--garbage-collection-frequency` has been available since v1.9.
`velero backup/schedule --ttl duration`: how long before the backup can be garbage collected.
`velero install --garbage-collection-frequency duration`: how often garbage collection runs for expired backups (default 1h).
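A rough way to reason about how the two settings interact is sketched below. It assumes Velero stamps each backup's expiration as roughly its start time plus the TTL, and that GC passes run every `--garbage-collection-frequency` without downtime; it ignores controller restarts and failed deletions. The function names are hypothetical, for illustration only.

```python
from datetime import datetime, timedelta, timezone

def expiration(started: datetime, ttl: timedelta) -> datetime:
    """Assumed model: a backup expires at (roughly) its start time plus the TTL."""
    return started + ttl

def latest_gc_pickup(started: datetime, ttl: timedelta, gc_frequency: timedelta) -> datetime:
    """Upper bound on when a GC pass first sees the backup as expired,
    assuming passes run every gc_frequency with no gaps."""
    return expiration(started, ttl) + gc_frequency

if __name__ == "__main__":
    started = datetime(2024, 1, 1, tzinfo=timezone.utc)
    ttl = timedelta(hours=168)   # e.g. a one-week --ttl
    freq = timedelta(hours=1)    # the 1h default mentioned above
    print(expiration(started, ttl))              # 2024-01-08 00:00:00+00:00
    print(latest_gc_pickup(started, ttl, freq))  # 2024-01-08 01:00:00+00:00
```

Under this model, a backup still listed well after expiry plus one GC interval points to a deletion that was attempted and failed, rather than GC not having run yet.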
If expired backups are not removed when GC runs, we need to dig into that first; then we can think about a new approach for this requirement. Could you confirm my question first?
Hi @danfengliu, thanks for your prompt response.
Yes, we are using TTL options, and they are usually respected. But in several clusters we sometimes have Completed backups that are expired (the TTL has been exceeded, and the EXPIRY column shows something like `10d ago`). When I try to delete them using the Velero CLI, some are sometimes left behind, so I have to resort to `kubectl delete backup ...`.
And sometimes they are stubborn and still appear when interacting with the Velero CLI.
We are using the default value provided in the Velero chart (I checked, and it is 1h). I did not know about this feature. But we routinely have a lot of backups dangling forever because of PartiallyFailed, Failed, and similar error messages that I don't remember now.
If you want me to help debug this, give me some instructions and we can wait for these backups to appear again.
Thanks for the detailed feedback! As we know, using `kubectl delete backup` is not good practice for deleting Velero backups, because it leaves orphaned data in the object store. Could you provide your Velero version and basic information about the plugin, cluster, and object store? It would also help to have the Velero server pod logs from a Velero delete operation that failed to delete some of the target backups. Until we have this information, I will try to reproduce the issue.
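When collecting information on deletions that fail, it may also help to look at the DeleteBackupRequest objects that `velero backup delete` creates. The sketch below assumes they live in the default `velero` namespace and expose `spec.backupName` and `status.errors` as in Velero's CRDs; verify the field names for your version (for example with `kubectl explain deletebackuprequests.velero.io`). The helper name `failed_deletions` is hypothetical.

```python
import json
import subprocess

def failed_deletions(request_list: dict) -> dict[str, list[str]]:
    """Map backup name -> error messages from DeleteBackupRequests that reported errors."""
    failures: dict[str, list[str]] = {}
    for item in request_list.get("items", []):
        errors = item.get("status", {}).get("errors") or []
        if errors:
            name = item.get("spec", {}).get("backupName", "<unknown>")
            failures.setdefault(name, []).extend(errors)
    return failures

if __name__ == "__main__":
    # Assumes kubectl is configured for the cluster and Velero runs in "velero".
    raw = subprocess.run(
        ["kubectl", "get", "deletebackuprequests.velero.io", "-n", "velero", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for backup, errors in failed_deletions(json.loads(raw)).items():
        print(backup)
        for err in errors:
            print("   ", err)
```

The resulting error list is a complement to, not a replacement for, the server pod logs requested above.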
Hi @DrumSergio, here are some tips on Velero backup management:
Thanks @danfengliu. Perhaps this should be off by default; I'd really like to have this option set up in my cluster. Meanwhile, I'll investigate these backups more thoroughly to find out what's happening.
Describe the problem/challenge you have
I routinely have to clear up expired backups that failed to be removed, usually because they are in a PartiallyFailed state, with some ugly code like `velero get backup | grep ago | awk '{ print $1 }' | tr '\n' ' ' | xargs velero delete backup --confirm`
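A somewhat sturdier version of that one-liner is sketched below, under stated assumptions: it reads Backup objects with `kubectl get backups.velero.io -n velero -o json` (assuming the default `velero` namespace), selects those whose `status.expiration` timestamp is already in the past instead of matching "ago" in the table output, and deletes them through `velero backup delete --confirm` so that Velero, not kubectl, handles the removal. It defaults to a dry run; the function names are hypothetical.

```python
import json
import subprocess
from datetime import datetime, timezone

def expired_backup_names(backup_list: dict, now: datetime) -> list[str]:
    """Return names of Backup objects whose status.expiration is in the past."""
    names = []
    for item in backup_list.get("items", []):
        exp = item.get("status", {}).get("expiration")
        if not exp:
            continue  # no expiration recorded yet (e.g. backup still running)
        # Kubernetes timestamps look like "2024-01-08T00:00:00Z".
        expires_at = datetime.fromisoformat(exp.replace("Z", "+00:00"))
        if expires_at < now:
            names.append(item["metadata"]["name"])
    return names

def main(namespace: str = "velero", dry_run: bool = True) -> None:
    raw = subprocess.run(
        ["kubectl", "get", "backups.velero.io", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in expired_backup_names(json.loads(raw), datetime.now(timezone.utc)):
        cmd = ["velero", "backup", "delete", name, "--confirm", "-n", namespace]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    main()
```

Run it once in dry-run mode and compare the list with `velero get backup` before passing `dry_run=False`.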
Describe the solution you'd like
A hard expiry should be configurable through parameters: for instance, clear all Velero backups after 1 week and, if that fails, force-delete them with `kubectl delete backup ...`.
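The selection half of the proposed hard expiry could look roughly like this. This is not an existing Velero feature: `HARD_MAX_AGE` stands in for the hypothetical parameter, and the sketch assumes Backup objects in the JSON shape returned by `kubectl get backups.velero.io -o json`, using `metadata.creationTimestamp` so that age alone decides, regardless of phase or TTL.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical knob from the proposal: remove every backup older than this,
# whatever its phase (Completed, PartiallyFailed, Failed, ...) or TTL.
HARD_MAX_AGE = timedelta(weeks=1)

def parse_k8s_time(ts: str) -> datetime:
    """Parse a Kubernetes RFC 3339 timestamp such as '2024-01-01T00:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def hard_expired(backup_list: dict, now: datetime,
                 max_age: timedelta = HARD_MAX_AGE) -> list[tuple[str, str]]:
    """Return (name, phase) for each backup created more than max_age ago."""
    selected = []
    for item in backup_list.get("items", []):
        created = parse_k8s_time(item["metadata"]["creationTimestamp"])
        if now - created > max_age:
            phase = item.get("status", {}).get("phase", "Unknown")
            selected.append((item["metadata"]["name"], phase))
    return selected

if __name__ == "__main__":
    sample = {"items": [
        {"metadata": {"name": "nightly-old", "creationTimestamp": "2024-01-01T00:00:00Z"},
         "status": {"phase": "PartiallyFailed"}},
        {"metadata": {"name": "nightly-new", "creationTimestamp": "2024-01-09T00:00:00Z"},
         "status": {"phase": "Completed"}},
    ]}
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    print(hard_expired(sample, now))  # [('nightly-old', 'PartiallyFailed')]
```

Given the earlier point in this thread that `kubectl delete backup` leaves orphaned data in the object store, a real implementation would presumably try a Velero delete first and treat the kubectl fallback as a last resort.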
Vote on this issue!
This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.