k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

S3 Snapshot Retention policy is too aggressive #9866

Open maggie44 opened 8 months ago

maggie44 commented 8 months ago

Describe the bug: A PR was introduced that prunes S3 etcd backups by date:

The default etcd-snapshot-retention is 5.

Since this PR, only 5 snapshots in total are retained in the S3 bucket, rather than 5 per control plane. For my cluster with three control planes, that leaves only one backup for one of them, and the backups for the others are all from today (5 in total). As I add more control planes, presumably some will not be backed up at all, as their snapshots will be pruned immediately in favour of the newest by date (one more control plane would push the top fsn1 backup out completely):

[Screenshot, 2024-04-04: S3 bucket listing showing the 5 retained snapshots across the three control-plane nodes]

If my understanding is correct, this is a significant breaking change in a patch release: it reduces the number of backups available for people's clusters, potentially leading to nasty surprises when they go back to check them.

A potential workaround is to manually set etcd-snapshot-retention to number-of-control-planes * 5, but that is a lot of custom configuration, and I'm not sure this need is understood or documented; a rough sketch of the override is below.
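For illustration only, a minimal sketch of that override in /etc/rancher/k3s/config.yaml, assuming three control planes and the documented etcd-s3 flags (the bucket name is a placeholder):

```yaml
# /etc/rancher/k3s/config.yaml on each server node (sketch; placeholder values)
etcd-snapshot-retention: 15            # 3 control planes x the previous per-node default of 5
etcd-s3: true
etcd-s3-bucket: "k3s-etcd-snapshots"   # placeholder bucket name
```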

A similar issue is reported here: https://github.com/rancher/rke2/issues/5216

Expected behavior: 5 backups per control plane.

Perhaps the default of 5 should be raised so that it at least covers the common scenario of three control planes; or the S3 limit should be etcd-snapshot-retention × the number of control planes. Or perhaps the original behaviour could be restored, with a separate process for cleaning up orphaned snapshots with age > n. Ideally there would also be some way to communicate the impact this is having on existing clusters.

brandond commented 8 months ago

This is discussed in some detail at https://github.com/rancher/rke2/issues/5216#issuecomment-1896796970, which I see you already linked. As mentioned there, the current behavior is intentional; we will likely eventually add a separate control over the number of s3 snapshots.

Multiplying the retention count by the number of etcd nodes is also an interesting idea, but this could lead to overly aggressive pruning following a restore when the number of etcd nodes is temporarily reduced.

There is also another related issue:

github-actions[bot] commented 6 months ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

HoustonDad commented 6 months ago

Not stale

github-actions[bot] commented 4 months ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

horihel commented 4 months ago

Hello friendly bot - S3 backup retention policies are still broken, with no direct workaround available. Let's hope everyone has soft-delete features enabled on their S3 buckets to make up for it.

horihel commented 4 months ago

That said, does anyone know how to make an 'undeleted' backup known to Rancher so it can be restored?

github-actions[bot] commented 3 months ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

HoustonDad commented 3 months ago

not stale

github-actions[bot] commented 1 month ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

bwenrich commented 1 month ago

Not stale.

Q: for the use case with multiple control planes, would it be helpful if each cluster used a different path/prefix in the S3 bucket?

That might avoid the problem described in the linked RKE2 issue (data from deleted nodes never getting pruned), while also preventing clusters from pruning data belonging to each other.

If this cannot be done, I would wonder if it's even safe to have multiple clusters sharing the same S3 bucket, considering that "misconfiguring" the pruning on one of them could have a wide impact on all of their snapshots.

brandond commented 1 month ago

No, all the nodes should use the same bucket and prefix. It is assumed (and we should probably document this as a requirement) that they do. If they don't, nodes will remove S3 snapshot records created by other nodes, thinking that they are "missing" from S3, since all nodes are expected to have the same view of what exists on S3.

The snapshots on S3 do not "belong" to the node that uploaded them; they are accessible to, and can be listed, restored, deleted, or pruned by, any node. Having them owned by a node would not make much sense, given that S3 is supposed to be a stable external store available to any cluster member, even one that was just created from scratch and needs to restore an old snapshot to recover the cluster from a complete loss of all nodes.
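As a sketch of that requirement (the endpoint, bucket, and folder names here are placeholders, not recommendations), every server node would carry identical S3 settings in its config:

```yaml
# Identical on every server node (sketch; placeholder values)
etcd-s3: true
etcd-s3-endpoint: "s3.example.com"
etcd-s3-bucket: "k3s-etcd-snapshots"
etcd-s3-folder: "prod-cluster"   # same prefix on every node
etcd-s3-access-key: "<access-key>"
etcd-s3-secret-key: "<secret-key>"
```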

horihel commented 1 month ago

Would it make more sense, then, to disable the built-in retention altogether and instead use the retention policies built into the S3 backend? Is k3s/RKE2 able to handle that?