scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

Docs: add restore page with information about reverting broken restore changes #3540

Open Michal-Leszczynski opened 1 year ago

Michal-Leszczynski commented 1 year ago

When a restore hits an error partway through, the cluster might be left in an incorrect state: for example, tombstone_gc mode is set to disabled, restored views are dropped, or files are left in the upload directory. If the same restore task is then continued, it should resume from the incorrect state just fine. But if someone wants to start a brand new restore task, or abort the restore for good and return the cluster to its correct pre-restore state, some actions need to be taken, and they differ between SM 3.1 and 3.2.
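To make the "files left in the upload directory" symptom concrete, here is a minimal sketch of how one could list leftovers on a single node. It is not part of Scylla Manager. The function name is hypothetical, and the layout `<data_dir>/<keyspace>/<table>-<uuid>/upload/` is an assumption about the node's data directory that should be checked against the actual installation.

```python
from pathlib import Path


def find_leftover_upload_files(data_dir):
    """Map "<keyspace>/<table dir>" to the files still present in its upload/ subdirectory.

    Hypothetical helper: assumes the layout <data_dir>/<keyspace>/<table>-<uuid>/upload/.
    """
    leftovers = {}
    for upload_dir in sorted(Path(data_dir).glob("*/*/upload")):
        if not upload_dir.is_dir():
            continue
        # Only regular files count; empty upload directories are not reported.
        files = sorted(p.name for p in upload_dir.iterdir() if p.is_file())
        if files:
            table_dir = upload_dir.parent
            leftovers[f"{table_dir.parent.name}/{table_dir.name}"] = files
    return leftovers
```

Running it against each node's data directory (often `/var/lib/scylla/data`, but that path is also an assumption) would show which tables still hold files from the interrupted restore. It only reports; deciding whether those files can be deleted is exactly what the documentation requested here needs to spell out.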

Maybe the "rollback" procedure should be automated? We could add an additional flag, sctool restore --rollback, so that in case of an unexpected error the user can run:

sctool restore update restore/ID --rollback
sctool start restore/ID

and expect the cluster to be in a good state. This flag would require us to formalize what exactly should be reverted (e.g. should we truncate restored tables or just leave them as they are?).

tzach commented 1 year ago

We need to explain what a broken restore is, and how one can identify it.

Michal-Leszczynski commented 1 year ago

@tzach I updated the issue description so that it actually tells the whole story.

Michal-Leszczynski commented 1 year ago

Issue that could benefit from this.

pdbossman commented 1 month ago

It is actually really important we finish this documentation so people know what to do.