Open Michal-Leszczynski opened 1 year ago
We need to explain what a broken restore is and how one can identify it.
@tzach I updated the issue description so that it actually tells the whole story.
Issue that could benefit from this.
It is actually really important we finish this documentation so people know what to do.
When a restore hits an error in its middle stages, the cluster might be left in an incorrect state. E.g. the `tombstone_gc` mode is set to `disabled`, restored views are dropped, and some files are left in the `upload` directory. If the same restore task is then continued, it should resume from the incorrect state just fine. But if someone wants to start a brand new restore task, or abort the restore for good and bring the cluster back to the correct state it had before the restore, there are some actions that need to be taken, and they differ for SM 3.1 and 3.2.

Maybe the "rollback" procedure should be automated? We could add an additional flag, so that in case of an unexpected error, the user can run `sctool restore --rollback` and expect that the cluster is in a good state. This flag would require us to formalize what exactly should be reverted (e.g. should we truncate restored tables or just leave them as they are?).
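
As a starting point for the "how to identify it" part, here is a minimal sketch of one of the checks mentioned above: looking for files left behind in `upload` directories on a node. It assumes the usual on-disk layout `<data_dir>/<keyspace>/<table>-<uuid>/upload/`. The function name and the default path in the usage note are illustrative assumptions, not part of Scylla Manager.

```python
from pathlib import Path


def find_leftover_upload_files(data_dir):
    """Return files still present in any <keyspace>/<table>/upload directory.

    Assumes the layout <data_dir>/<keyspace>/<table>-<uuid>/upload/<files>;
    adjust the glob if your nodes are laid out differently.
    """
    root = Path(data_dir)
    leftovers = []
    for upload_dir in sorted(root.glob("*/*/upload")):
        if not upload_dir.is_dir():
            continue
        # Anything that is a regular file under upload/ counts as a leftover.
        leftovers.extend(sorted(p for p in upload_dir.rglob("*") if p.is_file()))
    return leftovers
```

This would have to be run on every node, e.g. `find_leftover_upload_files("/var/lib/scylla/data")` if that is where the node keeps its data. An empty result only rules out this one symptom. The `tombstone_gc` mode and the dropped views still need to be checked separately.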