Open schakrava opened 8 years ago
Any timescales for a fix? Many large replications, and replications over slow links, will fail, so I wouldn't describe this as a "corner" case. New replications should detect previous failures and ideally continue from where they left off, or else delete the partial snapshot on the receiver and try again from scratch.
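To make the "delete the partial snapshot and try again from scratch" fallback concrete, here is a minimal sketch. It is not Rockstor's code: `send_snapshot` and `delete_partial` are hypothetical callables that the caller would supply (for example, wrappers around the existing replication step and around removing the incomplete subvolume on the receiver).

```python
def replicate_from_scratch(send_snapshot, delete_partial, max_attempts=3):
    """Retry a full replication, cleaning up on the receiver after each failure.

    send_snapshot and delete_partial are hypothetical, caller-supplied
    callables; neither is an existing Rockstor or btrfs API.
    Returns the number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            send_snapshot()
            return attempt
        except Exception as err:  # broad on purpose: this is only a sketch
            last_error = err
            # Remove whatever the failed attempt left behind so the next
            # attempt starts from a clean receiver.
            delete_partial()
    raise RuntimeError(
        f"replication failed after {max_attempts} attempts"
    ) from last_error
```

This only covers the "start over" branch; it does nothing to resume a transfer part-way through.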
Noting for context that the current btrfs send/receive mechanism, upon which Rockstor's replication feature depends, does not support a resume function. To quote @digint, the main author of btrbk, from the following closed issue in that project: https://github.com/digint/btrbk/issues/94
"I'm afraid this is not possible with btrfs send/receive (which btrbk relies on). btrfs-progs does not provide a mechanism for resuming partial send/receives, and I have not seen any tools with this functionality."
The only option for a "resumable" process I've found is implemented in buttersync (not to be confused with buttersink), which uses the workaround of creating a temp file using btrfs functionality and then using rsync to allow for a resumable remote transfer ... maybe this could be considered for initial transfers that are very large. This was all done in a script, so no hidden magic there.
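As a rough illustration of that kind of workaround (not buttersync's actual script, which I have not reproduced here), the sketch below writes the send stream to a file with `btrfs send -f`, copies it with `rsync --partial --append-verify` so an interrupted copy can be continued, and then applies it on the receiver with `btrfs receive -f`. All paths, the host name, and the function names are hypothetical, and the approach assumes enough free space on both sides to hold the whole stream file.

```python
import subprocess
import time


def build_commands(snapshot, stream_file, remote_host, remote_tmp,
                   remote_dest, parent=None):
    """Build the three command lines for a file-based, resumable transfer."""
    send = ["btrfs", "send"]
    if parent:
        send += ["-p", parent]  # incremental send relative to a parent snapshot
    send += ["-f", stream_file, snapshot]
    # --partial keeps an interrupted file on the receiver;
    # --append-verify continues it and checksums the whole file afterwards.
    rsync = ["rsync", "--partial", "--append-verify",
             stream_file, f"{remote_host}:{remote_tmp}"]
    receive = ["ssh", remote_host,
               "btrfs", "receive", "-f", remote_tmp, remote_dest]
    return send, rsync, receive


def run_with_retries(cmd, attempts=5, delay=30,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or attempts run out; return True on success."""
    for attempt in range(1, attempts + 1):
        if runner(cmd).returncode == 0:
            return True
        if attempt < attempts:
            sleep(delay)
    return False


def replicate(snapshot, stream_file, remote_host, remote_tmp, remote_dest,
              parent=None):
    """Dump the stream locally, copy it resumably, then apply it remotely."""
    send, rsync, receive = build_commands(
        snapshot, stream_file, remote_host, remote_tmp, remote_dest, parent)
    subprocess.run(send, check=True)
    if not run_with_retries(rsync):
        raise RuntimeError("rsync did not complete; rerun to continue the copy")
    subprocess.run(receive, check=True)
```

Only the rsync step is resumable here; the local `btrfs send` and the remote `btrfs receive` still have to run to completion in one go.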
@Hooverdan96 I believe that a resumable btrfs sync function is planned for the next btrfs sync protocol update. We can look to that in time. I'm not keen on making what is already fairly complex yet more complex, so I'd like to keep our current implementation, which basically wraps basic btrfs sync. In time we are to present alternative multi-host options such as GlusterFS. But I say we keep the existing wrapper as just that.
More info in: http://forum.rockstor.com/t/replication-doesnt-handle-failures/926