Implement Replication failure detection and recovery for a btrfs corner case

rockstor / rockstor-core

Linux/BTRFS based Network Attached Storage(NAS)

http://rockstor.com/docs/contribute_section.html

GNU General Public License v3.0


Implement Replication failure detection and recovery for a btrfs corner case #1106

Open schakrava opened 8 years ago

schakrava commented 8 years ago

More info in: http://forum.rockstor.com/t/replication-doesnt-handle-failures/926

holmesb commented 8 years ago

Any timescales for a fix? Many large replications, and replications over slow links, will fail, so I wouldn't describe this as a "corner" case. New replications should detect previous failures and ideally continue from where they left off, or else delete the partial snapshot on the receiver and try again from scratch.
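A minimal sketch of the "detect and delete, then retry" fallback, assuming the receiver is a local btrfs host. It relies on the fact that `btrfs receive` only sets a subvolume's Received UUID once the stream has finished, so a subvolume without one (shown as `-` in `btrfs subvolume show`) was most likely left behind by an interrupted transfer. `received_uuid`, `is_partial_receive`, `cleanup_and_retry` and the `resend` callback are hypothetical names, not Rockstor's API.

```python
import subprocess
from typing import Callable, Optional


def received_uuid(show_output: str) -> Optional[str]:
    """Return the 'Received UUID' value from `btrfs subvolume show` output, or None if unset."""
    for line in show_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == "Received UUID":
            value = value.strip()
            # btrfs prints '-' when the field has never been set.
            return None if value in ("", "-") else value
    return None


def is_partial_receive(show_output: str) -> bool:
    # Assumption: this subvolume is a replication target, so a missing
    # Received UUID means the incoming stream never completed.
    return received_uuid(show_output) is None


def cleanup_and_retry(subvol_path: str, resend: Callable[[], None]) -> bool:
    """Hypothetical recovery step: delete a partial snapshot and restart the transfer."""
    out = subprocess.run(["btrfs", "subvolume", "show", subvol_path],
                         capture_output=True, text=True, check=True).stdout
    if not is_partial_receive(out):
        return False
    subprocess.run(["btrfs", "subvolume", "delete", subvol_path], check=True)
    resend()  # hypothetical callback that re-runs the full send from scratch
    return True
```

Keeping the parsing separate from the subprocess calls means the detection logic can be checked without a btrfs pool.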

phillxnet commented 7 years ago

Noting for context that the current btrfs send/receive mechanism, upon which Rockstor's replication feature depends, does not support resuming a transfer. See this comment from @digint, the main author of btrbk, in the following closed issue in that project: https://github.com/digint/btrbk/issues/94

"I'm afraid this is not possible with btrfs send/receive (which btrbk relies on). btrfs-progs does not provide a mechanism for resuming partial send/receives, and I have not seen any tools with this functionality."
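Because the stream cannot be resumed, the most a wrapper can do is notice that a transfer failed and start over. A minimal sketch, assuming a local pipe between `btrfs send` and `btrfs receive` (Rockstor's actual transport between hosts is different); `send_cmd` and `run_pipeline` are hypothetical helpers. A failure is reported if either side exits non-zero.

```python
import subprocess
from typing import List, Optional


def send_cmd(snapshot: str, parent: Optional[str] = None) -> List[str]:
    """Build a `btrfs send` command; with a parent snapshot the stream is incremental."""
    cmd = ["btrfs", "send"]
    if parent:
        cmd += ["-p", parent]
    return cmd + [snapshot]


def run_pipeline(producer: List[str], consumer: List[str]) -> bool:
    """Pipe producer's stdout into consumer's stdin; True only if both exit with status 0."""
    sender = subprocess.Popen(producer, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(consumer, stdin=sender.stdout)
    # Close our copy of the pipe so the sender sees a broken pipe if the receiver dies early.
    sender.stdout.close()
    receiver.wait()
    sender.wait()
    return sender.returncode == 0 and receiver.returncode == 0
```

Usage would look like `run_pipeline(send_cmd(snap, parent), ["btrfs", "receive", dest_dir])`. A `False` result means any partially received subvolume must be discarded and the whole stream sent again.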

Hooverdan96 commented 10 months ago

The only option for a "resumable" process I've found is buttersync (not to be confused with buttersink), which works around the limitation by creating a temporary file using btrfs functionality and then using rsync for a resumable remote transfer. Maybe this could be considered for very large initial transfers. It was all done in a script, so there's no hidden magic there.
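A rough sketch of the general file-then-rsync idea, not buttersync's actual code. It assumes an ssh-reachable receiver with rsync installed (Rockstor's replication does not use ssh). `btrfs send -f` writes the stream to a file, `rsync --partial` keeps an interrupted copy so a rerun can reuse the data already transferred, and `btrfs receive -f` applies the file on the remote side. `file_based_transfer_cmds` is a hypothetical helper that only builds the three commands.

```python
import shlex
from typing import List, Optional


def file_based_transfer_cmds(snapshot: str, stream_file: str, remote: str,
                             remote_file: str, remote_dest: str,
                             parent: Optional[str] = None) -> List[List[str]]:
    """Return [send, copy, receive] commands for a resumable, file-based transfer."""
    # 1. Serialise the snapshot (incrementally if a parent is given) into a local file.
    send = ["btrfs", "send", "-f", stream_file]
    if parent:
        send += ["-p", parent]
    send.append(snapshot)
    # 2. Copy the file; --partial keeps an interrupted copy so a rerun can build on it.
    copy = ["rsync", "--partial", stream_file, f"{remote}:{remote_file}"]
    # 3. Apply the stream on the receiver; ssh takes the remote command as one quoted string.
    receive = ["ssh", remote,
               shlex.join(["btrfs", "receive", "-f", remote_file, remote_dest])]
    return [send, copy, receive]
```

Only the rsync step becomes resumable; the trade-off is that both hosts need enough free space to hold the full stream file.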

phillxnet commented 10 months ago

@Hooverdan96 I believe that a resumable function is planned for the next btrfs send protocol update. We can look to that in time. I'm not keen on making what is already fairly complex even more complex, so I'd like to keep our current implementation, which essentially wraps basic btrfs send/receive. In time we plan to offer alternative multi-host options such as GlusterFS, but I say we keep the existing wrapper as just that: a wrapper.