openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Online / real-time / synchronous pool replication #11726

Open akoscomp opened 3 years ago

akoscomp commented 3 years ago

Describe the feature you would like to see added to OpenZFS

The goal of this feature request is to be able to replicate a ZFS source pool to a remote target pool over a network link. The target pool would be a 1:1 copy of the source pool (both pools would contain the same data).

Replication would be real-time / synchronous: each time a transaction is written to the source pool, the same transaction is written to the destination pool, and the write is acknowledged to the application once both sides have written it. We could also have semi-synchronous replication, where the acknowledgement is given to the application once the remote side has received the transaction, and asynchronous replication. The most interesting mode is synchronous (and semi-synchronous, to save some latency?), as asynchronous replication can be approximated using ZFS send/receive.
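To make the difference between the three modes concrete, here is a minimal toy sketch in Python. It only models when the application gets its acknowledgement; `Mode`, `Remote`, and `Source` are hypothetical names invented for illustration and do not correspond to any existing ZFS interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SYNC = "sync"            # ack after the remote has written the transaction
    SEMI_SYNC = "semi-sync"  # ack after the remote has received it
    ASYNC = "async"          # ack after the local write only


@dataclass
class Remote:
    """Toy stand-in for the target pool (hypothetical, not a ZFS API)."""
    received: list = field(default_factory=list)
    written: list = field(default_factory=list)

    def receive(self, txn):
        self.received.append(txn)

    def flush(self):
        # Persist everything received but not yet written, in order.
        self.written.extend(self.received[len(self.written):])


@dataclass
class Source:
    """Toy stand-in for the source pool (hypothetical, not a ZFS API)."""
    mode: Mode
    remote: Remote
    written: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def write(self, txn):
        """Write locally, replicate per mode, return the point of acknowledgement."""
        self.written.append(txn)
        if self.mode is Mode.ASYNC:
            self.pending.append(txn)  # shipped later by drain()
            return "acked-after-local-write"
        self.remote.receive(txn)
        if self.mode is Mode.SEMI_SYNC:
            return "acked-after-remote-receive"
        self.remote.flush()
        return "acked-after-remote-write"

    def drain(self):
        """Ship all pending transactions to the remote (async mode only)."""
        for txn in self.pending:
            self.remote.receive(txn)
        self.pending.clear()
        self.remote.flush()
```

In this model, the latency the application sees grows from `ASYNC` to `SEMI_SYNC` to `SYNC`, while the amount of data that can be lost if the source is destroyed shrinks in the same order.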

Of course, if the target pool disconnects, writes can still be performed on the source pool without impacting the application, and once the target pool is available again, replication resumes where it stopped.

And what about maintenance operations on the target pool, such as scrubbing or disk resilvering? It sounds like the target pool would have to be accessible (on the target server) to perform such operations.

How will this feature improve OpenZFS?

What problem does this feature solve? With this feature, users can build redundant storage with minimal data loss in case the primary storage is destroyed. The current workaround, taking a snapshot and then sending and receiving it in a loop, needs a lot of resources.
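For reference, the snapshot/send/receive loop mentioned above could look roughly like the following Python sketch. It assumes the remote host is reachable over `ssh` and that `zfs` is on the path on both sides; the snapshot naming scheme, the function names, and the 60-second interval are illustrative assumptions, not an established tool.

```python
import subprocess
import time


def snapshot_cmd(dataset, name):
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def send_cmd(dataset, prev, curr):
    """Full send when there is no previous snapshot, incremental (-i) otherwise."""
    if prev is None:
        return ["zfs", "send", f"{dataset}@{curr}"]
    return ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{curr}"]


def receive_cmd(host, target):
    # -F rolls the target back to its most recent snapshot before receiving.
    return ["ssh", host, "zfs", "receive", "-F", target]


def replicate_once(dataset, host, target, prev, curr):
    """Take one snapshot and pipe `zfs send` into a remote `zfs receive`."""
    subprocess.run(snapshot_cmd(dataset, curr), check=True)
    send = subprocess.Popen(send_cmd(dataset, prev, curr), stdout=subprocess.PIPE)
    recv = subprocess.run(receive_cmd(host, target), stdin=send.stdout)
    send.stdout.close()
    if send.wait() != 0 or recv.returncode != 0:
        raise RuntimeError("replication step failed")
    return curr


def replicate_forever(dataset, host, target, interval_s=60):
    """Hypothetical driver loop; the previous snapshot is only tracked in memory."""
    prev = None
    while True:
        curr = f"repl-{int(time.time())}"
        prev = replicate_once(dataset, host, target, prev, curr)
        time.sleep(interval_s)
```

Every iteration creates a snapshot and spawns new processes on both hosts, and the sketch never destroys old snapshots, which illustrates why this approach is resource-hungry and why the data loss window can never be smaller than the loop interval.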

Additional context

Original bug: https://www.illumos.org/issues/7166

ahrens commented 3 years ago

@avg-I worked on something like this a while back.

avg-I commented 3 years ago

Unfortunately, that work was never completed. I recall several ideas from that time. One was to replicate data by intercepting and injecting it at the ZIL level. Another was to have different levels of replication guarantees, like sending a record and waiting for it to be confirmed by the remote side, or just sending without waiting but still getting a confirmation eventually. TXGs with unconfirmed writes would be kept, so that the data is not overwritten locally. Something like an incremental snapshot would be used for an initial sync-up after a disconnect.
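The idea of keeping TXGs with unconfirmed writes can be illustrated with a small toy model in Python. This is only a sketch of the bookkeeping, not of that unfinished work: `ReplicationLog` and its methods are hypothetical names, and it assumes the remote confirms cumulatively (confirming TXG n implies all earlier TXGs).

```python
class ReplicationLog:
    """Toy model: retain every TXG locally until the remote confirms it."""

    def __init__(self):
        self.unconfirmed = {}    # txg number -> payload retained locally
        self.last_confirmed = 0

    def record(self, txg, payload):
        """Remember a TXG that has been sent (or queued) but not yet confirmed."""
        self.unconfirmed[txg] = payload

    def confirm(self, txg):
        """Remote confirmed everything up to and including txg; release it."""
        for n in [n for n in self.unconfirmed if n <= txg]:
            del self.unconfirmed[n]
        self.last_confirmed = max(self.last_confirmed, txg)

    def resync_after_reconnect(self):
        """Return retained TXGs in order, analogous to an incremental stream."""
        return [(n, self.unconfirmed[n]) for n in sorted(self.unconfirmed)]
```

After a disconnect, everything newer than `last_confirmed` is still held locally, so the catch-up transfer only needs to cover that range.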

bghira commented 3 years ago

and DragonFly BSD has real-time network mirroring (via HAMMER(2))

bghira commented 3 years ago

you could enhance the send/recv infrastructure to have a keepalive mode where snapshots are regularly created and sent on each txg, and the constantly-open connection would reduce much of the overhead of send-recv scripting, behaving like a PUSH request.

problame commented 3 years ago

I also thought about re-using the ZIL records for active+passive HA with both sync / async replication. But the problem with using ZIL records is that they are at the objset level. Thus they won't be usable for replicating DSL data (dataset properties, snapshots, etc.).

I guess a proper analysis of the use cases for this feature is necessary.

devZer0 commented 3 years ago

i'd love to see this in zfs, too.

what about replicating to a remote iSCSI, NBD, or AoE device? or some remote fibre channel physical or virtual SCSI disk served by SCST/ESOS? at least with a mirror this should be possible, as you could attach a third disk to a mirror and have 3-way mirroring, where 1 mirrored drive is remote.

i'm out of ideas how this could work with raidz, though, as you cannot combine mirroring and raidz, nor can you access a raidz "as a whole" to mirror that.
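As a rough illustration of the remote-mirror idea for the mirror case, the sketch below builds a `zpool attach` command that adds a network-backed block device as an additional side of an existing mirror. The pool and device names used with it are placeholders, the helper names are hypothetical, and it assumes the remote device has already been made visible locally (for example via an iSCSI initiator).

```python
import subprocess


def attach_cmd(pool, existing_dev, remote_dev):
    """`zpool attach` adds remote_dev as another side of existing_dev's mirror."""
    return ["zpool", "attach", pool, existing_dev, remote_dev]


def attach_remote(pool, existing_dev, remote_dev, dry_run=True):
    """Return the command; only execute it when dry_run is False."""
    cmd = attach_cmd(pool, existing_dev, remote_dev)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

With this setup every write is sent to all sides of the mirror, including the remote one, so the latency and availability of the network link come into play.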

NicholasRush commented 3 years ago

This would be a very nice feature for HA VM clusters. But it should work in such a way that you mirror only the dataset itself, or a volume, or, at the pool level, the ZFS tree, like SnapMirror Synchronous.

Mirroring a complete pool at the physical vdev level would also be very interesting for high-availability applications. If it were possible to mirror any disk in a pool at any time, ZFS could mirror a disk whose SMART attributes indicate a fault before the disk fails and needs a resilver, like SyncMirror and plexes.

oferchen commented 1 year ago

This would be a really nice feature. I would suggest starting with the ability to add remote devices; enabling it for pools would then become simpler.