Read-only pool import on multiple hosts

mmatuska commented 2 years ago

Describe the feature would like to see added to OpenZFS

Ability to mount OpenZFS pools in read-only mode on multiple hosts.

How will this feature improve OpenZFS?

It would be possible to use OpenZFS as a shared read-only FS in SAN environments.

Additional context

I guess this would require a "full" read-only mode where all writes to any vdevs are off - loss of self-healing etc.

gmelikov commented 2 years ago

It should work now (although I didn't see anybody's tests on pool corruption in this case), interesting experiment on same thing https://wiki.lustre.org/images/0/08/LUG2021-Efficient_Metadata_Scanning_ZFS-Miller.pdf

NicholasRush commented 1 year ago

It would be great to create an interface here that also allows writing to the pool via a TCP/IP or RDMA connection. It's like it's mounted locally, but the file locks on the node makes the pool really physically mounted.

As an an example:

2-node HA cluster with 12 HDD SAS JBOD.

The pool can technically be used physically by both nodes. Since ZFS is not a cluster file system, this is not possible. Therefore, the pool is always only physically mounted via SAS by the node to which the disks are assigned. The second node could also read and write to the pool as if the zpool were directly mounted here, since file locking processes and locks are always controlled by the node that physically holds the pool. If the pool physically falls over to Node-2, Node-1 can later only mount over the network.

So Active-Active HA would be fully possible.It would be great to create an interface here that also allows writing to the pool via a TCP/IP or RDMA connection. It's like it's mounted locally, but the file locks on the node makes the pool really physically mounted.

As an an example:

2-node HA cluster with 12 HDD SAS JBOD.

The pool can technically be used physically by both nodes. Since ZFS is not a cluster file system, this is not possible. Therefore, the pool is always only physically mounted via SAS by the node to which the disks are assigned. The second node could also read and write to the pool as if the zpool were directly mounted here, since file locking processes and locks are always controlled by the node that physically holds the pool. If the pool physically falls over to Node-2, Node-1 can later only mount over the network.

So Active-Active HA would be fully possible.

ikozhukhov commented 1 year ago

you have transaction model and you couldn't realise direct writes from 2 different locations. you need every time to do like git rebase before new commits but how to fix merge conflicts? it is possible to load pool in RO mode under about controller and reload it to latest good transaction, but have working on ability to it by timer or by master node request to RO node

NicholasRush commented 1 year ago

you have transaction model and you couldn't realise direct writes from 2 different locations. you need every time to do like git rebase before new commits but how to fix merge conflicts? it is possible to load pool in RO mode under about controller and reload it to latest good transaction, but have working on ability to it by timer or by master node request to RO node

Only the node that writes to the VDEVs does the VDEV I/O and transaction scheduling. The second node, which can only access via TCP/IP or RDMA, can only do this similar to an NFS release on the mounts. It's just that it's possible to do more with xattr etc. than with NFS. It is of course clear that the mount points on both nodes are always in the same place from the user's point of view.

Node 1 mounts the pool via SAS and also does the i/o transaction scheduling.

Node 2 mounts the pool only over TCP/IP or RDMA (NFS similar) and performs all read and write requests over this connection as well. In other words, node 2 performs indirect operations on the pool, which are processed by node 1 and written to the disks.

This means that node 2 uses node 1 like a file server.

openzfs / zfs