openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Possibility of implementing RAID mode using erasure codes #558

Open Rudd-O opened 12 years ago

Rudd-O commented 12 years ago

What would be the possibility of doing this within ZFS?

http://www.networkcomputing.com/deduplication/229500204?pgno=2

If the data is erasure-coded into N shares distributed across at least H distinct storage units, then the data can be recovered from any K of those units -- so the data only becomes unavailable once H-K+1 or more units have failed.

This means, for example, that a storage server using the erasure-code equivalent of RAID1 across twelve disks can survive ANY SIX disks dying. In conventional RAID1, the loss of one mirror leg (only two disks) is enough to offline the array.

Does ZFS layering get in the way, or is it possible, at least in principle, to implement it?
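To make the any-K-of-N claim above concrete, here is a minimal toy sketch. It is illustration only -- not ZFS code, and not how raidz or Tahoe-LAFS actually implement their codes: a Reed-Solomon-style code over the prime field GF(257) (production codes use GF(2^8) so symbols stay within a byte), encoding K = 6 data symbols into N = 12 shares so that any 6 surviving shares reconstruct the data.

```python
# Toy illustration only: K-of-N erasure coding via polynomial evaluation over
# the prime field GF(257). Requires Python 3.8+ for pow(x, -1, P).
import random

P = 257  # a prime just above one byte's range

def encode(data, n):
    """Treat the K data symbols as polynomial coefficients and evaluate the
    polynomial at N distinct nonzero points; each (x, y) pair is one share."""
    k = len(data)
    assert k <= n <= P - 1
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(shares, k):
    """Recover the K coefficients from any K shares via Lagrange interpolation."""
    shares = shares[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(shares):
        basis = [1]   # coefficients of prod_{j != i} (x - x_j), little-endian
        denom = 1
        for j, (xj, _) in enumerate(shares):
            if j == i:
                continue
            # Multiply the basis polynomial by (x - x_j).
            new = [0] * (len(basis) + 1)
            for m, c in enumerate(basis):
                new[m] = (new[m] - xj * c) % P
                new[m + 1] = (new[m + 1] + c) % P
            basis = new
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P
        for m, c in enumerate(basis):
            coeffs[m] = (coeffs[m] + scale * c) % P
    return coeffs

if __name__ == "__main__":
    data = [104, 101, 108, 108, 111, 33]        # K = 6 data symbols
    shares = encode(data, 12)                   # N = 12 shares, one per "disk"
    survivors = random.sample(shares, 6)        # lose ANY six disks
    assert decode(survivors, 6) == data
    print("recovered:", decode(survivors, 6))
```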

Rudd-O commented 12 years ago

Relevant: http://bigasterisk.com/tahoe-playground/

behlendorf commented 12 years ago

Without investigating this too carefully, I suspect it would be possible for ZFS to implement this. The various redundancy layouts used by ZFS (mirrors, raidz) are very modular. Adding another one for erasure codes, or say distributed parity, should be doable; it's just a matter of implementing those policies. See module/zfs/vdev_mirror.c and module/zfs/vdev_raidz.c for the nuts and bolts of how they are implemented.

behlendorf commented 6 years ago

This was accidentally closed. However, I'm going to leave it closed because a version of this kind of functionality is being implemented in #3497.

thegreatgazoo commented 6 years ago

The raidz vdev driver already uses a form of Reed-Solomon code; see the comments under the license header in https://github.com/zfsonlinux/zfs/blob/master/module/zfs/vdev_raidz.c

But the raidz vdev driver supports only up to triple parity, which isn't a limit of Reed-Solomon. The draid vdev driver reuses the raidz parity code, so it can't go beyond triple parity either.
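To illustrate that triple parity is a driver limit rather than a Reed-Solomon limit, here is a minimal sketch -- illustration only, not the raidz code or its exact construction -- that generates an arbitrary number of parity symbols from a Cauchy matrix over GF(2^8), one standard Reed-Solomon construction that stays MDS (any combination of up to nparity erasures remains recoverable) for any parity count.

```python
# Illustrative sketch, not raidz: arbitrary parity counts over GF(2^8) using a
# Cauchy matrix. Nothing in the math below stops at three parity symbols.

POLY = 0x11d  # a common GF(2^8) reduction polynomial (assumption for this demo)

def gf_mul(a, b):
    """Multiply two GF(2^8) elements, reducing modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse: a^254, since a^255 == 1 for nonzero a."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def cauchy_matrix(nparity, ndata):
    """C[i][j] = 1 / (x_i + y_j) with x_i = i and y_j = nparity + j, all
    distinct. Every square submatrix of a Cauchy matrix is invertible, which
    is what makes the resulting systematic code MDS."""
    return [[gf_inv(i ^ (nparity + j)) for j in range(ndata)]
            for i in range(nparity)]

def encode_parity(data, nparity):
    """Return nparity parity symbols for one stripe of data symbols (bytes)."""
    C = cauchy_matrix(nparity, len(data))
    out = []
    for row in C:
        acc = 0
        for coef, d in zip(row, data):
            acc ^= gf_mul(coef, d)   # addition in GF(2^8) is XOR
        out.append(acc)
    return out

if __name__ == "__main__":
    stripe = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]  # 8 data disks
    print(encode_parity(stripe, 5))  # 5 parity disks: well past triple parity
```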

A HW-accelerated EC library would be key to performance. There are plenty of userspace libraries, e.g. Intel ISA-L, but I'm not sure whether any exist in the kernel.

gmelikov commented 6 years ago

If @behlendorf doesn't mind, I'll reopen this feature request. As @thegreatgazoo stated, draid doesn't introduce the main part of this request: surviving the failure of more than any 3 drives.

DeHackEd commented 6 years ago

Indeed, erasure codes would allow for arbitrary RAID-Zx for any (sane) value of x.

From a practical standpoint though, very wide RAID-Z arrays tend to perform poorly, so users are discouraged from making them too large, even without the CPU penalty of generic erasure codes. I'm worried about the practical implications of allowing arbitrarily wide arrays. 30 disks with 6 parity drives is not going to perform well for many workloads.

PrivatePuffin commented 4 years ago

"across at least H distinct storage units"

As long as this isn't implemented, I personally don't see much of a use case...

And isn't their implementation of erasure codes one of the reasons why CEPH arrays (even single-node ones) with erasure coding enabled are awkwardly slow, including rebuilds? How would this work out in combination with draid (which is actually focused on fixing some current performance bottlenecks)?

Maybe it would be worthwhile for someone to paint a professional use case where this feature is a must and describe the consequences of it not currently being an option?

It looks to me like it's sort of a solution looking for a (very niche) problem...