Open schakrava opened 9 years ago
Inline dedup needs tight integration with the filesystem, or it has to implement its own design, so the brighter future would be to merge the opendedup code with an existing filesystem design in the mainline kernel. Otherwise the user either 1) builds their own protection layer underneath the dedup FS, or 2) uses the dedup FS on top of raw devices which is on top of an existing filesystem. That introduces one more layer of filesystem, which could mean some degree of performance penalty.
I'd suggest integrating duperemove support: https://github.com/markfasheh/duperemove . It's out-of-band dedup, so you can selectively dedupe the files/directories you want.
Personally I'm not in favour of in-band de-duplication unless it appears properly in btrfs; it's a potentially massive performance / complexity hit for dubious gains in a "storage is getting cheaper" environment. In the ZFS world it is almost always discouraged and has a massive impact on performance, and system requirements escalate hugely in RAM and CPU. All for a saving on space. Any additional project would be putting a moving target between services and the file system, which doesn't seem like a good idea unless it's actually in the file system (and matured (see raid 5/6)), but it will still be very costly. Non-inline options such as duperemove are a different story, although I would shy away from it myself as I am unsure how it plays with btrfs's idea of how many copies it keeps of each piece of data. duperemove is also quite a small project as it goes; predominantly one person.
I think there is plenty already going on with kernel and btrfs-progs to rule this out as too risky.
There are dedup plans in the works for btrfs so I say we keep an eye on those and only respond to them when the time seems right; which will probably not be soon.
De-duplication ultimately reduces redundancy which seems like the wrong way to go. Any shared extent will multiply the risk of loss.
Good to have a flow of ideas though; but apparently I'm not a fan of dedup :smile:
It's great to see @kdave here recommending duperemove. I experimented with it a while ago, but just did a new install on a Rockstor box here to test with. I've created another issue to track progress of adding duperemove support: https://github.com/rockstor/rockstor-core/issues/686
Well, it's nice to have this out-of-band file-level dedup feature. The user can switch it on or off for specific shares, given that different subvolumes could have different characteristics regarding data protection.
It's true that the de-dup feature relies heavily on the underlying data protection and redundancy, so if this feature is introduced to Rockstor someday, we'd better remind the user that a dedup-enabled subvolume should sit in a protected pool like raid 1/10/5/6. By doing so we could probably have a distro targeting both purposes: performance-oriented NAS and dedup/archiving-oriented backup.
@phillxnet: in-band dedup usage should be justified; as mentioned, it has a high penalty in CPU and memory use. The benefits of immediate space savings should be estimated before it gets turned on. Out-of-band dedup is available today, and duperemove is just user-friendly tooling around the ioctl. It provides the bare minimum to use it, plus some additional features like scanning and storing the results in a separate file for the actual deduplication. I find this very convenient and easy to integrate.
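To illustrate the scan-then-dedupe split described above, here is a minimal Python sketch of a scanning phase only. It is not duperemove's actual implementation: the function names, the fixed 128 KiB block size, and the use of SHA-256 are all assumptions made for illustration. It hashes fixed-size blocks and reports which locations hold identical content, which is roughly the kind of result a tool could store and later hand to the dedupe ioctl.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # assumed block size, chosen only for illustration


def scan_blocks(paths, block_size=BLOCK_SIZE):
    """Map each block digest to the (path, offset, length) locations holding it."""
    index = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if not block:
                    break
                digest = hashlib.sha256(block).hexdigest()
                index[digest].append((path, offset, len(block)))
                offset += len(block)
    return index


def duplicate_candidates(index):
    """Keep only digests that were seen at more than one location."""
    return {d: locs for d, locs in index.items() if len(locs) > 1}
```

Nothing here modifies any file. A real tool would still have to pass each candidate pair to the kernel, which compares the bytes itself before sharing extents, so a hash collision in a sketch like this would not by itself corrupt data.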
I'm not aware of any similar tool that's comparable in terms of features and development activity, and I'm certainly not writing it off just because there's one main developer. You can see contributions from more people, and you can view users giving feedback as part of the development process.
Regarding the redundancy points, that should be addressed on a different layer in general, but I understand the problem where a single error multiplies in the deduped data.
@kdave Thanks. On reflection my post was a little overly discouraging and erred on the naysayer side. And I certainly did not mean to cast aspersions on the development of duperemove; we are after all in the business of a few people being able to make a difference.
That said I am just cautious of increasing complexities in the fundamentals of Rockstor; however @suman is certainly in favour and that’s good enough for me. Also I was unaware of the btrfs-extent-same ioctl maturity so that puts a more positive light on it, at least for me.
I was thinking more of the file system's directive of safe storage; but yes, there is definitely the efficient storage side as well, which extent sharing could address. I’m not really in a position to evaluate duperemove alternatives, and given @suman has previously chosen to play with duperemove and you have since put it forward, all would seem to be progressing nicely; bar the naysayers of course :)
Oh and welcome from me too.
@freeurmind Yes, the selective nature is quite nice, and yes, I agree that a user explanation of the nature of what is happening and its various payoffs would be in keeping; the duperemove read-only mode would come in there.
From a user:
Hey,
Really liking this product. Are there any plans to implement deduplication? I was looking at http://www.opendedup.org/ but I don't like how you use it. Not very user friendly.
But their dedupe technology is really good. Any chance of getting something like that implemented in this product?