rust-rse / reed-solomon-erasure

[Looking for new owners/maintainers, see #88] Rust implementation of Reed-Solomon erasure coding
MIT License
174 stars 61 forks source link

[Question] Implementing Parity Volume Set Specification 2.0 #47

Open Ironlenny opened 5 years ago

Ironlenny commented 5 years ago

I'm writing an implementation of Parity Volume Set Specification 2.0. I'm a Rust novice, but it seems to me that your library as is won't work for me. If that's true, I'd like to extent it, but I'm not sure how to go about doing that. It seems based on this that I'd need to add inputs for a constant and an exponent. Am I right? Do you have any other suggestions?

darrenldl commented 5 years ago

it seems to me that your library as is won't work for me.

PAR2 requires a GF16 (Galois field 2^8) engine while the published versions of this library (3.X.X) currently only provides GF8, so the published versions would not fit your need indeed.

There is some work put into implementing GF16 (Galois field 2^16) by @rphmeier, @drskalman and @burdges, which has been merged into the master branch. You can see issue #33 for more context and our discussion. The relevant PRs are #40 and #43.

I have been quite busy for past few months so I haven't managed to keep track of it and forgot where we were mostly, so I'll need to review the code to see if it's a complete implementation yet. Or perhaps they could answer that for you if they happen to be free.

It seems based on this that I'd need to add inputs for a constant and an exponent. Am I right?

Sorry I can't really answer that right now. I'm not familiar with the specifics of PAR2 and I don't have time to investigate as other project's deadline is closing in.

Do you have any other suggestions?

PAR2 is a very complex standard from what I know, and is a massive undertaking to implement. AFAIK it's not very commonly used as well nowadays, so rewriting such a massive, established and presumably well tested project in Rust requires a really good reason.

Mind if I ask what's the context and reason of implementing PAR2 in Rust?

Ironlenny commented 5 years ago

PAR2 is actually a rather simple spec. It defines a series of packets. What the type and size of the contents of each packet are. And what the algorithms are to be used in the creation of said packets. The only packet that's giving me trouble is the recovery packet as it requires the Reed-Solomon algorithm. And I'm still learning that.

As for why I'm doing this. I want to learn more about Rust, and parity as I'm writing a file system in the future.

I have no trouble pulling from master if that's the only code that will work. I've already forked another library, and am using it in my project.

darrenldl commented 5 years ago

PAR2 is actually a rather simple spec.

Whoops, you're right, I was way wrong. I guess I was thinking of another specification.

As for why I'm doing this. I want to learn more about Rust, and parity as I'm writing a file system in the future.

I see. There are some projects using this library should you want any very concrete examples, hbbft is one, but it's used for distribution type of deal rather than local data error recovery iirc. For local data error recovery, I have made an archiver which does that, but otherwise I'm not sure if there are more Rust examples.

I have no trouble pulling from master if that's the only code that will work. I've already forked another library, and am using it in my project.

Yep forking is a good idea - ideally master doesn't break anything, but as we resume development later I can't guarantee that.


I took a brief look at the specification, and can see roughly where it is going. Seems to have poor spatial locality during encoding and decoding, but is likely the only design that can accommodate this type of error recovery.

rphmeier commented 5 years ago

The GF(2^16) implementation @drskalman and I did is working, although unoptimized. We are using it for a fault-tolerant data replication service as part of a larger project.

burdges commented 5 years ago

We're quite interested in optimizations to the GF(2^16) arithmetic. I'm not convinced the sum of logs optimization is being used quite right, although I've not hard the time to dig into it myself. I'm afraid that's kinda subtle topic though.

Ironlenny commented 5 years ago

At the moment I'm just looking how to fit this to purpose. This is the formula outlined by the spec as I understand it: recovery = (inputA * constantA ^ exponent) + (inputB * constantB ^ exponent). Where recovery is the recovery shard, and inputA && B are input shards, constantA && B are 2^n where 'n' has an order of 65535, and exponent is the power of 2 assigned to the recovery shard. All operations are done with Galois Fields.

I don't know how to fit this in with your library. I don't really understand what this algo is doing.

darrenldl commented 5 years ago

I don't really understand what this algo is doing.

Sorry did you mean the algorithm of this library or the algorithm of PAR2?

Ironlenny commented 5 years ago

The algorithm for PAR2.

darrenldl commented 5 years ago

Right. I think I know where it is going very roughly after reviewing the specs as well as Plank's paper, but I don't have time to fully investigate. Understanding the algorithm being PAR2 is part of your project anyway.

I did have a look at par2cmdline's source code, and judging by how Reed Solomon coding is used there, this library may be usable with some adjustment, but not too certain. Specifically I don't know if the picked static parameters used at several places match up.

This library is compatible with BackBlaze's and also Klaus Post's implementation, but I have no idea if that means it is incompatible with PAR2 as a result. Actually hold on...they are using only Galois 8...right I still don't quite understand how the list of constants come into play in PAR2 specs, so I still don't know this library can be adjusted to suit the use for PAR2.

darrenldl commented 5 years ago

There is some interesting talk on GF16 performance and PAR stuff over at https://github.com/Parchive/par2cmdline/issues/130.

Most of the performance talk flew over my head cause I'm pretty clueless on very low-level performance stuff and also not well versed in the accelerated Galois paper, but reckon you guys probably have better luck extracting info from it.

k3d3 commented 2 years ago

Now that galois_16 is a thing, does that make it any easier to use this to implement PAR2?