elias-orijtech opened 8 months ago
/cc @klauspost @liamsi @musalbas; continued work on ReconstructSome thanks to Elias!
I think this change makes sense as is. But I originally thought that there is a way to only recompute the required shares and I also thought that is what the other implementation was doing. @klauspost can you confirm that the non-leopard implementation operates only on the required shares?
It looks like this only operates on the required shards: https://github.com/klauspost/reedsolomon/blob/5b85c725bcb6b785d9a273a267eb698b1f02c262/reedsolomon.go#L1560-L1606
Also, agree with @klauspost that benchmarks/numbers would help understand the impact of this (if any).
Leopard is pyramid encoded, so several layers are needed to reconstruct a bottom shard.
It's been too long since I looked into leopard myself, and I don't fully understand this, to be honest. Just from a high-level perspective, it should be possible to evaluate a polynomial given enough points (i.e. n+1 for degree n) in O(n), without having to fully recompute that polynomial's coefficients. Are you saying that leopard works differently/does not allow this and has to do its O(n log n) computations either way?
If you look at the paper, what I mean is that you should be able to save computations by simply not adding missing shard positions to the set E if they are not required:
https://arxiv.org/pdf/1404.3458.pdf
In the implementation this should correspond to these lines: https://github.com/klauspost/reedsolomon/blob/162f2bad346238488d8652bdf46292cb947b19b7/leopard.go#L423-L425. So a simple check whether a shard is required before setting its error bit could do the job. But it might not be as easy as that: other parts of the code might operate under the assumption that if a position is not in E, then there was no error or erasure. So there might be other changes necessary still.
Sorry. I don't read math. I can't even tell if the paper relates to Leopard.
Reading the code, it seems like the input is divided into "mips" (probably derived from mip-maps in 3D graphics) that are stacked for the final output. (*errorBitfield).prepare() converts missing shards into the mips that are needed. This makes sense given the O(N*log(N)) characteristics of the encoding, and it means that to decode one shard, all mip levels above it have to be resolved. isNeeded resolves whether a given mip level is needed for the reconstruction.
There may very well be additional optimization possible here, but I have no idea where to start looking for it.
I think it is important that we have a way to compare performance, e.g. if we are trying to recover a single shard as required. Otherwise we don't know if this PR (or changes to it) improves anything.
Yes. Currently BenchmarkDecode1K tests (among others) the time for reconstructing a single shard (leopard-gf16-single). You can see the numbers for that in this chart, in the "Recover One" column. This will probably not have changed those numbers, since the only difference from using ReconstructSome is that it will ignore some other missing shards. So a similar benchmark that tests ReconstructSome with some mutations is probably needed. I wouldn't expect it to be faster than "Recover One". (But do note that 1K is so small that setup time mostly dominates this particular benchmark.)
Let me try what I wrote above (inspired by the paper that leopard is based on) and see whether only adding required shards to the error bits breaks anything, and whether it actually yields any noticeable performance gains.
Based on #274, I've included the optimization for the error bits. See https://github.com/klauspost/reedsolomon/pull/272/commits/0ae8b020f6185da93505c31ba276614082d156f9 for details. In short:
- errLocs must reflect the erasures, but also doesn't have an effect on runtime.
- errBits does seem to allow omission of unwanted shards.
- I've added a ReconstructSome test.
The new test fails in the regular non-leopard configuration with
% go test -run ReconstructSome -v -short
Using pure Go
=== RUN TestReconstructSome
panic: runtime error: slice bounds out of range [:65664] with capacity 0
goroutine 70 [running]:
github.com/klauspost/reedsolomon.(*reedSolomon).codeSomeShardsP.func1(0x0?, 0x10080)
/Users/a/proj/orijtech/reedsolomon/reedsolomon.go:979 +0x2a4
created by github.com/klauspost/reedsolomon.(*reedSolomon).codeSomeShardsP in goroutine 34
/Users/a/proj/orijtech/reedsolomon/reedsolomon.go:1011 +0x248
exit status 2
FAIL github.com/klauspost/reedsolomon 0.622s
but I've so far failed to find an error in the test (and the leopards succeed). @klauspost can you spot my error?
Another issue with optimizing errBits is that the leopard8 cache is no longer correct: https://github.com/klauspost/reedsolomon/blob/4e91954739f00400b1c60e81096cb56fa2aea207/leopard8.go#L531, because there is no longer a one-to-one relationship between errBits and errLocs.
I've fixed the caching issue by no longer including errBits, at the cost of calling errBits.prepare even for cache hits.
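A stand-alone sketch of that fix (all names hypothetical, not the library's actual types): the cache is keyed on errLocs alone, and the prepare step runs on every lookup, including hits.

```go
package main

import "fmt"

// fftTables stands in for the precomputed decode tables that the
// real leopard8 cache stores.
type fftTables struct {
	skew []byte // placeholder for precomputed tables
}

type decodeCache struct {
	entries map[string]*fftTables
}

// cacheKey derives the key from errLocs only. errBits cannot be part
// of the key, since it may now omit shards that errLocs still marks
// as erased, breaking the old one-to-one relationship.
func cacheKey(errLocs []bool) string {
	b := make([]byte, len(errLocs))
	for i, e := range errLocs {
		if e {
			b[i] = 1
		}
	}
	return string(b)
}

func (c *decodeCache) get(errLocs []bool, prepare func()) *fftTables {
	prepare() // always runs, even on cache hits
	k := cacheKey(errLocs)
	if t, ok := c.entries[k]; ok {
		return t
	}
	t := &fftTables{skew: make([]byte, len(errLocs))}
	c.entries[k] = t
	return t
}

func main() {
	c := &decodeCache{entries: map[string]*fftTables{}}
	prepares := 0
	locs := []bool{true, false, true, false}
	t1 := c.get(locs, func() { prepares++ })
	t2 := c.get(locs, func() { prepares++ }) // cache hit, but prepare still ran
	fmt.Println(t1 == t2, prepares) // true 2
}
```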
Great stuff. I will take a look as soon as I get some extra time.
Kind ping @elias-orijtech @klauspost :-)
CC @odeke-em