Open chrisstjohn opened 1 year ago
What values did you pass with the index array?
> What values did you pass with index array?
0x3 -> { 0, 1 }
0x5 -> { 0, 2 }
0x6 -> { 1, 2 }
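The mapping above reads as one bit per block index. A minimal sketch in Python, assuming bit i being set means block i is among those handed to the decoder (the helper name bitmap_to_indices is hypothetical and not part of zfec):

```python
def bitmap_to_indices(bitmap: int, n: int) -> list[int]:
    """Return the positions of the set bits in `bitmap`, lowest first."""
    return [i for i in range(n) if (bitmap >> i) & 1]


# The three k=2, n=3 cases from this thread.
for bitmap in (0x3, 0x5, 0x6):
    print(hex(bitmap), "->", bitmap_to_indices(bitmap, n=3))
```

Note that the resulting lists are in ascending order, which is not necessarily the order fec_decode expects.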
An alternative fix for me is to change p[i] = 1 to p[index[i]] = 1, since I think it's trying to make a row of the identity matrix rather than copy it from the encoding matrix.
That is a good find and fix!
I cannot explain where the 0x8f, 0x8e rows in the decoder_2_3_0x5 and decoder_2_3_0x6 matrices come from. These rows should be copied from the encoder matrix, but instead they look like garbage values.
Those rows originate from the encoder matrix, which is then inverted, so they're not garbage - decoder_2_3_0x5 at least really is the inverse of {{ 1, 0 } { 3, 2 }}
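The inverse claim can be checked by hand. Here is a minimal sketch in Python, assuming the GF(2^8) field is built on the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) and assuming the first row of the decoder matrix is { 1, 0 }; only the 0x8f, 0x8e row is quoted in this thread. The helper names gf_mul and gf_matmul are hypothetical.

```python
PRIM_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements: carry-less multiply reduced by PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_matmul(x, y):
    """Multiply two square matrices over GF(2^8); addition is XOR."""
    size = len(x)
    out = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            acc = 0
            for j in range(size):
                acc ^= gf_mul(x[r][j], y[j][c])
            out[r][c] = acc
    return out


encoder_rows = [[1, 0], [3, 2]]
candidate_inverse = [[1, 0], [0x8F, 0x8E]]  # first row is an assumption
print(gf_matmul(encoder_rows, candidate_inverse))  # prints [[1, 0], [0, 1]]
```

Under those assumptions the product is the identity, which is consistent with 0x8f and 0x8e being legitimate field elements rather than garbage.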
I'm still puzzled how on earth this could work for anybody else but since I'm not using zfec for its intended purpose - just borrowing some of its math - I can't comment whether or not it does work for other people.
@WojciechMigda do you want me to submit a PR? Looks like there are a ton of unit tests that would check this for us?
One possibility is that the code that normally uses this function re-orders the shards to replace data with parity.
e.g. Encode A, B -> X, Y, Z; lose X; Decode Z, Y -> A, B
whereas I'm trying to decode Y, Z -> A, B
> Those rows originate from the encoder matrix which is then inverted so they're not garbage - decoder_2_3_0x5 at least really is the inverse of {{ 1, 0 } { 3, 2 }}
Oh, I thought these were the values right after the loop.
> I'm still puzzled how on earth this could work for anybody else but since I'm not using zfec for its intended purpose - just borrowing some of its math - I can't comment whether or not it does work for other people.
There are some tests and they pass. In particular, the Haskell ones use QuickCheck to construct input data which is then subjected to a roundtrip encode/decode, and they pass too.
> do you want me to submit a PR? Looks like there are a ton of unit tests that would check this for us?
I am not a maintainer of this project; I just happen to have been on the subject for the past few weeks. As for the existing tests, it would be worth checking whether they adequately cover this scenario.
> One possibility is that the code that normally uses this function re-orders the shards to replace data with parity.
I don't see anything like that in zfec: fec_decode calls build_decode_matrix_into_space right away. On the other hand, Luigi Rizzo's original code does have an additional step: a function called shuffle is called before build_decode_matrix. We'd need to confirm that it does the reordering you've mentioned.
@chrisstjohn maybe you already figured this out by yourself, but anyway I looked closely into this and these are my conclusions:
fec_decode requires that the block indices passed in the index array are ordered to satisfy this assertion: (index[i] >= k) || (index[i] == i) (https://github.com/tahoe-lafs/zfec/blob/master/zfec/fec.c#L524). If the above does not hold then fec_decode will abort execution. [1]
fec did reorder indices within fec_decode (the contents of index were mutable), but in zfec the same reordering was relocated to the Python API wrapper: https://github.com/tahoe-lafs/zfec/blob/master/zfec/_fecmodule.c#L454 .
Given that fec_decode works on index being in an order which satisfies index[i] == i for every index[i] < k, it becomes obvious that p[i] = 1 is equivalent to p[index[i]] = 1.
[1] Unless assertions are disabled, but this is not the default setup.
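To make the ordering requirement concrete, here is a minimal sketch in Python of a reordering pass that would establish the asserted condition. It is written from the description in this comment only and does not claim to reproduce the wrapper's code; the names reorder_for_decode and satisfies_precondition are hypothetical.

```python
def reorder_for_decode(index, k):
    """Return a copy of `index` with every primary block id (< k) moved to its own slot."""
    index = list(index)
    for i in range(k):
        # Keep swapping until slot i holds a secondary id (>= k) or the id i itself.
        while index[i] < k and index[i] != i:
            j = index[i]
            if index[j] == j:
                raise ValueError("duplicate block id %d" % j)
            index[i], index[j] = index[j], index[i]
    return index


def satisfies_precondition(index, k):
    """Mirror of the assertion quoted above: (index[i] >= k) || (index[i] == i)."""
    return all(idx >= k or idx == i for i, idx in enumerate(index[:k]))


print(satisfies_precondition([1, 2], k=2))  # prints False
print(reorder_for_decode([1, 2], k=2))      # prints [2, 1]
```

In a real caller the block buffers would have to be swapped together with their ids; the sketch only tracks the ids.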
Thanks @WojciechMigda for your analysis and thoughts.
My feeling is that an ordering requirement buried deep in a 'C' library that is implemented way outside in the Python wrapper is a rather brittle design. As you say, right now there is an assertion in fec_decode, but I can't see the harm of adding assert(index[i] < code->n) and changing p[i] to p[index[i]], as I agree that right now, in this case, with this Python wrapper, index[i] == i.
Even in terms of performance there probably wouldn't be an extra indirection, as index[i] is already fetched and available to the compiler.
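The difference between the two variants only shows up when the ordering requirement is violated. Below is a minimal Python model of the row selection as it is discussed in this thread, not the actual fec.c code. The parity row { 3, 2 } is the one quoted earlier; the identity rows of the encoder and the names ENCODER and decode_rows are assumptions made for illustration.

```python
ENCODER = [[1, 0], [0, 1], [3, 2]]  # k=2, n=3; identity rows assumed, parity row as quoted


def decode_rows(index, k, use_index_for_identity):
    """Pick the k x k matrix that would be inverted for the given block ids."""
    n = len(ENCODER)
    rows = []
    for i in range(k):
        assert index[i] < n, "block id out of range"
        if index[i] < k:
            # Primary block: synthesize an identity row instead of copying it.
            row = [0] * k
            row[index[i] if use_index_for_identity else i] = 1
        else:
            # Secondary block: copy the row from the encoding matrix.
            row = list(ENCODER[index[i]])
        rows.append(row)
    return rows


# Unordered ids { 1, 2 }: the p[i] variant collides with the { 0, 2 } case.
print(decode_rows([0, 2], 2, use_index_for_identity=False))  # prints [[1, 0], [3, 2]]
print(decode_rows([1, 2], 2, use_index_for_identity=False))  # prints [[1, 0], [3, 2]]
print(decode_rows([1, 2], 2, use_index_for_identity=True))   # prints [[0, 1], [3, 2]]
```

With ids ordered as the assertion demands, for example { 2, 1 }, both variants in this model select the same rows, which matches the observation that index[i] == i holds in that case.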
I'm interested in pre-calculating some decode matrices, so I used some of the "under the hood" functions of fec.c.
With a trivial example of k=2, n=3 I get the following:
There are three erase cases identified by a bitmap, but note that the 0x5 and 0x6 decode matrices are identical - which is wrong.
I think I traced the problem to build_decode_matrix_into_space(), which does some weird stuff whilst trying to decimate the encoding matrix:
I removed the apparent "optimisation" and it all springs into life:
Now this is really really really old code and pretty central to the whole decoder; am I missing something or is it broken?