tomerfiliba-org / reedsolomon

⏳🛡 Pythonic universal errors-and-erasures Reed-Solomon codec to protect your data from errors and bitrot. Includes a future-proof zero-dependencies pure-python implementation 🔮 and an optional speed-optimized Cython/C extension 🚀
http://pypi.python.org/pypi/reedsolo
Other
352 stars 86 forks source link

Wrong decoding with edge-condition parameters #13

Open SupraSummus opened 6 years ago

SupraSummus commented 6 years ago

I think it's a bug, but I'm not 100% sure, because I'm playing with RS codec for just few hours.

>>> import reedsolo
>>> c = reedsolo.RSCodec(nsym=0, nsize=1)
>>> c.encode(b'aaa')
bytearray(b'aaa')
>>> c.decode(b'aaa')
bytearray(b'')
SupraSummus commented 6 years ago

problem occurs also in @lrq3000's fork (rev 7583ed)

lrq3000 commented 5 years ago

@SupraSummus Thank you for the detailed feedback, yes the codec is optimized for speed so it might not have all the checks to ensure that user's input is correct (actually it is assumed that the user should do the input sanitization). I will see if adding a length check is bothersome for performance or not :-) Or at least add something in the docs about this edge case. Keep in mind that RSCodec.encode() is meant to be used in a loop, so the less input checks we do, the faster it is!

SupraSummus commented 5 years ago

I didn't mean proposing to add a parameter sanity check. The arguments I provided seem valid to me - they describe RS codec with single-byte chunk and no error correction symbols. In my comprehension such codec should behave as identity for decoding and encoding operations.

... or I missed domain assertions for RS coding and nsyms=0, nsize=1 is not a valid RS actually.

Thank you for attention.

lrq3000 commented 4 years ago

@SupraSummus Thank you very much, I see what you mean. With the latest release, here is what we get now:

>>> c.decode(b'aaa')
(bytearray(b''), bytearray(b'aaa'), bytearray(b''))

Which means that the decoded message, which should just result in identity, is not considered as the message but as ecc symbols. This indeed looks weird, I should check inside the code, but I will probably have to dig back to the maths to understand what's happening here, but it's certainly very counterintuitive.

lrq3000 commented 4 years ago

Ok I think I understand what's happening here: if nsym is 0, the algo as implemented here will still try to get at least one ecc symbol, hence the input message is considered as the symbols and the message is considered empty, whereas you would expect the opposite.

I'll see how to best fix this issue, whether in the code or if we put a warning.

(TODO for me: add a unit test for this case)