Fail to decode with erasures but no exception/error given

henla464 commented 3 years ago

Hi, I like this library a lot however I encountered a strange behaviour that seems like a bug to me. When doing a decode with erasures I get the same corrupted message back without any error given. This happens even when the check function detects errors. My guess is that the one false erasure somehow messes it up but I read that it should be safe to supply false erasures (they just use up one RSCode)

Lets look at the code:

import unireedsolomon as rs
OrigMsg =      bytearray(bytes([0x83, 0x1F, 0x00, 0x00, 0xFF, 0x00, 0X93, 0xA8, 0x00]))
CorruptedMsg = bytearray(bytes([0x83, 0x10, 0x00, 0x01, 0xFF, 0x01, 0X93, 0xA8, 0x00]))
coder = rs.RSCoder(24, 20)
messageDataAndRSCodeWithNullBytes = coder.encode(OrigMsg, return_string=False)
# Add the RSCodes from OrigMsg to the corrupted message
corruptedMsgWithRSCodes = CorruptedMsg + bytearray(messageDataAndRSCodeWithNullBytes[-4:])
paddedCorruptedMsgWithRSCodes = bytearray(bytes([0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])) + corruptedMsgWithRSCodes
print("PaddedCorruptedMsgWithRSCodes: " + str(paddedCorruptedMsgWithRSCodes))
try:
    coder.decode(paddedCorruptedMsgWithRSCodes)
except:
    print('As expected decoding fails due to too many errors (4 RSCodes, and 3 errors)')

if not coder.check(paddedCorruptedMsgWithRSCodes):
    print('As expected check fails')

try:
    decodedMsgAndCodeTuple = coder.decode(paddedCorruptedMsgWithRSCodes, erasures_pos=[11,12])
    decodedMsgAndCode = bytearray((decodedMsgAndCodeTuple[0] + decodedMsgAndCodeTuple[1]).encode('latin-1'))
    print('Decoding didnt throw error when we point to one correct erasure (position 12) and one wrong erasure (position 11) [why?]')
    if decodedMsgAndCode == corruptedMsgWithRSCodes:
        print('The message was not decoded, the corrupted message is returned without anything fixed, and no error given [why?]')

    if not coder.check(bytearray(bytes([0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])) + decodedMsgAndCode):
        print('Check detects the error of course (since it same message that we checked before)')
except:
    print('decoding failed, this is what I expected')

The prints of this is:

PaddedCorruptedMsgWithRSCodes: bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x83\x10\x00\x01\xff\x01\x93\xa8\x006\x91\xbay')
As expected decoding fails due to too many errors (4 RSCodes, and 3 errors)
As expected check fails
Decoding didnt throw error when we point to one correct erasure (position 12) and one wrong erasure (position 11) [why?]
The message was not decoded, the corrupted message is returned without anything fixed, and no error given [why?]
Check detects the error of course (since it same message that we checked before)

Am I wrong in assuming this is a bug? Any idea why it behaves this way?

lrq3000 commented 3 years ago

Hello,

Thank you for the feedback :-) I'm glad you find the library useful!

It's normal the decoding fails. When providing erasures positions, you need to provide ALL erasures positions, and not have any mistake in the erasures locations you provide. Essentially, when you provide erasures locations, the decoder will create a syndrome based on it, so it's like the decoder entirely trusts your input. If there is even just one erasure location wrong (ie, you added one more or one less), then the syndrome will be messed up and the decoder will fail. But as you noticed, you can always check the output, then the decoder calculates the syndrome by itself and it will see that the decoding didn't work (as long as you're under the Singleton bound...).

If you are unsure about some erasures locations, you can always do a mix of erasures and errors decoding, you just input the erasures locations you know and leave out the rest, the decoder will try to find the errors by itself (but of course the decoding power is divided by 2, errors are twice more ressources consuming in terms of ecc symbols compared to erasures).

Another possibility is to first try to provide all the erasures locations you suspect, then check the syndrome if the decoding failed, as you did here, and then if it failed you can try again but without providing any erasure location to force errors-only decoding. This can be a sound failsafe strategy.

I hope this answers your questions. Please feel free to reach out if you have any other question.

lrq3000 commented 3 years ago

PS: please also check out the more mature and faster Reedsolo library, which I also co-authored and maintain.

henla464 commented 3 years ago

Thank you for your reply! I am not sure if I understand you correctly. From what i read on the linked resources I have assumed that if I for example have three errors and four ECC symbols, then it can decode that IF I CORRECTLY supply two erasures. Is that true or are you saying this is not possible?

In my original example I am fully aware that it should not decode correctly. What I am surprised about is that it fails without telling me (since it is possible to tell that what is returned is wrong then I think the library should tell that without me having to do another check call).

I will look at the Readsolo library!

lrq3000 commented 3 years ago

About your first question, yes 4 ecc symbols can decode 2 erasures + 1 error (= 2 + 1*2 = 4 ecc symbols consumed).

About your example, the library does not double check, for efficiency purposes. When you do thousands of calculations per second, doing such a check will significantly slow down the decoding process, so if you don't need it, you don't want to do it. This is left to the end user.

The goal of this library is to provide the codec, but then for all the user input management and network reliability management, it's left to the application using the codec, as there are too many edge cases to cover without significantly impairing the performance. So for some applications it can be OK but others may not want to impair the performances.

The reedsolo lib I linked to implements a higher level interface that tries to manage these issues more transparently while still exposing internal functions for those who do not want to compromise with speed at the expense of less safety, although I'm not sure if I implemented the double check at decoding there either.

lrq3000 commented 3 years ago

PS: I just checked and reedsolo isn't implementing an automatic check at the end of decoding. This is left to the user to do if necessary.

lrq3000 commented 3 years ago

I will leave this issue open in case I get an idea to maybe implement it more gracefully, there may be a point in the decoding algos where the syndrome is already calculated so we may be able to check for decoding error for free. Maybe but not sure, because I tried to add as many checks at each stage as possible.

henla464 commented 3 years ago

Thank you, it is no problem for me to do the check. It was just not what I expected so I was worried there was an edge case bug or that I misunderstood something. A bit lucky that I found it now so I can add the check. Maybe it should be added to the documentation?

lrq3000 commented 3 years ago

Yes great idea, thank you very much for your suggestion. I now added a "Edge cases" section in the readme on both projects, please have a look, I explain the main limitations when decoding. It's non exhaustive as there are other limitations (that I forgot at the moment lol), but IIRC the two points I cover are the two main limitations of reed-solomon codecs, if you look out for those you should be fine.

lrq3000 / unireedsolomon

Fail to decode with erasures but no exception/error given #16