tomerfiliba-org / reedsolomon

⏳🛡 Pythonic universal errors-and-erasures Reed-Solomon codec to protect your data from errors and bitrot. Includes a future-proof zero-dependencies pure-python implementation 🔮 and an optional speed-optimized Cython/C extension 🚀
http://pypi.python.org/pypi/reedsolo
Other
351 stars 86 forks source link

Inconsistent reporting of error positions when using erasures #41

Closed jbosboom closed 1 year ago

jbosboom commented 2 years ago

The third element of the rs_correct_msg is a list of error positions. The positions returned are inconsistent. Consider the following:

>>> import reedsolo as rs
>>> from reedsolo import rs_encode_msg, rs_correct_msg, rs_check
>>> _ = rs.init_tables()
>>> msg = rs_encode_msg(bytes(range(10)), nsym=4)
>>> msg
bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\xf0\x9f\x84\xea')
>>> rs_correct_msg(msg, nsym=4, erase_pos=[1])  # CALL 1
(bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'), bytearray(b'\xf0\x9f\x84\xea'), [1])
>>> rs_correct_msg(msg, nsym=4, erase_pos=[0])  # CALL 2
(bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'), bytearray(b'\xf0\x9f\x84\xea'), [])
>>> msg[0] = 0xFF
>>> rs_correct_msg(msg, nsym=4)  # CALL 3
(bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'), bytearray(b'\xf0\x9f\x84\xea'), [0])
>>> rs_correct_msg(msg, nsym=4, erase_pos=[0])  # CALL 4
(bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'), bytearray(b'\xf0\x9f\x84\xea'), [])

In call 1, the message is correct, but we mark position 1 as an erasure. The returned bytearrays match the input message, but rs_correct_msg reports position 1 as an error. This is defensible, as that byte was logically erased, even though the correct byte happened to be in the array. But then call 2 should report position 0 as an error.

We then corrupt byte 0 of the message. Call 3 corrects the corruption and reports position 0 as an error (which it is). In call 4 we mark position 0 as an erasure; this call also corrects the corruption, but does not report position 0 as an error.

I think the source of this inconsistency is reedsolo.py lines 728-729, which write 0 to each erased position. In call 1, this corrupts the message, which is subsequently corrected and the error position correctly returned; in call 2, the erased position is already zero, so the message remains correct and rs_correct_msg returns on line 780 with an empty error position list. In call 3, no zeroing happens because no erasures were given, so the message is corrected and the error position correctly returned; in call 4, the zeroing corrects the message and the call returns on line 780 with an empty error position list.

If you agree that all erased positions should be reported as error positions, and you want to keep the zeroing for ease of debugging (according to the comment), the easy fix is to return error_pos on line 780 (in place of the empty list).

lrq3000 commented 2 years ago

Oh, thank you very much for your detailed report! Yes I agree we should aim at consistency, i will have a look at implementing your fix (or please make a PR so you are correctly credited in the commits history!). I can take care of the unit testing.

Le jeu. 30 déc. 2021 à 11:00, Jeffrey Bosboom @.***> a écrit :

The third element of the rs_correct_msg is a list of error positions. The positions returned are inconsistent. Consider the following:

import reedsolo as rs from reedsolo import rs_encode_msg, rs_correct_msg, rscheck = rs.init_tables() msg = rs_encode_msg(bytes(range(10)), nsym=4) msg bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\xf0\x9f\x84\xea') rs_correct_msg(msg, nsym=4, erase_pos=[1]) # CALL 1 (bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'), bytearray(b'\xf0\x9f\x84\xea'), [1]) rs_correct_msg(msg, nsym=4, erase_pos=[0]) # CALL 2 (bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'), bytearray(b'\xf0\x9f\x84\xea'), []) msg[0] = 0xFF rs_correct_msg(msg, nsym=4) # CALL 3 (bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'), bytearray(b'\xf0\x9f\x84\xea'), [0]) rs_correct_msg(msg, nsym=4, erase_pos=[0]) # CALL 4 (bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'), bytearray(b'\xf0\x9f\x84\xea'), [])

In call 1, the message is correct, but we mark position 1 as an erasure. The returned bytearrays match the input message, but rs_correct_msg reports position 1 as an error. This is defensible, as that byte was logically erased, even though the correct byte happened to be in the array. But then call 2 should also report position 0 as an error.

We then corrupt byte 0 of the message. Call 3 corrects the corruption and reports position 0 as an error (which it is). In call 4 we mark position 0 as an erasure; this call also corrects the corruption, but does not report position 0 as an error.

I think the source of this inconsistency is reedsolo.py lines 728-729 https://github.com/tomerfiliba/reedsolomon/blob/cf146880d197bba15300ce8934afcf069c05d157/reedsolo.py#L724, which write 0 to each erased position. In call 1, this corrupts the message, which is subsequently corrected and the error position correctly returned; in call 2, the erased position is already zero, so the message remains correct and rs_correct_msg returns on line 780 with an empty error position list. In call 3, no zeroing happens because no erasures were given, so the message is corrected and the error position correctly returned; in call 4, the zeroing corrects the message and the call returns on line 780 with an empty error position list.

If you agree that all erased positions should be reported as error positions, and you want to keep the zeroing for ease of debugging (according to the comment), the easy fix is to return error_pos on line 780 (in place of the empty list).

— Reply to this email directly, view it on GitHub https://github.com/tomerfiliba/reedsolomon/issues/41, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIRFXQ3NFGHSHB2YUEBC2DUTQUU7ANCNFSM5K7N5F5Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>