tomerfiliba-org / reedsolomon

⏳🛡 Pythonic universal errors-and-erasures Reed-Solomon codec to protect your data from errors and bitrot. Includes a future-proof zero-dependencies pure-python implementation 🔮 and an optional speed-optimized Cython/C extension 🚀
http://pypi.python.org/pypi/reedsolo
Other
351 stars 86 forks source link

Decoding a message composed of 4-bit words in GF(16) #54

Open suzukieng opened 1 year ago

suzukieng commented 1 year ago

Hi, I'm trying to use this module to verify the so-called "mode message" of Aztec codes (see: https://en.wikipedia.org/wiki/Aztec_Code#Mode_message) Reed Solomon is used on GF(16), using the polynomial x^4 + x + 1 (= 0x13, 10011b, 19 decimal). The message is composed of 7 words, two 4-bit data words followed by five 4-bit ECC words.

The reference Aztec code from the specification has the following mode message: [0, 9, 12, 2, 3, 1, 9] (0000100111000010001100011001 in binary), which I just can't get to decode, and I don't understand why.

My verification code:

>>> from reedsolo import RSCodec, ReedSolomonError
>>> words = [0, 9, 12, 2, 3, 1, 9]
>>> prim = 19  # 10011 = x^4 + x^1 + x^0
>>> c_exp = 4  # 2^4 = GF(16)
>>> nsym = 5 # number of ECC words for compat Aztec codes
>>> rsc = RSCodec(nsym=nsym, c_exp=c_exp, prim=prim)
>>> rsc.decode(words)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../venv/lib/python3.10/site-packages/reedsolo.py", line 929, in decode
    rmes, recc, errata_pos = rs_correct_msg(chunk, nsym, fcr=self.fcr, generator=self.generator, erase_pos=e_pos, only_erasures=only_erasures)
  File ".../venv/lib/python3.10/site-packages/reedsolo.py", line 760, in rs_correct_msg
    raise ReedSolomonError("Could not correct message")
reedsolo.ReedSolomonError: Could not correct message

I also checked another implementation (zxing) and they use the same RS parameters as I do ( https://github.com/zxing-cpp/zxing-cpp/blob/master/core/src/GenericGF.cpp#L33), and there the same message decodes correctly there (0 errors).

Can someone tell me where I'm wrong? I just started digging into Reed-Solomon codes so it might very well be a fundamental misunderstanding on my side. :-)

lrq3000 commented 1 year ago

So I'm not sure what is going wrong, I did not know about Aztec codes before but they don't seem that different to QR codes, only the parameters differ, but the principle is similar.

My guess is that there are hidden parameters that are hardcoded in other codecs for Aztec codes. One clue for example is that all codewords generated by Reed-Solomon are exactly the length of the field, hence here the mode word should be 16 characters long. Since it's not, there must be some padding somewhere, so this is the first reason why it's not working.

Then, there are the usual offenders, such as fcr, which is commonly set to a constant in some countries by convention, but the convention differs in other countries or in other academic labs or engineering labs. So if you're not sure where to look for, I would also suggest to dig to find what is the implicitly set value of fcr, because for sure it's set somewhere, but sometimes it's hidden deep inside the codec algorithms (and sometimes the software developers are not even aware of these assumptions! That's why I made the codec here universal, and it took me a long while!).