tomerfiliba-org / reedsolomon

⏳🛡 Pythonic universal errors-and-erasures Reed-Solomon codec to protect your data from errors and bitrot. Includes a future-proof zero-dependencies pure-python implementation 🔮 and an optional speed-optimized Cython/C extension 🚀
http://pypi.python.org/pypi/reedsolo
Other
351 stars 86 forks source link

Binary codes using RSCodec #37

Open hrichharms opened 2 years ago

hrichharms commented 2 years ago

Is it possible to use RSCodec as a binary codec? If so, how do I do that? I've tried limiting the symbol alphabet by setting c_exp to 2 (edit: set c_exp to 1) only to be greeted with this error message: ` File "", line 1, in File "/opt/homebrew/lib/python3.9/site-packages/reedsolo.py", line 869, in init self.gen[nsym] = rs_generator_poly(nsym, fcr=fcr, generator=generator) File "/opt/homebrew/lib/python3.9/site-packages/reedsolo.py", line 484, in rs_generator_poly g = gf_poly_mul(g, [1, gf_pow(generator, i+fcr)]) File "/opt/homebrew/lib/python3.9/site-packages/reedsolo.py", line 331, in gf_pow return gf_exp[(gf_log[x] power) % field_charac] IndexError: bytearray index out of range `

I think someone has attempted to do the same thing here.

lrq3000 commented 2 years ago

I have just tested and indeed this implementation of a RS codec is less universal than I thought, as galois fields of 1 (binary code, because it's 2^1 symbols = binary) and 2 are not working at all. It fails at the initiation of the logarithmic table. But I guess that's not the only area of incompatibility. The same issue also happens with my other more mathematically accurate implementation: https://github.com/lrq3000/unireedsolomon

Both of these implementations are heavily guided by the algorithms provided in the book: "Algebraic codes for data transmission", Blahut, Richard E., 2003

It's important to note that binary RS codecs are often considered different from symbols/burst RS codecs. This codec is of the burst type, as it was intended for on-storage data protection, not communication. Burst type implementations allow to encode bigger chunks of data at once and are hence faster (and also provide slightly different pros/cons in terms of what types of data corruption they can sustain).

I am unfortunately not proficient enough (anymore) in RS codecs implementations to fix this issue, and it may be a hairy one. If someone has any resource to suggest about how to bridge binary RS codecs with burst RS codecs, I'll be very grateful and take a look.

haneefmubarak commented 2 years ago

I found this wiki page (that is in all honesty 99% an advertisement for some academic researchers lmao) that mentions an alleged GitHub repo that has a binary RS code, but neither of the "cited" papers seem to mention a link to it. Still, perhaps the bones of that article may be helpful toward this?

hrichharms commented 2 years ago

ended up writing my own binary reed-solomon code for the project I was working on. will try to implement whatever fix is necessary here

lrq3000 commented 2 years ago

@hrichharms That is awesome! Your contributions would be very welcome! Please feel free to open a PR.

hrichharms commented 2 years ago

Thought I would drop a quick update here. things are getting a bit busy for me so I might take a bit longer than I had expected, but from what I can tell there may be a couple different causes for the lack of generality to binary codes. I intend on doing a pr at some point but if someone wants to take a crack at it before then I'd be happy to send them the binary-specific rs codec I wrote myself as reference

lrq3000 commented 1 year ago

@hrichharms It's been a while but I'm stumbling again at this issue, it seems that the issue is more generally that galois fields smaller than 8 are not working as expected for some reason. I understand you did not have the time to fully solve this hairy issue, but do you have some insights you remember from your past investigation into this issue? You mention that "there may be a couple different causes for the lack of generality to binary codes", could you precise the ones you identified? Thank you very much in advance!