pjkundert / ezpwd-reed-solomon

Reed-Solomon & BCH encoding and decoding, in C++, Javascript & Python
https://hardconsulting.com/products/13-reed-solomon
Other
99 stars 21 forks source link

The c++ example throws exception with 4095 size #11

Closed Hi-Angel closed 6 years ago

Hi-Angel commented 6 years ago

An example from README stops working once capacity increased from 255 to 4095.

Steps to reproduce:

  1. type in test.cpp:
    
    #include <ezpwd/rs>

int main() { // Reed Solomon codec w/ 4095 symbols, up to 4091 data, 4 parity symbols ezpwd::RS<4095,4091> rs; std::vector data;

// ... fill data with up to 4091 bytes ...
data.resize(4091);

rs.encode( data ); // Adds 4 Reed-Solomon parity symbols (4095-4091 == 4)

// ... later, after data is possibly corrupted ...
rs.decode( data ); // Correct errors

}

2. Compile, and execute:

```bash
$ g++ test2.cpp -o a -g3 -O0 -Wall -Wextra -Wsign-conversion -std=c++17 -pedantic -fsanitize=address -isystem/home/constantine/Projects/ezpwd-reed-solomon/c++/
$ ./a
terminate called after throwing an instance of 'std::runtime_error'
  what():  reed-solomon: output data type too small to contain symbols
[1]    3730 abort (core dumped)  ./a
pjkundert commented 6 years ago

Hi, Konstantin;

The problem you're running into is because of the Reed-Solomon "Symbol" size (in bits). An RS<4095...> codeword consists of symbols that are 12 bits in size. To contain these symbols, ezpwd RS requires vectors of uint16_t data, not uint8_t. Give that a try.

Hi-Angel commented 6 years ago

To contain these symbols, ezpwd RS requires vectors of uint16_t data, not uint8_t

Thanks! I tried making use of ezpwd, but now this requirement made me try exploring other libraries as well. To my surprise, I often encounter these magic numbers 255,256. This made me wondering: whether ezpwd's lack of API to just encode an arbitrary-sized buffer being a limitation of the library or the Reed-Solomon codes per se?

IOW, can't the library encode a 4095-bytes (and why not 4096 anyway?) buffer as uint8_t because it's how the library designed, or because it's how does Reed-Solomon codes work?

Hi-Angel commented 6 years ago

IOW, can't the library encode a 4095-bytes (and why not 4096 anyway?) buffer as uint8_t because it's how the library designed, or because it's how does Reed-Solomon codes work?

FTR, I asked at work, and turned out it's specific to Reed-Solomon codes per se. The reason for specific numbers like 256 and 4096 being something about irreducible polynomials over a Galua field for e.g. 2⁸=256, because whole byte-sequence from POV of Reed-Solomon being a 256 or 4096 coefficients of the polynomial. I am not sure, but probably e.g. 2⁹ is not an irreducible polynomial.

The reason for 12 bit elements I didn't really understand, but it does exist.

And the reason for the off-by-one, i.e. 255 instead of 256, and 4095 instead of 4096 is because polynomial calculation for the element at zero-th position doesn't have same effect as for the others.