tomerfiliba-org / reedsolomon

⏳🛡 Pythonic universal errors-and-erasures Reed-Solomon codec to protect your data from errors and bitrot. Includes a future-proof zero-dependencies pure-python implementation 🔮 and an optional speed-optimized Cython/C extension 🚀
http://pypi.python.org/pypi/reedsolo
Other
352 stars 86 forks source link

Array of bit/boolean values #15

Open maberer opened 5 years ago

maberer commented 5 years ago

Hi, thank you for the library. I am not sure if it is possible but I am currently trying to encode a single value of 20 bits, so I am trying to feed the encoder with a boolean array of length 20. e.g. [true, true, false, true, false ....]

After encoding, 60 bits should be produced (in total) I would like to be able to correct 5 errors. Size of the alphabet q = 16 blocklength n = 15 Message length k = 5

I am not sure what parameters I have to use - of if I am using them correctly. I have a legacy Java implementation, but the produced data differs.... Can somebody help me to understand the correct use of the library for the parameters above...?

maberer commented 5 years ago

I used the following parameters

rs = RSCodec(nsym=10, nsize=15, prim=19, generator=2, c_exp=4)

Then I tried to encode the number 10:

result = rs.encode([10, 0, 0, 0, 0])

result is: bytearray(b'\n\x00\x00\x00\x00\n\x07\x07\x01\x0e\x08\x06\x08\x0b\x02')

I then converted the result into bits. I expected it to return exactly 60 bits but is a lot more...

Is there an error in my setup?

lrq3000 commented 5 years ago

Hello, it's normal, you have a message of 15 characters and each character is 2^4=16 bits, so in total you should get an encoded message of 240 bits with the encoder you parametrized. I am not sure how you want to achieve 60 bits only, as even just the original input message is 5 characters, which equals to 16*5 = 80 bits. Could you maybe clarify this point?

lrq3000 commented 5 years ago

The number of total bits should be a multiple of 2: so the input message cannot be 20 bits, it can only be 32 bits, but you can fill in just 20 bits and pad the rest with 0s. Same for the encoded message, it can only be 64 bits (because you can't change how the RS field will generate the ecc characters, it will use the whole space available), or it can be 32bits and you fill the rest with 0s to get up to 60bits (which is a loss of space).

So to get the closest encoder that would fit your need in practice (32bits input message, 64bits encoded message), it would be:

rs = RSCodec(nsym=2, nsize=4, prim=19, generator=2, c_exp=4)
result = rs.encode([10, 10])

And the result should be a 4 characters bytearray. This would allow to correct 2 erasures or 1 error.

maberer commented 5 years ago

Thank you for your reply, I am trying to clarify the situation:

The basic idea behind the current implementation is, that the 20 data bits are separated into 5 hexadecimal numbers. Each hexadecimal number can be represented with 4 bits. The encoder then produces 60 bits, that can be represented by 15 hexadecimal numbers (15 * 4 bits = 60 bits).

The closest algorithm, that mimics the behavior of my legacy Java application outputs the following data: (first column is index, 2nd column is the encoded data, the last 5 hexadecimal numbers are the data input [10, 0, 0 ,0, 0])


i            data[i]
0            3
1            7
2            1 
3            7
4           14
5            5
6            1
7            8
8            15
9            14
10           10
11            0
12            0
13            0
14            0

The file that produces this output is from here: http://www.eccpage.com/rs.c This is a C implementation that uses the following parameters:

#define mm  4          /* RS code over GF(2**4) - change to suit */
#define nn  15          /* nn=2**mm -1   length of codeword */
#define tt  5             /* number of errors that can be corrected */
#define kk  5            /* kk = nn-2*tt  */
int pp [mm+1] = { 1, 1, 0, 0, 1} ; /* specify irreducible polynomial coeffts */ 

I would like to use the Python implementation as a reference implementation, although I am unable to produce the same output with it....

I am still not sure, why the Python implementation differs here, but i think that it would need to represent the input data differently (only the lower for bits in this case)?

Is there any way to get the same result with this implementation?