pjkundert / ezpwd-reed-solomon

Reed-Solomon & BCH encoding and decoding, in C++, Javascript & Python
https://hardconsulting.com/products/13-reed-solomon
Other
99 stars 21 forks source link

Major Bugs during Decoding adds More error to data #7

Closed JohnMcJohn closed 7 years ago

JohnMcJohn commented 7 years ago

Initially thank-you for making your library freely available. I've been looking into using it, however there seems to be major bugs in your library.

The following simple example sets up a 2 error detecting 1 error correcting code. However when the decode is run, it actually adds more errors to the payload and will also falsely return a value giving the impression that the decoding was successful.

#include <string>
#include <iostream>

#include "ezpwd/rs"

int main()
{
     ezpwd::RS<255,253> rs;
     std::string orig(253,' ');
     std::string copy = orig;
     rs.encode( copy );
     copy[3] = 'x';
     copy[4] = 'x';
     int count = rs.decode( copy );
     copy.resize( copy.size() - rs.nroots() );
     if ( copy != orig ) {
         std::cout << "Failed to restore origin data." << std::endl;
     }
     return 0;
}

Clearly the errors can't be corrected, but shouldn't it at least be able to detect that there are errors?

and why does the decode add more errors to the string. Initially there were two but then it adds an extra one - should it not make the data worse? as in either keep it the same or fix the actual errors but not make it worse.

We are considering looking into the library, but simple scenarios like mentioned above raise some red flags. Another issue why is the code layout so unusual, weird uncommon c++ indentations, unnecessary layouts make it very hard to read the code understand it and debug it.

pjkundert commented 7 years ago

Thanks for the feedback!

An RS(255,253) codeword contains (up to) 253 data symbols, and 2 parity symbols. Every 2 Reed-Solomon parity symbols can locate and correct 1 error. Every 1 R-S parity symbol can correct one erasure; a symbol whose location is known, but the value is unknown. Therefore, this Reed-Solomon encoding is capable of detecting and correcting up 1 error, which is unknown in both location and value.

In your example, you add 2 errors. This is beyond the capacity of the R-S error detection and recovery.

Unfortunately, any Reed-Solomon codeword with more than the maximum capacity of errors will often be "corrected" to a nearby (but completely incorrect) R-S codeword. If you run 'rsvalidate', you can see an examples of this. In this test we create various R-S codecs of varying error detection/correction capacity. Then, we create and encode valid R-S codewords (data+parity), and we add various numbers of errors and erasures to this codeword, and attempt to decode:

$ ./rsvalidate
parity-(era+2*err)  decoded  successes  failures (-'ve ==> error load > parity capability)
           -19        0          0         7
           -18        0          0         4
           -17        0          0        14
           -16        0          0         6
           -15        0          0        21
           -14        0          0        11
           -13        0          0        15
           -12        1          0        45
           -11        0          0        66
           -10        1          0        42
            -9        0          0       117
            -8        3          0        58
            -7        1          0       139
            -6        2          0        75
            -5        0          0       137
            -4        4          0       118
            -3        2          0       181
            -2        7          0       192
            -1       35          1       537
             0     1588       1588         0
             1     1762       1762         0
             2      629        629         0
             3      301        301         0
             4      138        138         0
             5      184        184         0
             6      100        100         0
             7      110        110         0
             8       84         84         0
             9      115        115         0
            10       78         78         0
            11       71         71         0
            12       51         51         0
            13       71         71         0
            14       12         12         0
            15       16         16         0
            16        7          7         0
            17       14         14         0
            18        6          6         0
            19       12         12         0
            20        9          9         0
            21        3          3         0`

Up to and including the rated error/erasure capacity -- R-S is able to recover and correct the original codeword with 100% accuracy. No observed errors, ever, in billions of applications of the algorithm, over the last 15 years.

However, as soon as you exceed the error/erasure capacity (see the rows with -'ve numbers), you observe something very interesting. Quite often, you will see a number other than 0 in the "decoded" column. This means that the R-S algorithm "successfully" decoded and corrected the codeword! At -1 (this means that error/erasure load exceeded parity correction capacity by 1 symbol; we added 1 too many erasures to the codeword), we incorrectly "successfully" decode 35/(537+35) codewords, or about 6.1%! In fact, in 1/(537+35) cases, we actually successfully recovered the original R-S codeword, even though we exceeded the parity capacity. Beyond that (-2 and less), we see varying numbers of incorrect "successful" recoveries, but never a case of successfully recovering the original data. The R-S decode thinks it is successful, but returns an erroneous codeword. Almost always containing new (incorrectly "fixed") errors!

In your case, you are adding 2 errors, requiring (at minimum) an RS( 255,251) codeword containing 4 parity symbols to recover successfully. Therefore, your recovery is guaranteed to fail, and (in fact), if it ever reports success, will return an even more corrupt codeword than you gave it! Exactly as you report.

As for the strange indentation: make certain your editor has tabs set to indent to 8 columns. Then, it will make sense. Many people (incorrectly, in my opinion) set their tab stops to 4 columns, or something else. This will always corrupt classically indented code. Instead, leave your tab stops at 8, and set your editors indentation level to whatever (say, 4). Then, your editor will insert appropriate Tab and Space symbols to achieve your desired indentation, but will display classically indented code correctly, too...

Cheers!