intel / isa-l

Intelligent Storage Acceleration Library

Erasure code - question about the code chosen #142

Open jeffareid opened 4 years ago

jeffareid commented 4 years ago

This is not about a problem with the erasure code, but a question about why the "original view" RS Vandermonde matrix was chosen for erasure coding, as opposed to a "BCH view" code (which should be faster if instructions like pshufb are not available). I've run out of sources to ask about this, so I'm posting it here, hoping someone may be able to answer.

The Wikipedia article explains the two types of codes, both called Reed-Solomon:

https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Constructions

If a BCH view code were chosen instead, with a generator polynomial of the form (x-1)(x-2)(...), then a single erasure (or any one of several erasures) can be corrected via XOR instead of a matrix multiply, and this can also be used for encoding. Using an RS(20,17) code as an example, with c columns of data per row, encoding is done with a 2 by 17 matrix multiply of the rows followed by a 1 by 19 XOR of the rows. For decoding of erased data and/or parities, 3 sets of 3 syndromes are generated and inverted, then 2 of the rows of the inverted matrix are used for a 2 by 20 matrix multiply, followed by a 1 by 19 XOR.
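
(A minimal sketch of that single-erasure case, not ISA-L code: because the generator polynomial has (x-1) as a factor, every codeword XORs to zero, so the missing row is just the XOR of the surviving rows. The names here are made up for illustration.)

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Recover the one erased row by XOR-ing all surviving rows of the codeword. */
static void recover_one_by_xor(uint8_t **rows, size_t nrows, size_t len,
                               size_t erased, uint8_t *out)
{
    memset(out, 0, len);
    for (size_t r = 0; r < nrows; r++) {
        if (r == erased)
            continue;
        for (size_t i = 0; i < len; i++)
            out[i] ^= rows[r][i];
    }
}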

For Vandermonde matrix correction of an RS(20,17) code, a 17 by 17 matrix is inverted (this could be done via a 1140 entry lookup table, since comb(20,17) = 1140). For d erasures in the data, a d by 17 matrix multiply is done, and then for p erasures in the parities, the erased parities are regenerated from the now-recovered data via a p by 17 matrix multiply.

The XOR approach would have been significantly faster before instructions like pshufb, but pshufb runs at close to memory bandwidth, so XOR is only about 4.5% faster on my old Intel 3770K processor.

gbtucker commented 4 years ago

The erasure coding in ISA-L is for block-type erasure codes found in storage systems and not generally for forward error correction as in your examples. With this, parity blocks can be calculated with any encoding matrix and are not limited to a Vandermonde-type matrix. There are sometimes trade-offs in choosing the encoding matrix, and in other cases it doesn't matter much. You could choose a matrix with few 1-bits in its coefficients so that it can be implemented with lots of xors instead; there are some limitations with this approach and it doesn't scale well. Note in a typical Vandermonde matrix with 1 coefficients in the first column, you can find the first erasure with just xors. This is one of its benefits vs. other types. Also, in block erasure codes, matrix inversion is done infrequently and is not a large portion of the calculation time.
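
For reference, the usual encode flow with the erasure_code.h API looks roughly like this (a sketch only: buffer setup and error handling are omitted, and the fragment counts are arbitrary example values):

#include "erasure_code.h"

#define K 10                /* data fragments   */
#define P 4                 /* parity fragments */
#define M (K + P)           /* total fragments  */

static void encode_sketch(int len, unsigned char **data, unsigned char **parity)
{
    unsigned char encode_matrix[M * K];
    unsigned char gftbls[K * P * 32];

    /* Systematic M x K matrix: identity on top, coefficient rows below.
     * gf_gen_rs_matrix() is the Vandermonde-derived alternative.        */
    gf_gen_cauchy1_matrix(encode_matrix, M, K);

    /* Expand the P x K coefficient rows into lookup tables and multiply
     * them by the data to produce the parity fragments.                 */
    ec_init_tables(K, P, &encode_matrix[K * K], gftbls);
    ec_encode_data(len, K, P, gftbls, data, parity);
}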

jeffareid commented 4 years ago

I meant using BCH as an erasure-only code, not for error correction. Assuming the first factor of the generator polynomial is (x-1), XOR can be used to encode one of the rows and also to correct one of the erasures. Tape drives group data into matrices in order to perform RS correction across rows and down columns, but the column correction is erasure-only, using a hardware matrix multiply of each column's syndromes to correct the data. Firmware generates the matrix, based on rows tagged by the row-level ECC. There is also the ability to detect an error during column erasure correction, which is handled by adding that row to the list of erasures and starting the erasure correction over again.

What I don't get is why base an erasure code on original view (Vandermonde) RS code when BCH view implementations were already in place. There were some early cases of rate 1/2 erasure codes (number of parity symbols == number of data symbols, so the codeword was twice as long as the message), but this was pretty much restricted to one type of situation (I'm thinking of one specific satellite protocol). It's mentioned in the Wiki article, but without a reference, and it wasn't common. I'm not aware of any media-based erasure code, such as something similar to Raid 6, where a Vandermonde-type code was used. This seems to have started with jerasure, but I don't know why.

jeffareid commented 4 years ago

I forgot to mention that I have example BCH type erasure code here. My processor (Intel 3770K, Ivy Bridge) doesn't have AVX512, but does have SSSE3, so I'm limited to using xmm registers.

https://github.com/jeffareid/ersbch

jeffareid commented 4 years ago

Note in a typical Vandermonde matrix with 1 coefficients in the first column, you can find the first erasure with just xors. This is one of its benefits vs. other types.

Isn't the Vandermonde matrix modified so that it is a "systematic" encoding matrix, where the data rows are not modified and only the parity rows are generated?

https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Systematic_encoding_procedure:_The_message_as_an_initial_sequence_of_values

There is an example of this here:

https://www.backblaze.com/blog/reed-solomon

https://github.com/Backblaze/JavaReedSolomon

If I switch the encoding matrix to one based on a BCH view code, then none of the encoding matrix columns are all 1's, but since one of the syndromes is a plain XOR, XOR can be used to encode one of the rows (essentially encoding by correcting that row via XOR), and to correct one of the erasures via XOR.

I'll have to experiment with the Vandermonde matrix to see if XOR can be used to correct one of several erasures of data rows.

gbtucker commented 4 years ago

If you choose an encode matrix like so, for example ec(7,4):

encode matrix:
  0-   1  0  0  0
  1-   0  1  0  0
  2-   0  0  1  0
  3-   0  0  0  1
  4-   1  1  1  1
  5-   1  2  4  8
  6-   1  4 10 40

To recover any 1 fragment in range {0,4} from sources in {0,4}, the decode matrix looks like:

decode matrix:
  0-   1  1  1  1
  1-   0  0  0  0
  2-   0  0  0  0
  3-   0  0  0  0

True for any (k+1,k) in {0,k}.

jeffareid commented 4 years ago

First thanks for explaining this to me, but I'm missing something here:

If you choose an encode matrix like so for example ec(7,4):
decode matrix:
0- 1 1 1 1
1- 0 0 0 0
2- 0 0 0 0
3- 0 0 0 0

There are 4 data rows and 3 parity rows. Which 4 of the 7 rows are being XOR'ed to do the decode?

This is what I would be using for ec[7][4] (in hex with leading zeroes):

01 00 00 00
00 01 00 00
00 00 01 00
00 00 00 01
8e 63 1b 07
b0 ba 22 0e
3f d8 38 08

This is a BCH view encode using generator polynomial (x-1)(x-2)(x-4) = x^3 + 07 x^2 + 0e x + 08 (the coefficients 07, 0e, 08 appear as the last column of the parity rows, the bottom right corner of ec).
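
(As a sanity check of that product, a small sketch that multiplies out (x-1)(x-2)(x-4), assuming GF(2^8) with the reduction polynomial 0x11d, which the values in this thread are consistent with:)

#include <stdint.h>
#include <stdio.h>

/* GF(2^8) multiply, assuming the reduction polynomial 0x11d. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        b >>= 1;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
    }
    return p;
}

int main(void)
{
    /* g(x) = (x - 1)(x - 2)(x - 4); subtraction is XOR in GF(2^8), so the
     * roots 1, 2, 4 are added in directly. Coefficients stored low to high. */
    uint8_t g[4] = { 1, 0, 0, 0 };
    uint8_t roots[3] = { 1, 2, 4 };
    int deg = 0;

    for (int r = 0; r < 3; r++) {
        for (int i = deg + 1; i > 0; i--)
            g[i] = g[i - 1] ^ gf_mul(roots[r], g[i]);
        g[0] = gf_mul(roots[r], g[0]);
        deg++;
    }
    /* Prints: x^3 + 07 x^2 + 0e x + 08 */
    printf("x^3 + %02x x^2 + %02x x + %02x\n", g[2], g[1], g[0]);
    return 0;
}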

Let the data matrix be a [4][8] matrix with increasing values. The encoded [7][8] matrix will be:

00 01 02 03 04 05 06 07 
08 09 0a 0b 0c 0d 0e 0f 
10 11 12 13 14 15 16 17 
18 19 1a 1b 1c 1d 1e 1f 
da 2b 25 d4 39 c8 c6 37 
33 15 7f 59 ab 8d e7 c1 
e9 3e 5a 8d 92 45 21 f6 

Xor of all 7 rows results in

00 00 00 00 00 00 00 00

So a single erasure can be corrected via XOR, and encode can be done by encoding all but one row of parities, and using XOR to generate the last row of parity (essentially doing a single erasure correction to encode that final parity row).

The current decoding process can be used with this encode matrix. For erased data rows, use selected rows of an inverted 4 by 4 sub-matrix of the encode matrix to recover the erased data rows, then regenerate erased parity rows using a sub-matrix of the encode matrix. If no parity rows are erased, then XOR can be used to recover one of the erased data rows. If there are erased parity rows, then after the data is recovered, XOR can be used to recover one of the parity rows.
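
A rough sketch of that process for ec[7][4] in terms of the erasure_code.h API (only erased data rows are handled here; erased parity rows would then be regenerated from the recovered data, and error handling is omitted):

#include <string.h>
#include "erasure_code.h"

#define K 4                 /* data rows   */
#define P 3                 /* parity rows */

/* encode_matrix: the 7 x 4 matrix above.  frag_in: the 4 surviving rows in
 * the order given by src_idx (indices 0..6).  err_idx: erased data rows.   */
static int decode_data_rows(unsigned char *encode_matrix,
                            unsigned char **frag_in, int *src_idx,
                            unsigned char **frag_out, int *err_idx,
                            int nerrs, int len)
{
    unsigned char b[K * K], inv[K * K], d[P * K], gftbls[K * P * 32];

    /* 4 x 4 sub-matrix of the encode matrix for the rows we still have.    */
    for (int i = 0; i < K; i++)
        memcpy(&b[K * i], &encode_matrix[K * src_idx[i]], K);

    if (gf_invert_matrix(b, inv, K) < 0)
        return -1;                     /* singular survivor set */

    /* Rows of the inverse corresponding to the erased data rows.           */
    for (int e = 0; e < nerrs; e++)
        memcpy(&d[K * e], &inv[K * err_idx[e]], K);

    ec_init_tables(K, nerrs, d, gftbls);
    ec_encode_data(len, K, nerrs, gftbls, frag_in, frag_out);
    return 0;
}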

With "BCH view" encoding, there is an alternate decode process without separate handling for data and parity rows. For e erasures in data and/or parity rows, an e by e "locator" matrix based on the erasure locations is generated, inverted, and used to multiply a syndrome generation sub-matrix. The syndrome generator matrix is:

01 01 01 01 01 01 01 
40 20 10 08 04 02 01 
cd 74 1d 40 10 04 01 

Example: rows 1, 3, 5 are erased. Generate syndrome sub-matrix with columns 1, 3, 5 deleted:

01 01 01 01 
40 10 04 01 
cd 1d 10 01 

The locator matrix is:

01 01 01 
20 08 02 
74 40 04 

Its inverse is:

1a 49 22 
d5 ad aa 
ce e4 88 

Correction matrix = (inverse locator matrix) · (syndrome sub-matrix):

b2 43 39 71 
be a4 29 d2 
0d e6 11 a2 

Remove the last row of the correction matrix:

b2 43 39 71 
be a4 29 d2 

Multiply by the 4 non-erased rows of the encoded matrix (a 4 x 8 matrix) to recover 2 of the 3 erased rows, then XOR all rows to recover the last one.
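
A small sketch of how the syndrome generator and locator matrices can be built from the erased row numbers, assuming GF(2^8) with polynomial 0x11d (which reproduces the values above); the 3 by 3 inversion itself could be done with a routine like gf_invert_matrix from erasure_code.h:

#include <stdint.h>

#define N 7                 /* total rows (data + parity) */
#define T 3                 /* parity rows / syndromes    */

/* GF(2^8) multiply, assuming the reduction polynomial 0x11d. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        b >>= 1;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
    }
    return p;
}

/* Syndrome generator matrix: row j, column i is (2^j)^(N-1-i), so row 0 is
 * all 01, row 1 is powers of 02, row 2 is powers of 04, as listed above.   */
static void gen_syndrome_matrix(uint8_t s[T][N])
{
    uint8_t root = 1;
    for (int j = 0; j < T; j++) {
        uint8_t v = 1;
        for (int i = N - 1; i >= 0; i--) {
            s[j][i] = v;
            v = gf_mul(v, root);
        }
        root = gf_mul(root, 2);        /* 1, 2, 4, ... */
    }
}

/* Locator matrix: the columns of the syndrome matrix at the T erased rows. */
static void gen_locator_matrix(uint8_t s[T][N], const int *erased,
                               uint8_t loc[T][T])
{
    for (int j = 0; j < T; j++)
        for (int e = 0; e < T; e++)
            loc[j][e] = s[j][erased[e]];
}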

gbtucker commented 4 years ago

First thanks for explaining this to me, but I'm missing something here:

If you choose an encode matrix like so for example ec(7,4):
decode matrix:
0- 1 1 1 1
1- 0 0 0 0
2- 0 0 0 0
3- 0 0 0 0

There are 4 data rows and 3 parity rows. Which 4 of the 7 rows are being XOR'ed to do the decode?

Sorry, I was just trying to follow the format in the example link you sent. I cut out the rest of the decode matrix. To be more explicit, and assuming the 4th row is the erasure:

[decode matrix] * [rows 1,2,3,p1n] = orig source 1,2,3,4
 1  0  0  0         a   b   c   d
 0  1  0  0     *   e   f   g   h
 0  0  1  0         i   j   k   l
 1  1  1  1        p11 p12 p13 p14

jeffareid commented 4 years ago

decode matrix

Thanks for the explanation.

encode matrix

Your suggested encode matrix is the same as Raid 6; the parity rows are BCH view syndromes:

01 00 00 00 
00 01 00 00 
00 00 01 00 
00 00 00 01 
01 01 01 01    raid 6 P
01 02 04 08    raid 6 Q
01 04 10 40    raid 6 R (triple parity raid 6)

Syndrome (Raid 6) encoding doesn't produce RS codewords; it's rows of data followed by rows of syndromes. This encoding allows up to 255 data rows plus the parity rows, while BCH encoding produces RS codewords, so the sum of the number of data rows and parity rows is limited to a total of 255 (assuming GF(2^8)).
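
A minimal sketch (not ISA-L code) of that syndrome-style encode for the P/Q/R rows above, again assuming GF(2^8) with polynomial 0x11d:

#include <stddef.h>
#include <stdint.h>

/* GF(2^8) multiply, assuming the reduction polynomial 0x11d. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        b >>= 1;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
    }
    return p;
}

/* P = d0 ^ d1 ^ ..., Q = sum 2^i * di, R = sum 4^i * di, evaluated per
 * column with Horner's rule; the 01/02/04 powers are the parity rows above. */
static void gen_pqr(uint8_t **d, int k, size_t len,
                    uint8_t *p, uint8_t *q, uint8_t *r)
{
    for (size_t col = 0; col < len; col++) {
        uint8_t sp = 0, sq = 0, sr = 0;
        for (int i = k - 1; i >= 0; i--) {
            sp ^= d[i][col];
            sq = (uint8_t)(gf_mul(sq, 2) ^ d[i][col]);
            sr = (uint8_t)(gf_mul(sr, 4) ^ d[i][col]);
        }
        p[col] = sp;
        q[col] = sq;
        r[col] = sr;
    }
}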

In what seems to be a common jerasure case of 17 data rows and 3 parity rows (P, Q, R), for a single erasure of any data row or of the P row, the decoding uses a 17 row XOR, while BCH view uses a 19 row XOR. BCH view can also use a 19 row XOR for a single erasure of Q or R. Let c = # of columns per row. For a 3 data row erasure case, syndrome decoding creates and inverts I[17][17], then uses a sub-matrix I[3][3] · D[3][c] to decode, while Raid 6 decoding creates and inverts I[3][3], creates sub-encoding S[3][3], then X[3][3] = I[3][3] · S[3][3], then X[3][3] · D[3][c] to decode. For RS(20,17), both are about the same. For 255 rows of data and 3 rows of parity, the inversion time of a [255][255] matrix (time complexity ~O(n^3)) could be an issue.

I grabbed bits and pieces of old code to create working syndrome-based encoders + decoders: one is C++ only, the other is C++ and assembly (for matrix inversion, matrix multiply, and XOR of rows). My processor is an Intel 3770K (Ivy Bridge), so I'm limited to xmm registers since I don't have AVX512.

I renamed my repository to:

https://github.com/jeffareid/erasureb

jeffareid commented 4 years ago

This leaves me with my original question. Why do some or most jerasure implementations use original view Vandermonde matrix encoding, where row XOR can't be used for either encoding or decoding, and how did this become the apparent semi-standard for jerasure, especially since Raid 6 was well known and already in use?

With parallel table lookup (PTLU) instructions like pshufb, a matrix multiply implementation runs close to memory bandwidth, so the XOR isn't much of a factor with PTLU, but not all processors have PTLU, and I wonder if jerasure predates when PTLU instructions became common.
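
For what it's worth, the PTLU trick looks roughly like this with SSSE3 intrinsics (a sketch, not ISA-L's actual kernels): each byte is split into nibbles, each nibble indexes a 16-entry table of products with the coefficient, and the two lookups are XORed. The tables tbl_lo[i] = gf_mul(c, i) and tbl_hi[i] = gf_mul(c, i << 4) are assumed to be precomputed for the coefficient c.

#include <tmmintrin.h>      /* SSSE3: _mm_shuffle_epi8 */

/* Multiply 16 GF(2^8) bytes by a constant c using two pshufb table lookups.
 * tbl_lo[i] = gf_mul(c, i), tbl_hi[i] = gf_mul(c, i << 4), i = 0..15.      */
static __m128i gf_vect_mul16(__m128i v, __m128i tbl_lo, __m128i tbl_hi)
{
    __m128i mask = _mm_set1_epi8(0x0f);
    __m128i lo   = _mm_and_si128(v, mask);
    __m128i hi   = _mm_and_si128(_mm_srli_epi64(v, 4), mask);

    /* gf_mul(c, b) = gf_mul(c, b & 0x0f) ^ gf_mul(c, b & 0xf0) */
    return _mm_xor_si128(_mm_shuffle_epi8(tbl_lo, lo),
                         _mm_shuffle_epi8(tbl_hi, hi));
}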

vitalif commented 2 years ago

Hi, I came across your issue.

where row XOR can't be used for either encoding or decoding

It CAN be used. jerasure always generates matrices where one row is (1,1,1,...,1). But I don't know why you'd want to read all (data+code-1) chunks instead of only (data) chunks. So I think the answer is obvious - nobody wants to read more chunks than needed :)