paritytech / reed-solomon-novelpoly

Novel polynomial basis for a Reed-Solomon encoder

Compare with reed-solomon-16 #40

Open · ordian opened 9 months ago

ordian commented 9 months ago

I just came across https://github.com/malaire/reed-solomon-16, which was written 3 years ago. It doesn't have any unsafe or SIMD code and seems to be faster on a laptop than reed-solomon-novelpoly. Furthermore, there's an open PR with an avx2 backend, which claims to make it >10x faster. It was published as https://crates.io/crates/reed-solomon-simd.
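For context, its top-level API looks roughly like this (an untested sketch based on the crate's documented `encode`/`decode` helpers; the shard counts and sizes here are arbitrary, and exact constraints are per its docs):

```rust
use std::collections::HashMap;

fn main() -> Result<(), reed_solomon_16::Error> {
    // Three original shards, two recovery shards; all shards must be the
    // same length (64 bytes here, comfortably within any size constraints).
    let original: Vec<Vec<u8>> = (0..3).map(|i| vec![i as u8; 64]).collect();

    let recovery: Vec<Vec<u8>> = reed_solomon_16::encode(3, 2, &original)?;

    // Pretend originals 0 and 2 were lost; restore them from original 1
    // plus the two recovery shards, passed as (index, shard) pairs.
    let restored: HashMap<usize, Vec<u8>> = reed_solomon_16::decode(
        3,
        2,
        [(1, &original[1])],
        [(0, &recovery[0]), (1, &recovery[1])],
    )?;

    assert_eq!(restored[&0], original[0]);
    assert_eq!(restored[&2], original[2]);
    Ok(())
}
```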

cc @alindima

burdges commented 9 months ago

Yes, the supplied `cargo run --release --example quick-comparison` benchmark says encoding runs twice as fast. It's only 20% or 30% faster for decoding, but that matters less for us, given the systematic-chunks optimization.

Linux laptop with an i7 11th Gen @ 2.80 GHz (columns are the benchmark's init, encode, and decode times in µs):

```
4096:4096 (1 kiB)
> reed-solomon-16                  0         48841        104440
> reed-solomon-novelpoly           0        106629        163039

32768:32768 (1 kiB)
> reed-solomon-16                  0        498662       1054801
> reed-solomon-novelpoly           0       1143171       1498802
```

Apple M2:

```
4096:4096 (1 kiB)
> reed-solomon-16                  0         36535         80496
> reed-solomon-novelpoly           0         57448        110243

32768:32768 (1 kiB)
> reed-solomon-16                  0        376412        816519
> reed-solomon-novelpoly           0        614643       1031925
```

It does single multiplications via log/exp tables like we do, so the table sizes should be the same, and we should not have any advantage in cache pressure under diverse load.
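For reference, "single log/exp table multiplication" means something like the following (a minimal sketch, not either crate's actual code; the reduction polynomial is a placeholder and must be primitive for the powers-of-x walk to cover the whole multiplicative group):

```rust
const ORDER: usize = 1 << 16; // GF(2^16)
const POLY: u32 = 0x1_002D;   // placeholder reduction polynomial (assumed primitive)

/// Build log/exp tables by walking powers of x, the canonical generator
/// when POLY is primitive. Each table is 64k u16 entries, i.e. ~128 kiB.
fn build_tables() -> (Vec<u16>, Vec<u16>) {
    let mut log = vec![0u16; ORDER];
    let mut exp = vec![0u16; ORDER - 1];
    let mut state: u32 = 1;
    for i in 0..ORDER - 1 {
        exp[i] = state as u16;
        log[state as usize] = i as u16;
        state <<= 1; // multiply by x
        if state >= ORDER as u32 {
            state ^= POLY; // reduce modulo the field polynomial
        }
    }
    (log, exp)
}

/// A single multiplication: two table lookups plus an add mod 2^16 - 1.
fn mul(log: &[u16], exp: &[u16], a: u16, b: u16) -> u16 {
    if a == 0 || b == 0 {
        return 0;
    }
    let sum = log[a as usize] as u32 + log[b as usize] as u32;
    exp[(sum % (ORDER as u32 - 1)) as usize]
}

fn main() {
    let (log, exp) = build_tables();
    assert_eq!(mul(&log, &exp, 3, 0), 0);
    assert_eq!(mul(&log, &exp, 1, 0x1234), 0x1234); // 1 is the identity
}
```

With the same two ~128 kiB tables per field, both crates should indeed hit the cache the same way.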

All our abstractions exist because we didn't know how best to represent the field. There is much less abstraction in their code; maybe they had more confidence in this log/exp-table approach. So maybe we confuse rustc somehow, or maybe they just translated leopard more faithfully, but..

I suspect https://github.com/malaire/reed-solomon-16/blob/master/src/engine/engine_nosimd.rs has some optimization not present in our FFT, perhaps one that reuses something better. At least one or two things in our FFT slightly bothered me, but I never explored it.
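One concrete candidate for that kind of reuse (a guess, not verified against either codebase): in a bulk multiply-accumulate over a shard, the twiddle's log can be hoisted out of the element loop, so each element costs one log lookup plus one exp lookup rather than two log lookups. A hypothetical sketch, using the log/exp tables from above:

```rust
/// Hypothetical bulk `dst ^= skew * src` over 2-byte field elements.
/// `log[skew]` is looked up once per slice, not once per element.
fn mul_add_slice(log: &[u16], exp: &[u16], skew: u16, dst: &mut [u16], src: &[u16]) {
    if skew == 0 {
        return; // adding skew * src = 0 is a no-op
    }
    let log_skew = log[skew as usize] as u32;
    for (d, &s) in dst.iter_mut().zip(src) {
        if s != 0 {
            let sum = log[s as usize] as u32 + log_skew;
            *d ^= exp[(sum % ((1 << 16) - 1)) as usize];
        }
    }
}
```

The additive-FFT butterflies are essentially this slice-wide mul-add plus a xor of slices, so if their engine hoists the skew lookup and ours redoes it per element, that alone could explain part of the gap.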

It's maybe worth benchmarking against leopard too: https://github.com/catid/leopard