ordian opened 9 months ago
Yes, the supplied `cargo run --release --example quick-comparison` benchmark says encoding runs about twice as fast. It's only 20-30% faster for decoding, but that's less important for us, given the systematic-chunks optimization.
Linux laptop with an 11th Gen i7 @ 2.80 GHz (columns are init / encode / decode times in µs):

```
4096:4096 (1 kiB)
> reed-solomon-16               0    48841   104440
> reed-solomon-novelpoly        0   106629   163039
32768:32768 (1 kiB)
> reed-solomon-16               0   498662  1054801
> reed-solomon-novelpoly        0  1143171  1498802
```
Apple M2:

```
4096:4096 (1 kiB)
> reed-solomon-16               0    36535    80496
> reed-solomon-novelpoly        0    57448   110243
32768:32768 (1 kiB)
> reed-solomon-16               0   376412   816519
> reed-solomon-novelpoly        0   614643  1031925
```
It does single log-exp table multiplications like we do, so the table sizes should be the same. We should not have any advantage in cache pressure under diverse load.
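For reference, a minimal sketch of that single log-exp table multiplication over GF(2^16) (table names and signatures here are illustrative, not either crate's actual internals):

```rust
/// GF(2^16) multiplication via one log lookup per operand and one exp lookup.
/// `log` and `exp` are precomputed from a generator of the multiplicative
/// group; at 65536 u16 entries each they total ~256 KiB, which is where the
/// cache-pressure question comes from.
const GROUP_ORDER: u32 = 65_535; // order of the multiplicative group, 2^16 - 1

fn gf_mul(log: &[u16; 65_536], exp: &[u16; 65_536], a: u16, b: u16) -> u16 {
    if a == 0 || b == 0 {
        return 0; // zero has no discrete log
    }
    // a * b = g^(log a + log b), with the exponent reduced mod 2^16 - 1.
    let sum = log[a as usize] as u32 + log[b as usize] as u32;
    exp[(sum % GROUP_ORDER) as usize]
}
```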
All our abstractions exist because we didn't know how best to represent the field. There is much less abstraction in their code; maybe they had more confidence in this log-exp table approach, so maybe we confuse rustc somehow, or maybe they just translated leopard more faithfully, but...
I suspect https://github.com/malaire/reed-solomon-16/blob/master/src/engine/engine_nosimd.rs has some optimization not present in our FFT, which perhaps reuses something better. At least one or two things in our FFT slightly bothered me, but I never explored it.
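For concreteness, the additive-FFT butterfly pair these implementations are built on looks roughly like this (a sketch in the textbook Lin-Chung-Han form that leopard follows, not either crate's exact code; real implementations keep `skew` in log form to save a lookup):

```rust
// One FFT butterfly over GF(2^16): combine a pair of values with a
// precomputed twiddle (`skew`), using `gf_mul` from the sketch above.
fn fft_butterfly(x: &mut u16, y: &mut u16, skew: u16,
                 log: &[u16; 65_536], exp: &[u16; 65_536]) {
    *x ^= gf_mul(log, exp, *y, skew);
    *y ^= *x;
}

// The inverse butterfly applies the same two steps in reverse order,
// which undoes `fft_butterfly` exactly (XOR is its own inverse).
fn ifft_butterfly(x: &mut u16, y: &mut u16, skew: u16,
                  log: &[u16; 65_536], exp: &[u16; 65_536]) {
    *y ^= *x;
    *x ^= gf_mul(log, exp, *y, skew);
}
```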
It's maybe worth a benchmark vs leopard https://github.com/catid/leopard
I just came across https://github.com/malaire/reed-solomon-16, which was written 3 years ago. It doesn't have any unsafe or SIMD code and seems faster on my laptop than reed-solomon-novelpoly. Furthermore, there's an open PR with an avx2 backend, which claims to make it >10x faster; it was published as https://crates.io/crates/reed-solomon-simd.
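For context, the crate's high-level API looks roughly like this (reconstructed from memory of its README; check the docs for exact signatures and the shard-size constraints):

```rust
// 3 original shards, 5 recovery shards; any 3 of the 8 suffice to decode.
fn roundtrip() -> Result<(), reed_solomon_16::Error> {
    let original = [[11u8; 64], [22u8; 64], [33u8; 64]];
    let recovery = reed_solomon_16::encode(3, 5, &original)?;

    // Originals 0 and 2 are lost; restore them from one original shard
    // and two recovery shards, each passed as an (index, shard) pair.
    let restored = reed_solomon_16::decode(
        3,
        5,
        [(1, &original[1])],                    // surviving original shards
        [(0, &recovery[0]), (2, &recovery[2])], // surviving recovery shards
    )?;
    assert_eq!(restored[&0], original[0]);
    assert_eq!(restored[&2], original[2]);
    Ok(())
}
```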
cc @alindima