ordian opened 9 months ago
Yes, the supplied `cargo run --release --example quick-comparison` benchmark says encoding runs about twice as fast. It's only 20-30% faster for decoding, but that's less important for us, given the systematic-chunks optimization.
Linux laptop with an 11th Gen i7 @ 2.80 GHz (columns are init / encode / decode times in µs):

```
4096:4096 (1 kiB)
> reed-solomon-16               0    48841   104440
> reed-solomon-novelpoly        0   106629   163039
32768:32768 (1 kiB)
> reed-solomon-16               0   498662  1054801
> reed-solomon-novelpoly        0  1143171  1498802
```
Apple M2:

```
4096:4096 (1 kiB)
> reed-solomon-16               0    36535    80496
> reed-solomon-novelpoly        0    57448   110243
32768:32768 (1 kiB)
> reed-solomon-16               0   376412   816519
> reed-solomon-novelpoly        0   614643  1031925
```
It does single log-exp table multiplications like we do, so the table sizes should be the same. We should not have any advantage in cache pressure under diverse load.
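For reference, a minimal sketch of that single log-exp table multiplication over GF(2^16) (table names and signatures here are illustrative, not either crate's actual internals):

```rust
/// GF(2^16) multiplication via one log lookup per operand and one exp lookup.
/// `log` and `exp` are precomputed from a generator of the multiplicative
/// group; at 65536 u16 entries each they total ~256 KiB, which is where the
/// cache-pressure question comes from.
const GROUP_ORDER: u32 = 65_535; // order of the multiplicative group, 2^16 - 1

fn gf_mul(log: &[u16; 65_536], exp: &[u16; 65_536], a: u16, b: u16) -> u16 {
    if a == 0 || b == 0 {
        return 0; // zero has no discrete log
    }
    // a * b = g^(log a + log b), with the exponent reduced mod 2^16 - 1.
    let sum = log[a as usize] as u32 + log[b as usize] as u32;
    exp[(sum % GROUP_ORDER) as usize]
}
```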
All our abstractions exist because we didn't know how best to represent the field. There is much less abstraction in their code; maybe they had more confidence in this log-exp table approach, so maybe we confuse rustc somehow, or maybe they just translated leopard more faithfully, but...
I suspect https://github.com/malaire/reed-solomon-16/blob/master/src/engine/engine_nosimd.rs has some optimization not present in our FFT, which perhaps reuses something better. At least one or two things in our FFT slightly bothered me, but I never explored it.
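For concreteness, the additive-FFT butterfly pair these implementations are built on looks roughly like this (a sketch in the textbook Lin-Chung-Han form that leopard follows, not either crate's exact code; real implementations keep `skew` in log form to save a lookup):

```rust
// One FFT butterfly over GF(2^16): combine a pair of values with a
// precomputed twiddle (`skew`), using `gf_mul` from the sketch above.
fn fft_butterfly(x: &mut u16, y: &mut u16, skew: u16,
                 log: &[u16; 65_536], exp: &[u16; 65_536]) {
    *x ^= gf_mul(log, exp, *y, skew);
    *y ^= *x;
}

// The inverse butterfly applies the same two steps in reverse order,
// which undoes `fft_butterfly` exactly (XOR is its own inverse).
fn ifft_butterfly(x: &mut u16, y: &mut u16, skew: u16,
                  log: &[u16; 65_536], exp: &[u16; 65_536]) {
    *y ^= *x;
    *x ^= gf_mul(log, exp, *y, skew);
}
```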
It's maybe worth a benchmark vs leopard https://github.com/catid/leopard
I just came across https://github.com/malaire/reed-solomon-16, which was written 3 years ago. It doesn't have any unsafe or SIMD code and seems faster on my laptop than reed-solomon-novelpoly. Furthermore, there's an open PR with an avx2 backend, which claims to make it >10x faster; it was published as https://crates.io/crates/reed-solomon-simd.
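For context, the crate's high-level API looks roughly like this (reconstructed from memory of its README; check the docs for exact signatures and the shard-size constraints):

```rust
// 3 original shards, 5 recovery shards; any 3 of the 8 suffice to decode.
fn roundtrip() -> Result<(), reed_solomon_16::Error> {
    let original = [[11u8; 64], [22u8; 64], [33u8; 64]];
    let recovery = reed_solomon_16::encode(3, 5, &original)?;

    // Originals 0 and 2 are lost; restore them from one original shard
    // and two recovery shards, each passed as an (index, shard) pair.
    let restored = reed_solomon_16::decode(
        3,
        5,
        [(1, &original[1])],                    // surviving original shards
        [(0, &recovery[0]), (2, &recovery[2])], // surviving recovery shards
    )?;
    assert_eq!(restored[&0], original[0]);
    assert_eq!(restored[&2], original[2]);
    Ok(())
}
```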
cc @alindima