klauspost / reedsolomon

Reed-Solomon Erasure Coding in Go
MIT License
1.87k stars 246 forks source link

Remove dead code in fwht #275

Closed AndersTrier closed 6 months ago

AndersTrier commented 7 months ago

When reworking my Rust implementation of leopard16, I realized that the last loop in fwht is unreachable.

On the last iteration of the outermost loop, dist4 == order, after which we update the value of dist to dist4, making us unable to enter the body of the last loop.

order is always either 256 or 16384, and

In [2]: 4<<2<<2<<2
Out[2]: 256
In [5]: 4<<2<<2<<2<<2<<2<<2
Out[5]: 16384

On a related note: I tried to optimize fwht further by using 8 values at a time (fwht_8). It was slightly faster on x86, but slower on ARM. fwht_16 was slower on both architectures.

klauspost commented 7 months ago

Looks like a nice cleanup! btw, added your repo to the README since I assume they are compatible.

klauspost commented 7 months ago

wrt unrolling, yeah I settled on 4x since 8x didn't give a significant speedup and didn't want to choke old platforms with limited registers more than needed.

AndersTrier commented 7 months ago

Looks like a nice cleanup! btw, added your repo to the README since I assume they are compatible.

Thanks!

I would have to test wrt. compatibility, as I don't think high rate and low rate are compatible, and sometimes you can choose either.

AndersTrier commented 7 months ago

wrt unrolling, yeah I settled on 4x since 8x didn't give a significant speedup and didn't want to choke old platforms with limited registers more than needed.

Did you consider implementing it with SIMD?

I don't know if it's worth it, but I found some code to get you started if you want to give it a try ;) https://github.com/catid/leopard/blob/master/docs/vector_fwht_4.txt https://github.com/paritytech/reed-solomon-novelpoly/blob/df906e1ca27dc6c0c1b663f1653f57d4620f03dd/reed-solomon-novelpoly/src/field/inc_log_mul.rs#L118 https://stackoverflow.com/a/54133143

klauspost commented 6 months ago

Yeah, it could be done... It could actually be rather effective AFAICT. But it's been too long since I looked at this to judge if/when it would provide a speedup.