riscv / riscv-bitmanip

Working draft of the proposed RISC-V Bitmanipulation extension
https://jira.riscv.org/browse/RVG-122
Creative Commons Attribution 4.0 International

byteswap instructions #184

Closed: ryao closed this 1 year ago

ryao commented 1 year ago

It would be nice to have byteswap instructions. They are simple to implement in hardware, and ZFS would benefit from them.

When ZFS imports a pool from a system with the opposite endianness, it must byteswap the data when verifying checksums. The ZFS code has many versions of its checksumming functions: native endian vs. byteswapped, scalar vs. SIMD vs. superscalar, and ARM vs. Intel vs. other architectures. The superscalar version is a scalar implementation of the reformulated algorithm that makes the SIMD versions possible. Intel published a write-up explaining the math behind how it works:

https://www.intel.com/content/www/us/en/developer/articles/technical/fast-computation-of-fletcher-checksums.html
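For readers unfamiliar with the checksum, here is a minimal scalar sketch of a Fletcher-4 style running sum (four 64-bit accumulators over 32-bit words), with a native-endian loop and a byteswap loop side by side. The struct and function names are hypothetical, not the ZFS source; `__builtin_bswap32` is the GCC/Clang builtin. The only point is that the byteswap variant differs by one byte reversal per input word.

```c
#include <stdint.h>
#include <stddef.h>

/* Running sums of a Fletcher-4 style checksum (hypothetical struct). */
typedef struct {
    uint64_t a, b, c, d;
} fletcher4_t;

/* Native-endian variant: consume each 32-bit word as stored. */
static void fletcher4_native(const uint32_t *buf, size_t n, fletcher4_t *s)
{
    for (size_t i = 0; i < n; i++) {
        s->a += buf[i];
        s->b += s->a;
        s->c += s->b;
        s->d += s->c;
    }
}

/* Byteswap variant: identical, except each word is byte-reversed first
 * because the pool was written with the opposite endianness. */
static void fletcher4_byteswap(const uint32_t *buf, size_t n, fletcher4_t *s)
{
    for (size_t i = 0; i < n; i++) {
        s->a += __builtin_bswap32(buf[i]);
        s->b += s->a;
        s->c += s->b;
        s->d += s->c;
    }
}
```

Feeding `{1, 2}` to the native loop and `{0x01000000, 0x02000000}` to the byteswap loop gives identical sums (a = 3, b = 4, c = 5, d = 6).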

When SIMD is not available, the superscalar version is always the fastest, but on architectures without byteswap instructions, the performance of the byteswap version of the checksum function can be expected to suffer severely.
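To illustrate why, here is a sketch of the portable shift-and-mask form a compiler has to fall back on when the target (for example, plain RV64GC) has no byteswap instruction. The function names are hypothetical. Every reversed word costs several shifts, ANDs and ORs, plus instructions to materialize the mask constants, instead of a single instruction.

```c
#include <stdint.h>

/* Byte-reverse a 32-bit word using only shifts, masks and ORs. */
static inline uint32_t bswap32_portable(uint32_t x)
{
    return (x << 24) |
           ((x << 8) & 0x00FF0000u) |
           ((x >> 8) & 0x0000FF00u) |
           (x >> 24);
}

/* 64-bit version: swap adjacent bytes, then adjacent 16-bit halves,
 * then the two 32-bit halves. */
static inline uint64_t bswap64_portable(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFull) << 8) | ((x >> 8) & 0x00FF00FF00FF00FFull);
    x = ((x & 0x0000FFFF0000FFFFull) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFull);
    return (x << 32) | (x >> 32);
}
```

For example, `bswap32_portable(0x11223344)` returns `0x44332211`.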

We generate so many versions of checksumming functions that I have been experimenting with using high-level GNU C vector code to generate them all. The following link has LLVM/Clang compile the high-level GNU C vector code for RV64GC:

https://gcc.godbolt.org/z/xdedG1Tre

The first function calculates partial sums for native-endian checksums, while the second is the byteswap version. The native-endian checksum's loop uses only 22 instructions; the byteswap version's loop uses 70. It will likely run at around one third of the speed at best. Byteswap instructions would restore most of the speed of the native-endian version.
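The code behind the Compiler Explorer link isn't reproduced here. The following is a hypothetical sketch of what a byteswapped partial-sum loop can look like in GNU C vector code. It assumes four lanes of 64-bit accumulators, each lane handling every fourth 32-bit word along the lines of the Intel write-up, and it omits the final step that folds the lanes into one checksum. `__builtin_convertvector` needs GCC 9+ or Clang. Without a byteswap instruction, `bswap32x4` must be expanded into the shift/mask/OR sequence for each lane, which plausibly accounts for much of the extra loop length.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef uint32_t u32x4 __attribute__((vector_size(16)));
typedef uint64_t u64x4 __attribute__((vector_size(32)));

/* Four independent lanes of Fletcher-4 partial sums (hypothetical layout). */
typedef struct {
    u64x4 a, b, c, d;
} fletcher4_lanes_t;

/* Lane-wise 32-bit byte reversal using GNU C vector shifts and masks. */
static inline u32x4 bswap32x4(u32x4 v)
{
    return (v << 24) | ((v << 8) & 0x00FF0000u) |
           ((v >> 8) & 0x0000FF00u) | (v >> 24);
}

/* Accumulate whole groups of 4 words; lane j sees words j, j+4, j+8, ... */
static void fletcher4_lanes_byteswap(const uint32_t *buf, size_t n,
                                     fletcher4_lanes_t *s)
{
    for (size_t i = 0; i + 4 <= n; i += 4) {
        u32x4 w;
        memcpy(&w, buf + i, sizeof w);              /* unaligned-safe load */
        u64x4 x = __builtin_convertvector(bswap32x4(w), u64x4); /* widen */
        s->a += x;
        s->b += s->a;
        s->c += s->b;
        s->d += s->c;
    }
}
```

The native-endian loop would be the same with the `bswap32x4` call removed.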

ryao commented 1 year ago

I had missed rev8. Never mind.
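For reference, `rev8` (Zbb) reverses the bytes of a whole register. On RV64 that is all eight bytes, so a 32-bit byteswap is `rev8` followed by a right shift by 32. The C model below is a sketch with hypothetical names, not compiler output. With a Zbb-enabled `-march` (e.g. `rv64gc_zbb`), GCC and Clang should lower `__builtin_bswap32` to roughly this two-instruction sequence.

```c
#include <stdint.h>

/* C model of RV64 Zbb rev8: reverse all eight bytes of a register. */
static inline uint64_t rev8_model(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xFF);  /* move the lowest byte of x into r */
        x >>= 8;
    }
    return r;
}

/* 32-bit byteswap from the 64-bit rev8: the zero-extended word ends up
 * reversed in the upper half, so shift it down by 32. */
static inline uint32_t bswap32_via_rev8(uint32_t x)
{
    return (uint32_t)(rev8_model(x) >> 32);
}
```

For example, `0x11223344` zero-extends to `0x0000000011223344`, `rev8` gives `0x4433221100000000`, and the shift leaves `0x44332211`.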