Open valaphee opened 10 months ago
I also combined the tests so that all variants are checked for correctness together, especially across the different widths, since they all share the same SIMD implementation.
I would postpone the other implementations to a later PR.
Sorry for the churn, but I just merged the new Implementation API (https://github.com/mrhooray/crc-rs/pull/115). There shouldn't be too many conflicts with this PR, but I can help rebase it if you'd like.
Ah nice, no problem. I'll take a look today, together with the feedback given.
I thought about renaming Simd to Clmul (carry-less multiplication variant), or should I stay with Simd?
And I would recommend against using Simd as the default, because at the moment it's not possible to do the CRC calculation in a const fn with Simd.
Should Simd/Clmul always be available even if the platform/target-features doesn't support the required features and then just be handled as a type alias for the default impl?
I thought about renaming Simd to Clmul (carry-less multiplication variant), or should I stay with Simd?
I prefer Simd.
And I would recommend against using Simd as the default, because at the moment it's not possible to do the CRC calculation in a const fn with Simd.
Ah, good point. Then, I think not touching the default behavior is fine.
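To illustrate the const fn point: a bitwise (table-free) CRC can be evaluated entirely at compile time, whereas the core::arch intrinsics a Simd path relies on are not const fn. Below is a minimal sketch, assuming the CRC-32/ISO-HDLC parameters (reflected polynomial 0xEDB88320, init and xorout 0xFFFFFFFF). The names crc32_nolookup and CHECK are illustrative only and are not items from crc-rs.

```rust
/// Bitwise CRC-32/ISO-HDLC without any lookup table.
/// Hypothetical helper for illustration; not the crate's implementation.
pub const fn crc32_nolookup(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32; // initial value
    let mut i = 0;
    // `for` loops are not allowed in const fn, so use `while`.
    while i < bytes.len() {
        crc ^= bytes[i] as u32;
        let mut bit = 0;
        while bit < 8 {
            // All ones if the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
            bit += 1;
        }
        i += 1;
    }
    !crc // final xor with 0xFFFFFFFF
}

/// Evaluated by the compiler, which is what a Simd-by-default choice would rule out.
pub const CHECK: u32 = crc32_nolookup(b"123456789");
```

CHECK here is the standard check value for this algorithm, computed at compile time. The same bitwise routine could also serve for the trailing bytes that do not fill a whole SIMD block.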
Should Simd/Clmul always be available even if the platform/target-features doesn't support the required features and then just be handled as a type alias for the default impl?
I think we should gate the Simd impl behind the platform/target-features and leave the default as is.
Once runtime detection is added, this can be revisited.
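A rough sketch of what compile-time gating could look like, assuming the x86_64 path needs the pclmulqdq and sse4.1 target features (the exact feature set is an assumption, and selected_backend and this Simd placeholder are hypothetical names, not the PR's actual code):

```rust
use core::marker::PhantomData;

/// Placeholder marker type. It only exists when the assumed features are
/// enabled at compile time, so `Simd` cannot be named on other targets.
#[cfg(all(
    target_arch = "x86_64",
    target_feature = "pclmulqdq",
    target_feature = "sse4.1"
))]
pub struct Simd<W>(PhantomData<W>);

/// Hypothetical helper reporting which path this build was compiled with.
/// The default ("table") is left untouched when the features are missing.
pub const fn selected_backend() -> &'static str {
    if cfg!(all(
        target_arch = "x86_64",
        target_feature = "pclmulqdq",
        target_feature = "sse4.1"
    )) {
        "simd"
    } else {
        "table"
    }
}
```

With this shape, building with something like RUSTFLAGS="-C target-feature=+pclmulqdq,+sse4.1" would make the type available, and a plain build would simply not expose it. Runtime detection would need a different mechanism and is out of scope for this sketch.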
Yep, Simd is probably easier to understand.
Open for future PRs would be:
can all be done in a non-breaking way.
The x86 implementation is based on Intel's paper "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction".
It's about 4 times faster than Slice16, it can be implemented for all algorithms, and in theory it requires no table (the remaining bytes could use nolookup). I know that crc32fast exists, but it's not configurable, manually calculating the constants is annoying, and this implementation is about 1 GB/s faster (crc32fast uses unaligned memory access).
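For readers unfamiliar with carry-less multiplication, here is a plain-Rust sketch of the operation that PCLMULQDQ performs on two 64-bit operands: a polynomial multiplication over GF(2), where partial products are combined with XOR instead of addition. The function clmul64 is a hypothetical software model for illustration only. The PR itself relies on the hardware instruction, and this sketch does not show the folding steps from the paper.

```rust
/// Software model of a 64x64 -> 128-bit carry-less multiply.
/// Hypothetical helper; far slower than the PCLMULQDQ instruction.
pub const fn clmul64(a: u64, b: u64) -> u128 {
    let mut acc = 0u128;
    let mut i = 0;
    while i < 64 {
        // For every set bit of `b`, XOR in a shifted copy of `a`.
        // XOR (not +) means no carries propagate between bit positions.
        if (b >> i) & 1 == 1 {
            acc ^= (a as u128) << i;
        }
        i += 1;
    }
    acc
}
```

As a sanity check, clmul64(0b11, 0b11) is 0b101 rather than 9, because (x + 1)^2 = x^2 + 1 over GF(2).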
SIMD will only be used when Crc<Simd<W>> is used and the target features specified at compile time support it.
TODO