Open gnzlbg opened 6 years ago
Would you like to try implementing this feature? You are welcome to contribute, and I can advise you on how to implement it correctly.
I don't have hardware to test ARM and PPC (only qemu), and don't have much experience on this front, but I can give it a shot.
Okay, then please tentatively write those functions with intrinsics. You can approximate the error in ULPs by reinterpreting the floating-point values as integers and computing the difference between the correct and approximate values.
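As a rough illustration of the integer-reinterpretation approach to measuring ULP error, here is a minimal sketch in C. The names `ordered_bits` and `ulp_distance` are hypothetical and not part of the library. It assumes 32-bit IEEE 754 `float` and finite, non-NaN inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(int32_t), "assumes 32-bit IEEE 754 float");

/* Map a float's bit pattern onto an integer scale that increases
   monotonically with the float's value. Negative floats are stored as
   sign-magnitude, so they are mirrored below zero; -0.0f and +0.0f
   both map to 0. (Hypothetical helper, for illustration only.) */
static int64_t ordered_bits(float x) {
    int32_t i;
    memcpy(&i, &x, sizeof i); /* well-defined way to reinterpret the bits */
    return i < 0 ? (int64_t)INT32_MIN - i : (int64_t)i;
}

/* Number of representable floats between the two values.
   Not meaningful for NaN or infinity. */
static int64_t ulp_distance(float expected, float actual) {
    return llabs(ordered_bits(expected) - ordered_bits(actual));
}
```

In practice `expected` would come from a higher-precision reference (for example, a `double` computation rounded to `float`), and the maximum of `ulp_distance` over a range of inputs gives an estimate of the error bound.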
All architectures currently targeted by the library have instructions to perform a single Newton-Raphson (NR) iteration of a reciprocal square root (and some also offer an exact computation):
The Intel C Compiler (icc) provides many reciprocal square-root intrinsics with different levels of precision (in bits):
The SVML library specifies these here (https://software.intel.com/en-us/ipp-dev-reference-invsqrt):
Clang does not provide most of these, and does not implement the `invsqrt` intrinsics.

These intrinsics are tricky to implement efficiently and correctly, yet have extensive hardware support, and are very useful (e.g. to normalize vectors). I think it would make sense to provide an API for reciprocal square roots with different levels of precision, just like SVML does.
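To make the idea of selectable precision concrete, here is a minimal scalar sketch in C of refining a rough reciprocal square-root estimate with Newton-Raphson steps. This is not SVML's or the library's API. The function names are hypothetical, and the bit-trick initial estimate is only a portable stand-in for a hardware estimate instruction (such as x86 `rsqrtss` or ARM `vrsqrte`). It assumes 32-bit IEEE 754 `float` and a positive, finite, normal input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit IEEE 754 float");

/* Rough estimate of 1/sqrt(x), accurate to a few percent.
   Stand-in for a hardware estimate instruction (assumption). */
static float rsqrt_estimate(float x) {
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1); /* well-known magic-constant estimate */
    float y;
    memcpy(&y, &i, sizeof y);
    return y;
}

/* One Newton-Raphson step for f(y) = 1/y^2 - x:
   y' = y * (1.5 - 0.5 * x * y * y).
   Each step roughly doubles the number of correct bits. */
static float rsqrt_nr_step(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

/* Hypothetical entry point: more iterations give more precision
   at the cost of more multiplies. */
static float rsqrt_refined(float x, int iterations) {
    float y = rsqrt_estimate(x);
    for (int k = 0; k < iterations; ++k) {
        y = rsqrt_nr_step(x, y);
    }
    return y;
}
```

A vectorized version would replace `rsqrt_estimate` with the target's estimate intrinsic and keep the same refinement step. How many iterations each precision level needs depends on the accuracy of that hardware estimate.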