shibatch / sleef

SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
https://sleef.org
Boost Software License 1.0

Reciprocal square-root algorithms #209

Open gnzlbg opened 6 years ago

gnzlbg commented 6 years ago

All architectures currently targeted by the library have instructions to perform a single NR iteration of a reciprocal square root (and some also provide an exact computation):

The Intel C Compiler (icc) provides many reciprocal square-root intrinsics with different levels of precision (in bits):

The SVML library specifies these here (https://software.intel.com/en-us/ipp-dev-reference-invsqrt):

Clang does not provide most of these, and does not implement the invsqrt intrinsics.

These intrinsics are tricky to implement efficiently and correctly, yet have extensive hardware support, and are very useful (e.g. to normalize vectors). I think it would make sense to provide an API for reciprocal square roots with different levels of precision, just like SVML does.
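To make the "NR iteration" above concrete, here is a minimal scalar sketch of Newton-Raphson refinement for a reciprocal square root, using the standard update y' = y * (1.5 - 0.5 * x * y * y). The function names `rsqrt_nr_step` and `rsqrt_refined` are hypothetical and not part of SLEEF or SVML. The initial estimate `y0` is assumed to come from somewhere else, for example a hardware estimate instruction. A real implementation would use vector intrinsics and handle zero, infinity, NaN and denormals, which this sketch does not.

```c
#include <assert.h>

/* One Newton-Raphson step for y ~ 1/sqrt(x).
 * Derived from Newton's method applied to f(y) = 1/(y*y) - x. */
float rsqrt_nr_step(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

/* Hypothetical helper: refine a crude estimate y0 with `iters` NR steps.
 * Assumes x > 0 and that y0 is already reasonably close to 1/sqrt(x);
 * special values are deliberately not handled in this sketch. */
float rsqrt_refined(float x, float y0, int iters) {
    assert(x > 0.0f);
    float y = y0;
    for (int i = 0; i < iters; i++) {
        y = rsqrt_nr_step(x, y);
    }
    return y;
}
```

Each step roughly doubles the number of correct bits, so the number of iterations is one way an API could expose different precision levels: fewer steps for a fast low-precision variant, more for a result close to full precision.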

shibatch commented 6 years ago

Would you like to try implementing that feature? You are welcome to contribute. I will advise you on how to implement them correctly.

gnzlbg commented 6 years ago

I don't have hardware to test ARM and PPC (only qemu), and don't have much experience on this front, but I can give it a shot.

shibatch commented 6 years ago

Okay, then please tentatively write those functions with intrinsics. You can approximate the error in ULP by reinterpreting the floating-point values as integers and calculating the difference between the correct and approximate values.
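The bit-reinterpretation trick described above could be sketched as follows for single precision. The names `float_to_ordered` and `ulp_distance` are hypothetical, and this is only one possible way to do it, not necessarily how SLEEF's own testers measure error. It counts how many representable floats lie between the two values, which only approximates the true ULP error because the "correct" value has itself been rounded to float. NaN inputs are not supported.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer that increases monotonically
 * with the float's value. IEEE 754 floats are sign-magnitude, so negative
 * values are mapped to negative integers; +0 and -0 both map to 0. */
int64_t float_to_ordered(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u); /* well-defined way to reinterpret the bits */
    return (u & 0x80000000u) ? -(int64_t)(u & 0x7fffffffu) : (int64_t)u;
}

/* Number of representable floats between `correct` and `approx`. */
int64_t ulp_distance(float correct, float approx) {
    assert(correct == correct && approx == approx); /* reject NaN */
    int64_t d = float_to_ordered(correct) - float_to_ordered(approx);
    return d < 0 ? -d : d;
}
```

For example, `ulp_distance(1.0f, 0x1.000002p0f)` compares 1.0 with the next float above it, so the two differ by one step. For a stricter measurement, the reference value would need to be computed in higher precision than the function under test.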