Open mikecroucher opened 4 years ago
erfc in sleef is slow. I didn't think that people would care about erf or gamma.
OK, thanks. erf and similar functions are not as popular as sin, cos and tan, but they have their uses. Is there anything I could have done better while iterating over the std::vector?
Thanks for a great library btw, I've enjoyed checking it out.
How about using AVX2 instead of AVX? Sleef is particularly slow if FMA is not available. Use the dispatcher if you are not sure; it is not as slow as people may think.
Oh, you are the author of walkingrandomly.com. Thank you for introducing sleef at your site. 😃
You are very welcome. That article is pretty old -- I should write an updated version.
Using the dispatcher amounts to calling Sleef_erfcd4_u15, right? If so, that doesn't help on my machine.
I've just tested the Intel SVML implementation of erfc and it is faster than the system one and so also faster than sleef.
I'm using C++ via gcc 9.2 on the Windows Subsystem for Linux. My laptop supports AVX. I fill up a vector of x values like this
I time the system erfc like this
and Sleef like this
Compilation is
g++ erfc_test.cpp -o erfc -lsleef -mavx
and I get the following results:
It seems that the SIMD Sleef version of erfc is slower than the system one.
I haven't done much SIMD programming and I am guessing that I am not loading and storing efficiently but I am not sure what to do about this. Can anyone help me out please?
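For anyone reproducing the comparison, a scalar baseline might look roughly like the sketch below. It is not the code from the original post: the helper names `make_inputs` and `time_scalar_erfc`, the evenly spaced inputs, and the use of `std::chrono::steady_clock` are all assumptions made for illustration. It uses only the standard library, so it covers the "system erfc" side of the measurement and not the Sleef call.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: n evenly spaced values covering [lo, hi].
std::vector<double> make_inputs(std::size_t n, double lo, double hi) {
    std::vector<double> x(n);
    if (n == 0) return x;
    if (n == 1) { x[0] = lo; return x; }
    const double step = (hi - lo) / static_cast<double>(n - 1);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = lo + step * static_cast<double>(i);
    }
    return x;
}

// Hypothetical helper: applies scalar std::erfc to every element of x,
// stores the results in y, and returns the elapsed wall time in seconds.
double time_scalar_erfc(const std::vector<double>& x, std::vector<double>& y) {
    y.resize(x.size());  // allocate before starting the clock
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::erfc(x[i]);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `make_inputs` once and passing the same vector to both the scalar and the SIMD timing loops keeps the two measurements comparable. Writing the results into `y` and reading them afterwards makes it harder for the compiler to drop the loop under optimization.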