Open megazone87 opened 6 years ago
Hello,
Those scalar functions are not optimized. They are provided to make it easy to understand how the vectorized versions of the functions work.
You can try Sleef_sinf1_u35purecfma instead of Sleef_sinf_u35. Those purecfma functions should be faster. These are only included in the git version.
I tried Sleef_*f_u35 with the latest master branch:
First, I found that it may need the compiler flag -march=native so that __AVX2__ is defined; otherwise a compile error happens:
error: 'Sleef_expf1_u10purecfma' was not declared in this scope
x[j] = exp(x[j])
Second, the result is still unsatisfactory: the speed is up only a little, and still not faster than
That means that recent math functions in glibc are pretty fast. SLEEF is a vectorized math library, and it is not meant for scalar computation.
You are right, I'm trying to write vectorized code now, but I am new to this. Is there any tutorial or example for using sleef? (I did read things like src/libm-tester/tester2simdsp.c, but I still find it not quite obvious.)
Also, there is another similar(?) library: https://github.com/QuantStack/xsimd. Could you describe the differences between sleef and other libraries? Having this in the README would be helpful for people like me.
Thank you!
Could you tell me a little bit about the purpose of your code?
To replace the math.h math functions with a faster implementation (including vectorized SIMD).
The project is for speech synthesis, and there are a lot of sin, cos, exp, log, and pow calls.
Hi - this issue has been quiet for a while, without taking any direction. Shall we close it?
Actually I have a plan for this issue, which is to remove sleefdp.c and sleefsp.c, and make the scalar functions aliases to the functions with purecscalar helper.
make the scalar functions aliases to the functions with purecscalar helper.
What problem would this change solve?
I am going to introduce a dispatcher to those functions, and they can utilize FMA if available. Then, scalar functions are as fast as vector functions.
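As a rough illustration of the dispatcher idea described above, here is a minimal sketch in C. It is not SLEEF's implementation: the names poly_plain, poly_fma, resolve_poly and poly_eval are hypothetical, the polynomial is a toy stand-in for a real math kernel, and the CPU check assumes GCC or Clang on x86-64 (other targets simply fall back to the plain version).

```c
#include <stddef.h>

/* Hypothetical toy kernel: evaluates 3x^2 + 2x + 1 with Horner's rule,
 * using separate multiply and add operations. */
static float poly_plain(float x) {
    return (3.0f * x + 2.0f) * x + 1.0f;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_X86_DISPATCH 1
/* Same kernel, compiled with FMA enabled for this one function only,
 * so the rest of the program does not need -mfma or -march=native. */
__attribute__((target("fma")))
static float poly_fma(float x) {
    float r = __builtin_fmaf(3.0f, x, 2.0f);  /* 3*x + 2, fused */
    return __builtin_fmaf(r, x, 1.0f);        /* r*x + 1, fused */
}
#endif

typedef float (*poly_fn)(float);

/* Pick the best implementation the running CPU supports. */
static poly_fn resolve_poly(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("fma")) {
        return poly_fma;
    }
#endif
    return poly_plain;
}

/* Public entry point: resolves once on first use, then calls through
 * the cached pointer. Not synchronized; a real dispatcher would need
 * to make the first call thread-safe. */
float poly_eval(float x) {
    static poly_fn impl = NULL;
    if (impl == NULL) {
        impl = resolve_poly();
    }
    return impl(x);
}
```

With this shape the caller only ever sees poly_eval, and no special compiler flag is needed at the call site, which is the property the compile error earlier in the thread suggests was missing for the purecfma functions.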
I like the idea of removing the scalar implementations (in src/libm/sleef{s,d}p.c), for the sake of keeping maintenance costs low. We actually rarely touch these files, but issues have been reported in the past where the scalar routines do not always match the vector ones, which led to the design of the Sleef_<name>1_u<accuracy>purec{,fma} implementation (vector algo + scalar helper) to ensure reproducibility.
However, getting rid of src/libm/sleef{s,d}p.c is not going to make it easy to understand the algorithms and potentially improve them. Besides, people shouldn't use these, because as stated here they are now slower than standard implementations.
I'm wondering if maybe we could simply use the system of helpers to generate human-readable documentation or pseudo-code?
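The reproducibility concern above (scalar routines not always matching the vector ones) can be checked mechanically. The following is a minimal sketch, not part of SLEEF: the helper names ordered_bits, ulp_distance and count_mismatches are hypothetical, and the code assumes IEEE 754 single precision and inputs that are not NaN.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto an integer line that is ordered the
 * same way as the floats themselves (+0.0 and -0.0 both map to 0). */
int64_t ordered_bits(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return (i < 0) ? (int64_t)INT32_MIN - (int64_t)i : (int64_t)i;
}

/* Number of representable floats between a and b (0 means the two
 * values are identical, or are +0.0 and -0.0). */
int64_t ulp_distance(float a, float b) {
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

/* Count positions where two result arrays differ by more than max_ulp.
 * With max_ulp == 0 this is a strict reproducibility check. */
size_t count_mismatches(const float *a, const float *b, size_t n,
                        int64_t max_ulp) {
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (ulp_distance(a[i], b[i]) > max_ulp) {
            bad++;
        }
    }
    return bad;
}
```

One would fill the first array with results from the scalar routine and the second with the corresponding lanes of the vector routine for the same inputs, then call count_mismatches with max_ulp set to 0.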
I replaced all occurrences of #include <math.h> with #include "sleef_math.h" in my project, where sleef_math.h looks like:
However, I don't see a speedup in my project; it actually got slightly slower. Is my usage wrong, or are the sleef scalar functions not as optimized as those in math.h?