kthohr / gcem

A C++ compile-time math library using generalized constant expressions
https://gcem.readthedocs.io/en/latest/
Apache License 2.0

Why is gcem much slower than cmath? #45

Open Yikai-Liao opened 4 months ago

Yikai-Liao commented 4 months ago

Some functions in my library need to be callable at both compile time and runtime. Because cmath's constexpr support varies across platforms, I chose gcem. In practice, though, I found that many of gcem's functions are an order of magnitude slower than their cmath counterparts at runtime under O3 optimization. I know I could write two versions, one called at compile time and one at runtime, but I'd like to understand why gcem is so much slower at runtime.
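
For readers unfamiliar with the "two versions" workaround mentioned above, here is a minimal sketch of how a single function can dispatch between a constexpr-friendly implementation and cmath. It assumes a compiler that provides the `__builtin_is_constant_evaluated()` builtin (recent GCC, Clang and MSVC do; C++20 exposes the same facility as `std::is_constant_evaluated()` in `<type_traits>`) and C++14 or later for the loop. The names `sqrt_newton` and `dual_sqrt` are hypothetical and are not part of gcem.

```cpp
#include <cmath>

// Hypothetical compile-time path: Newton iteration for the square root.
// Sketch only: non-positive and NaN inputs simply return 0.
constexpr double sqrt_newton(double x)
{
    if (!(x > 0.0)) return 0.0;
    double guess = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 100; ++i) {
        const double next = 0.5 * (guess + x / guess);
        if (next == guess) break;   // converged to machine precision
        guess = next;
    }
    return guess;
}

// Hypothetical dispatcher: constexpr path during constant evaluation,
// std::sqrt for ordinary runtime calls.
constexpr double dual_sqrt(double x)
{
    if (__builtin_is_constant_evaluated()) {
        return sqrt_newton(x);
    }
    return std::sqrt(x);
}

// Evaluated by the compiler, so this goes through sqrt_newton.
static_assert(dual_sqrt(2.0) * dual_sqrt(2.0) > 1.999999 &&
              dual_sqrt(2.0) * dual_sqrt(2.0) < 2.000001,
              "compile-time path should approximate sqrt(2)");
```

A call such as `dual_sqrt(v)` with a runtime value `v` takes the `std::sqrt` branch, so the runtime cost is that of cmath while the function stays usable in constant expressions.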

[Image: 1719030925962.png]

I've tested this on x86 Linux, Windows and macOS, compiling with g++, MSVC and Apple Clang respectively, and the results are roughly the same on all three.
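
The benchmark code itself is not shown in this thread. As a rough illustration of how such a comparison could be timed, here is a minimal sketch using `std::chrono::steady_clock`. The helpers `make_inputs` and `ns_per_call` are hypothetical names, and the harness is deliberately simple (one pass, no warm-up, no statistics), so treat its numbers as indicative only.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: n evenly spaced inputs in [0, 1).
inline std::vector<double> make_inputs(std::size_t n)
{
    std::vector<double> xs(n);
    for (std::size_t i = 0; i < n; ++i) {
        xs[i] = static_cast<double>(i) / static_cast<double>(n);
    }
    return xs;
}

// Hypothetical helper: average nanoseconds per call of f over the inputs.
// The inputs are assumed to be non-empty.
template<typename F>
double ns_per_call(F f, const std::vector<double>& inputs)
{
    const auto start = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (double x : inputs) {
        acc += f(x);
    }
    const auto stop = std::chrono::steady_clock::now();
    // Storing the sum in a volatile keeps the optimizer from discarding the loop.
    volatile double sink = acc;
    (void)sink;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(inputs.size());
}
```

With gcem available, the two calls to compare would be `ns_per_call([](double x){ return std::tan(x); }, xs)` and the same lambda with `gcem::tan(x)`, both built with the same optimization flags.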

Yikai-Liao commented 4 months ago

[Screenshots: Screenshot_20240622142443, Screenshot_20240622142549]

I believe there is a lot of room to optimise gcem's runtime performance. I was able to reduce the run time by about 40% simply by changing the recursion in the tan operation to a loop.


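// Evaluates the continued-fraction denominator bottom-up, from the
// innermost term (2*max_depth - 1) out to the first term; xx is x*x.
// The loop requires C++14 constexpr.
template<int max_depth, typename T>
constexpr
T
tan_cf_loop(const T xx)
noexcept
{
    T ans = T(2*max_depth - 1);
    for(int depth = max_depth - 1; depth > 0; --depth) {
        ans = T(2*depth - 1) - xx / ans;
    }
    return ans;
}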

template<typename T>
constexpr
T
tan_cf_main(const T x)
noexcept
{
    return( (x > T(1.55) && x < T(1.60)) ?
                tan_series_exp(x) : // deals with a singularity at tan(pi/2)
            //
            x > T(1.4) ?
                x/tan_cf_loop<45>(x*x) :
            x > T(1)   ?
                x/tan_cf_loop<35>(x*x) :
            // else
                x/tan_cf_loop<25>(x*x) );
}
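
To make the recursion-to-loop change concrete, here is a self-contained sketch that puts a recursive evaluation next to the loop and checks that they agree. The recursive helper `cf_recursive` is written here for comparison in the single-return style that C++11 constexpr requires; it is an assumption about the shape of the original code, not a copy of gcem's source. `cf_loop` mirrors `tan_cf_loop` above, and `tan_small` is a hypothetical wrapper that is only meant for small |x|.

```cpp
#include <cmath>

// Recursive form (assumed shape, C++11-style constexpr): one call per term.
template<typename T>
constexpr T cf_recursive(const T xx, const int depth, const int max_depth) noexcept
{
    return depth < max_depth
        ? T(2*depth - 1) - xx / cf_recursive(xx, depth + 1, max_depth)
        : T(2*depth - 1);
}

// Loop form (C++14 constexpr): same terms, evaluated innermost first.
template<int max_depth, typename T>
constexpr T cf_loop(const T xx) noexcept
{
    T ans = T(2*max_depth - 1);
    for (int depth = max_depth - 1; depth > 0; --depth) {
        ans = T(2*depth - 1) - xx / ans;
    }
    return ans;
}

// Hypothetical wrapper: tan(x) = x / (1 - x^2 / (3 - x^2 / (5 - ...))).
constexpr double tan_small(const double x) noexcept
{
    return x / cf_loop<25>(x * x);
}

// Both forms perform the same operations in the same order.
static_assert(cf_loop<25>(0.25) == cf_recursive(0.25, 1, 25),
              "loop and recursion should agree");
```

Both versions remain usable in constant expressions. The difference is that the loop gives the optimizer a plain counted loop at runtime instead of a chain of nested calls.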

Yikai-Liao commented 4 months ago

Also, I don't really understand why gcem computes sine and cosine via tan(x/2), which takes 45 iterations in the worst case. Approximating sine and cosine with Chebyshev polynomials should be a better choice.

See here: https://stackoverflow.com/a/394512/24175656
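
As an illustration of the Chebyshev idea, here is a minimal sketch that approximates sine on [-pi/2, pi/2] with a 12-term Chebyshev expansion evaluated by the Clenshaw recurrence. It is not taken from the linked answer or from the pull request. To avoid hard-coding constants, the coefficients are computed at startup from `std::sin` at the Chebyshev nodes; a compile-time library would presumably ship them as precomputed constants instead. The names `chebyshev_coeffs_sin` and `sin_chebyshev` and the choice of 12 terms are assumptions made for this example.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr std::size_t kTerms = 12;                    // number of coefficients
constexpr double kHalfPi = 1.57079632679489661923;    // pi / 2

// Chebyshev coefficients of f(t) = sin(t * pi/2) on t in [-1, 1],
// obtained by sampling f at the kTerms Chebyshev nodes.
inline std::array<double, kTerms> chebyshev_coeffs_sin()
{
    std::array<double, kTerms> c{};
    for (std::size_t k = 0; k < kTerms; ++k) {
        double sum = 0.0;
        for (std::size_t j = 0; j < kTerms; ++j) {
            const double theta = 2.0 * kHalfPi * (j + 0.5) / kTerms;
            sum += std::sin(kHalfPi * std::cos(theta)) * std::cos(k * theta);
        }
        c[k] = 2.0 * sum / kTerms;
    }
    return c;
}

// Approximates sin(x) for x in [-pi/2, pi/2]; other inputs would first
// need range reduction, which this sketch leaves out.
inline double sin_chebyshev(const double x)
{
    static const std::array<double, kTerms> c = chebyshev_coeffs_sin();
    const double t = x / kHalfPi;                     // map to [-1, 1]
    double d = 0.0;
    double dd = 0.0;
    for (std::size_t k = kTerms - 1; k >= 1; --k) {   // Clenshaw recurrence
        const double prev = d;
        d = 2.0 * t * d - dd + c[k];
        dd = prev;
    }
    return t * d - dd + 0.5 * c[0];
}
```

The evaluation loop costs a fixed number of multiply-adds equal to the number of terms, with no divisions, which is the main appeal compared with a deep continued fraction.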

Yikai-Liao commented 4 months ago

I have created a pull request (#46) that optimises the trigonometric calculations. I'll try to optimise other functions as well.