Closed: perrydv closed this 11 months ago.
@perrydv The performance results are remarkable. Who ever knew the power of the C++ keyword `inline`?
Thanks @danielturek. Eigen, for example, makes heavy use of inlining. It will be interesting to see what net performance gains these give, or whether they are swamped by other costs.
This PR changes the following C++ functions in Utils.h to be inlined. That means they now use the C++ keyword `inline` (previously they called a library function that on-the-fly compilation would link to), so the function bodies can be placed directly where they are called and compiled there. This reduces function-call overhead and presumably allows further compiler optimizations. A user reported that the new `pow_int` in version 1.0.0 introduced a surprisingly large slowdown compared to previous versions, and this was traced to `pow` becoming `pow_int` in user-defined code.
This PR inlines the following (with performance notes [before --> after] on a Mac for 10 million calls, timed in seconds):
- `pow_int` (0.201 --> 2e-6)
- `ilogit` (0.073 --> 2e-6)
- `logit` (ditto)
- `cloglog` (0.143 --> 2e-6)
- `icloglog` (0.146 --> 2e-6)
- `iprobit` (little difference because the cost is in `pnorm`)
- `probit` (little difference because the cost is in `qnorm`)
- `nimEquals`
- `nimbleIfElse`
- `lfactorial` (little difference because the cost is in `lgamma`)
- `factorial` (little difference because the cost is in `gamma`)
- `nimRound`
- `pairmax` in the overloaded (double, double) case
- `pairmin` (ditto)
- `nimStep`
- `cube`
- `inprod` in the overloaded (double, double) case (should be rare)

Cases without notes went from about 0.03 --> 2e-6, and that 0.03 might just be the baseline cost of packing up function calls.
All of these seem fast both before and after the changes (these are cumulative times for 10 million calls). Yet `pow_int` was the source of a serious slowdown, so perhaps the overhead and costs compound in more complex compiled code that uses lots of memory, etc. The benchmarks I ran were isolated and had nothing else going on.
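An isolated timing loop of the kind described might look like the sketch below. It is not the benchmark that was actually run: the names `time_ilogit` and `BenchResult` are hypothetical, `ilogit` is redefined locally as a stand-in for any of the inlined helpers, and wall-clock time comes from `std::chrono::steady_clock`. Each result is added to a running sum that is returned, so an optimizing compiler cannot simply discard the loop once the call has been inlined.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Stand-in for the function under test (standard inverse-logit formula).
inline double ilogit(double x) { return 1.0 / (1.0 + std::exp(-x)); }

struct BenchResult {
    double seconds;   // wall-clock time for all calls
    double checksum;  // sum of results, returned so the work is observable
};

// Times `calls` evaluations of ilogit. The argument varies with the loop index
// and every result feeds the returned checksum, which makes it harder for the
// optimizer to hoist the call out of the loop or delete the loop entirely.
inline BenchResult time_ilogit(long calls) {
    assert(calls > 0);
    auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (long i = 0; i < calls; ++i) {
        sum += ilogit(static_cast<double>(i % 1000) * 1e-3);
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = stop - start;
    return {elapsed.count(), sum};
}
```

Calling `time_ilogit(10000000)` matches the 10-million-call setting. Comparing a build where the function is defined `inline` in a header against one that calls an out-of-line definition from a separate object file would approximate the before/after contrast in a similarly isolated setting.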