inline many core functions

perrydv commented 1 year ago

This PR changes the following C++ functions in Utils.h to be inlined. They are now defined with the C++ keyword inline (previously each call went to a library function that on-the-fly compilation would link to), so the function contents can be placed directly at the call site and compiled there. This reduces function call overhead and presumably allows further compiler optimizations. A user reported that the new pow_int in version 1.0.0 caused a surprisingly large slowdown compared to previous versions, and this was traced to pow becoming pow_int in user-defined code.
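A minimal sketch of the kind of change described above, not the actual nimble code: the name `pow_int_sketch`, its signature, and the exponentiation-by-squaring body are assumptions made for illustration. The point is only that the definition sits in the header behind `inline`, so the compiler sees the body at every call site.

```cpp
// Hypothetical header-style sketch; not the real contents of Utils.h.

// Before (illustrative): only a declaration is visible to the caller, and the
// body lives in a separately compiled library, so every use is a real call.
// double pow_int_library(double base, int exponent);

// After (illustrative): the definition is in the header and marked inline,
// so the compiler may expand the body directly where it is called.
inline double pow_int_sketch(double base, int exponent) {
    long long n = exponent;          // widen so negating INT_MIN is safe
    const bool negative = n < 0;
    if (negative) n = -n;

    double result = 1.0;
    double factor = base;
    while (n > 0) {                  // exponentiation by squaring
        if (n & 1) result *= factor; // use this power of the base if the bit is set
        factor *= factor;
        n >>= 1;
    }
    return negative ? 1.0 / result : result;
}
```

With this in a header, a call such as `pow_int_sketch(x, 2)` in generated code can, at the compiler's discretion, reduce to a single multiplication rather than a call into a shared library.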

This PR inlines the following (with notes on performance [before --> after] on a Mac for 10 million calls, timed in seconds):

pow_int (0.201 --> 2e-6)
ilogit (0.073 --> 2e-6)
logit (ditto)
cloglog (0.143 --> 2e-6)
icloglog (0.146 --> 2e-6)
iprobit (little difference because the cost is in pnorm)
probit (little difference because the cost is in qnorm)
nimEquals
nimbleIfElse
lfactorial (little difference because the cost is in lgamma)
factorial (little difference because the cost is in gamma)
nimRound
pairmax in overloaded (double, double) case
pairmin (ditto)
nimStep
cube
inprod in overloaded (double, double) case (should be rare)
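For readers unfamiliar with these helpers, here is a rough sketch of what inline versions of a few of them could look like. The `_sketch` names and signatures are hypothetical, and the bodies use the standard mathematical definitions of the link functions rather than nimble's actual source.

```cpp
#include <cmath>

// Hypothetical inline helpers using the textbook definitions.
// Names and signatures are illustrative, not copied from Utils.h.

// logit(p) = log(p / (1 - p)), for 0 < p < 1
inline double logit_sketch(double p) { return std::log(p / (1.0 - p)); }

// ilogit(x) = 1 / (1 + exp(-x))
inline double ilogit_sketch(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// cloglog(p) = log(-log(1 - p)), for 0 < p < 1
inline double cloglog_sketch(double p) { return std::log(-std::log(1.0 - p)); }

// icloglog(x) = 1 - exp(-exp(x))
inline double icloglog_sketch(double x) { return 1.0 - std::exp(-std::exp(x)); }

// cube(x) = x * x * x
inline double cube_sketch(double x) { return x * x * x; }
```

Bodies this small are typical inlining candidates, since the cost of setting up a call can be comparable to the work the function does.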

Cases without notes went from about 0.03 --> 2e-6; that 0.03 might be just the baseline cost of packing up function calls.

All of these seem fast both before and after the changes (these are cumulative times for 10 million calls). Yet pow_int was the source of a serious slowdown, so perhaps the overhead and costs compound in more complex compiled situations that use lots of memory, etc. The benchmarks I ran were isolated and had nothing else going on.
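An isolated timing of this kind could be sketched roughly as follows. This is not the benchmark that was actually run: `time_calls` and `TimingResult` are hypothetical names, and the harness is an assumption. It returns a checksum so that the results of the calls are actually used; once a function is inlined, an optimizer may otherwise simplify or drop a loop whose results are never read, which is worth keeping in mind when interpreting very small timings.

```cpp
#include <chrono>

// Hypothetical timing harness for an isolated micro-benchmark.
struct TimingResult {
    double seconds;   // wall-clock time for all calls
    double checksum;  // sum of results, returned so the work is observable
};

template <typename F>
TimingResult time_calls(F f, long n_calls) {
    const auto start = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (long i = 0; i < n_calls; ++i) {
        // Vary the input so the call cannot be hoisted out of the loop.
        acc += f(static_cast<double>(i % 7));
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {elapsed.count(), acc};
}
```

For example, `time_calls([](double x) { return x * x * x; }, 10000000L)` times 10 million calls of an inlinable cube; the same harness can wrap a call into a separately compiled library function for the "before" case.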

danielturek commented 1 year ago

@perrydv The performance results are remarkable.

Who ever knew the power of the C++ keyword inline?

perrydv commented 1 year ago

Thanks @danielturek. Well, Eigen, for example, makes heavy use of inlining. It will be interesting to see what kind of net performance gains these give, or whether they are swamped by other costs.

nimble-dev / nimble

inline many core functions #1349