aornugent opened this issue 1 year ago
Good find @aornugent @devmitch. I'm not surprised by this, for two reasons:

1. The most expensive part of the model is running `compute_competition`, which eventually comes back to calling this function on individuals: https://github.com/traitecoevo/plant/blob/572d2a6783558405dd162e9ab6602a97aa86c54e/src/ff16_strategy.cpp#L401. That function itself calls two functions that call `pow`.
2. There are other calls to power functions at other points in the code.

However, the results above suggest `compute_competition` is 10% of the cost, and it's unclear whether that figure includes all of the calls happening further down the `compute_competition` stack or not.

It may be that there's a lot of potential speed gain to be had by economising on the number of calls to `pow`. For example, are there instances where we could use a cheaper expression, like `x*x` instead of `pow(x, 2)`?
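A minimal sketch of that idea, assuming a small, known integer exponent (the helper and function names below are hypothetical illustrations, not anything from plant's actual code):

```cpp
#include <cmath>

// Hypothetical helper: raise x to a small non-negative integer power by
// repeated multiplication rather than calling std::pow. For exponents like
// 2 or 3 this avoids libm's general-purpose pow path.
inline double pow_int(double x, int n) {
  double result = 1.0;
  for (; n > 0; --n) result *= x;
  return result;
}

double area_squared(double x) {
  // return std::pow(x, 2.0);   // general pow call
  return x * x;                 // equivalent, typically cheaper
}
```

Whether this helps in practice would depend on how many of plant's exponents are actually small integers rather than trait-driven real values.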
As part of investigating #346, @devmitch demonstrated how to profile an R session using AMDuProf on machines running AMD CPUs. Notably, 40% of `run_plant_benchmarks` takes place in the `pow` operator of libm.

[Profiler output: Summaries / Calls ... truncated]
A little bit of digging into stuff that I don't totally understand suggests that there's an edge case where `pow` becomes a very expensive operation for certain exponents: http://entropymine.com/imageworsener/slowpow/

This slow path appears to be required for high-precision use cases:
https://stackoverflow.com/questions/9272155/replacing-extrordinarily-slow-pow-function
https://stackoverflow.com/questions/14687665/very-slow-stdpow-for-bases-very-close-to-1
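Not from those posts, but a quick, hedged way to check whether we're hitting that slow path: a small timing sketch comparing `pow` on a typical base against a base very close to 1 (the function name is made up for illustration, and the numbers will vary by libm version and CPU; this is not a rigorous benchmark).

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>

// Time a batch of pow calls for a given base/exponent. Purely illustrative:
// no warm-up, single run, no statistical treatment.
static double time_pow(double base, double exponent, int n) {
  volatile double sink = 0.0;  // keep the loop from being optimised away
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < n; ++i) sink += std::pow(base, exponent);
  auto end = std::chrono::steady_clock::now();
  (void)sink;
  return std::chrono::duration<double, std::milli>(end - start).count();
}

int main() {
  const int n = 1000000;
  // Typical base vs. a base very close to 1, which some libm builds
  // reportedly handle via a much slower high-precision path.
  std::printf("pow(1.5, 2.3)       x%d: %.1f ms\n", n, time_pow(1.5, 2.3, n));
  std::printf("pow(1.0000001, 2.3) x%d: %.1f ms\n", n, time_pow(1.0000001, 2.3, n));
  return 0;
}
```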
Something that takes 40% of the runtime is a tempting optimisation target. It's heartening to see that the plant routines, including spline-driven operations (e.g. light competition), are so fast.