scikit-hep / cabinetry

design and steer profile likelihood fits
https://cabinetry.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Large memory use of `model_utils.yield_stdev` #409

Open alexander-held opened 1 year ago

alexander-held commented 1 year ago

This is a partial follow-up to #315. @ekauffma found that the yield uncertainty calculation can use a large amount of memory. This is ultimately due to using awkward instead of pure numpy: https://github.com/scikit-hep/awkward/discussions/2480. awkward is not strictly needed here and was only used for convenience. Given the size of the effect, it makes sense to switch to pure numpy. This is done in the following PR:

In addition to this, @ekauffma found that splitting the calculation across channels can significantly improve performance.

Additional performance improvements may be achieved via #415.
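To illustrate where the memory can go, here is a minimal pure-numpy sketch. It assumes the per-bin yield variance comes from linear error propagation, Var_b = sum_ij Δ_ib ρ_ij Δ_jb, where Δ holds the symmetrized yield variations per parameter (shape `(n_params, n_bins)`) and ρ is the parameter correlation matrix. Both function names are hypothetical and this is not the cabinetry implementation. Materializing the full `(n_params, n_params, n_bins)` product grows quadratically with the number of parameters, while contracting with a matrix product first only needs `(n_params, n_bins)` intermediates.

```python
import numpy as np


def variance_outer(delta: np.ndarray, corr: np.ndarray) -> np.ndarray:
    """Per-bin variance via an explicit (n_params, n_params, n_bins) array.

    Peak memory grows like n_params**2 * n_bins.
    """
    # outer[i, j, b] = delta[i, b] * delta[j, b]
    outer = delta[:, np.newaxis, :] * delta[np.newaxis, :, :]
    return np.sum(corr[:, :, np.newaxis] * outer, axis=(0, 1))


def variance_matmul(delta: np.ndarray, corr: np.ndarray) -> np.ndarray:
    """Same per-bin variance, using only (n_params, n_bins) intermediates."""
    # (corr @ delta)[i, b] = sum_j corr[i, j] * delta[j, b]
    return np.sum(delta * (corr @ delta), axis=0)


# synthetic inputs (hypothetical sizes, not taken from a real workspace)
rng = np.random.default_rng(0)
n_params, n_bins = 50, 200
delta = rng.normal(size=(n_params, n_bins))  # symmetrized yield variation per parameter

# random positive-definite covariance, converted to a correlation matrix
a = rng.normal(size=(n_params, n_params))
cov = a @ a.T + n_params * np.eye(n_params)
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

stdev = np.sqrt(variance_matmul(delta, corr))  # per-bin yield uncertainty
```

Both functions return the same values; only the size of the temporary arrays differs.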

ekauffma commented 1 year ago

I did a quick study to understand the differences between using awkward and numpy in yield_stdev, using this workspace here.

First, I kept the number of channels constant and varied the number of parameters, measuring both the maximum memory usage during the function call (with memory-profiler) and the run time. These were the results:

[Figure: akvnp_para (awkward vs. numpy, varying number of parameters)]

Then I kept the number of parameters constant by removing the staterror modifiers and varied the number of channels:

[Figure: akvnp_chan (awkward vs. numpy, varying number of channels)]
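A rough way to run a similar parameter scan on synthetic inputs is sketched below. It is not the code used for the plots above: it swaps in the standard-library `tracemalloc` module (which also tracks numpy's array buffers) instead of memory-profiler, and it compares two hypothetical pure-numpy variants of the variance calculation rather than the actual awkward and numpy code paths, so the numbers only indicate how peak memory and time scale.

```python
import time
import tracemalloc

import numpy as np


def variance_outer(delta, corr):
    # hypothetical naive variant: builds (n_params, n_params, n_bins) temporaries
    outer = delta[:, np.newaxis, :] * delta[np.newaxis, :, :]
    return np.sum(corr[:, :, np.newaxis] * outer, axis=(0, 1))


def variance_matmul(delta, corr):
    # hypothetical contracted variant: only (n_params, n_bins) temporaries
    return np.sum(delta * (corr @ delta), axis=0)


def measure(func, *args):
    """Return (peak traced bytes, wall time in seconds) for a single call."""
    tracemalloc.start()
    t0 = time.perf_counter()
    func(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()  # stopping clears the traces, so each call starts fresh
    return peak, elapsed


def scan_parameters(param_counts, n_bins=100, seed=0):
    """Measure both variants for each number of parameters, at fixed n_bins."""
    rng = np.random.default_rng(seed)
    rows = []
    for n_params in param_counts:
        delta = rng.normal(size=(n_params, n_bins))
        corr = np.eye(n_params)  # memory use does not depend on the correlation values
        for name, func in (("outer", variance_outer), ("matmul", variance_matmul)):
            peak, elapsed = measure(func, delta, corr)
            rows.append(
                {"impl": name, "n_params": n_params, "peak_bytes": peak, "seconds": elapsed}
            )
    return rows


for row in scan_parameters([25, 50, 100]):
    print(row)
```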

I believe that a per-channel split will also help because it implicitly reduces the number of parameters per matrix computation.
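A minimal sketch of that effect, assuming a hypothetical model with a few shared parameters plus one staterror-like parameter per bin (so each of those only affects its own channel). A parameter whose variations vanish throughout a channel contributes nothing to sum_ij Δ_ib ρ_ij Δ_jb in that channel, so it can be dropped before the matrix product. The per-channel results match the full calculation, while each product only involves the parameters that are active in that channel. Function names are illustrative, not cabinetry API.

```python
import numpy as np


def variance_matmul(delta, corr):
    """Per-bin variance sum_ij delta[i, b] * corr[i, j] * delta[j, b]."""
    return np.sum(delta * (corr @ delta), axis=0)


def variance_pruned(delta, corr):
    """Same variance, keeping only parameters that vary some bin in this slice.

    Returns the per-bin variance and the number of parameters actually used.
    """
    active = np.any(delta != 0, axis=1)
    d = delta[active]
    return variance_matmul(d, corr[np.ix_(active, active)]), int(active.sum())


# hypothetical model: 5 shared parameters, two channels with 10 and 20 bins,
# plus one staterror-like parameter per bin
rng = np.random.default_rng(1)
n_shared, bins_per_channel = 5, [10, 20]
n_bins = sum(bins_per_channel)
n_params = n_shared + n_bins

delta = np.zeros((n_params, n_bins))
delta[:n_shared] = rng.normal(size=(n_shared, n_bins))  # shared parameters act in every bin
# each staterror-like parameter only shifts the yield of its own bin
delta[n_shared + np.arange(n_bins), np.arange(n_bins)] = rng.normal(size=n_bins)

# random positive-definite covariance, converted to a correlation matrix
a = rng.normal(size=(n_params, n_params))
cov = a @ a.T + n_params * np.eye(n_params)
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

# evaluate channel by channel on contiguous bin ranges
edges = np.cumsum([0, *bins_per_channel])
variances, counts = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    var, n_used = variance_pruned(delta[:, lo:hi], corr)
    variances.append(var)
    counts.append(n_used)

print(counts)  # -> [15, 25], versus 35 parameters in the full model
```

The more staterror-like (channel-local) parameters a model has, the larger the reduction per channel.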