scikit-hep / cabinetry

design and steer profile likelihood fits
https://cabinetry.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

perf: drop use of awkward in yield uncertainty calculation #408

Closed ekauffma closed 1 year ago

ekauffma commented 1 year ago

`ak.sum` turned out to be much more expensive than anticipated, so this PR changes the matrix operations in `yield_stdev` to use `numpy` instead, as suggested by @alexander-held.

Partially addresses #409; follows the initial improvements made in #315.
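The kind of replacement this describes can be sketched as follows. This is an illustrative example with made-up numbers, not cabinetry's actual `yield_stdev` code: when the per-variation yields are regular (rectangular), a plain `numpy` array and `np.sum` can replace the awkward-array equivalent with much less overhead.

```python
import numpy as np

# hypothetical yields: 3 parameter variations across 4 bins (regular shape,
# so a plain numpy array suffices and no awkward array is needed)
variations = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, 2.9, 3.9],
])
nominal = np.array([1.0, 2.0, 3.0, 4.0])

# per-bin uncertainty: quadrature sum of deviations, fully vectorized in numpy
total_stdev = np.sqrt(np.sum((variations - nominal) ** 2, axis=0))
```

The `axis=0` sum here plays the role that `ak.sum` played before, but runs as a single vectorized `numpy` operation.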

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (5064e38) 100.00% compared to head (ab4ceb5) 100.00%.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##            master      #408   +/-   ##
=========================================
  Coverage    100.00%   100.00%
=========================================
  Files            22        22
  Lines          2069      2072     +3
  Branches        334       337     +3
=========================================
+ Hits           2069      2072     +3
```

| [Impacted Files](https://app.codecov.io/gh/scikit-hep/cabinetry/pull/408?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scikit-hep) | Coverage Δ | |
|---|---|---|
| [src/cabinetry/model\_utils.py](https://app.codecov.io/gh/scikit-hep/cabinetry/pull/408?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scikit-hep#diff-c3JjL2NhYmluZXRyeS9tb2RlbF91dGlscy5weQ==) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.

agoose77 commented 1 year ago

I had some performance suggestions after this popped up in my notifications:

Pre-convert the parameters, and use `np.ufunc.at` to avoid a temporary:

```python
# calculate the model distribution for every parameter varied up and down
# within the respective uncertainties
# ensure float for correct addition
float_parameters = parameters.astype(float)
for i_par in range(model.config.npars):
    # central parameter values, but one parameter varied within uncertainties
    up_pars = float_parameters.copy()
    np.add.at(up_pars, i_par, uncertainty)
    down_pars = float_parameters.copy()
    np.subtract.at(down_pars, i_par, uncertainty)
```
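As a concrete check of that pattern, here is a self-contained toy version with made-up values (the `parameters` and `uncertainty` names stand in for the real fit inputs):

```python
import numpy as np

parameters = np.array([1, 2, 3])             # integer dtype on purpose
uncertainty = 0.5
float_parameters = parameters.astype(float)  # ensure float so the shift is not truncated

i_par = 1
up_pars = float_parameters.copy()
np.add.at(up_pars, i_par, uncertainty)       # shifts only index 1 up, in place
down_pars = float_parameters.copy()
np.subtract.at(down_pars, i_par, uncertainty)  # shifts only index 1 down, in place
```

Note that without the `astype(float)` conversion, adding `0.5` in place to an integer array would fail or truncate.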

Pre-allocate stacked arrays and use in-place assignment (unsure of the shapes here, so this is pseudo-code):

```python
up_comb_next = np.empty(...)
up_comb_next[...] = up_comb
np.sum(up_comb, axis=0, out=up_comb_next[...])
```

Pre-allocate the `up_variations` and `down_variations` arrays, and assign in-place:

```python
up_variations = np.empty(..., dtype=...)

for i_par in range(model.config.npars):
    ...
    up_variations[i_par] = up_yields
```

It might also be possible to do the above without the

```python
up_yields = np.concatenate((up_comb, up_yields_channel_sum), axis=1)
```

step, i.e. by assigning the parts directly into the pre-allocated array. I'm not sure.
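The pre-allocation idea above can be illustrated with a small runnable sketch. Shapes and the `up_yields` computation are placeholders, not cabinetry's actual ones; the point is writing each row into a pre-allocated array instead of collecting rows in a list and stacking them afterwards:

```python
import numpy as np

n_pars, n_bins = 3, 4  # illustrative sizes, standing in for model.config.npars etc.

# allocate the output once, up front
up_variations = np.empty((n_pars, n_bins), dtype=float)

for i_par in range(n_pars):
    # stand-in for the real per-parameter yield calculation
    up_yields = np.full(n_bins, float(i_par))
    # in-place row assignment: no list append, no final np.stack
    up_variations[i_par] = up_yields
```

This avoids both the intermediate Python list and the extra allocation that `np.stack` (or `np.concatenate`) performs at the end.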

alexander-held commented 1 year ago

Thanks a lot for the additional suggestions @agoose77! I moved them to #415. They look great, but I would prefer to address them in a separate PR to keep the changes more atomic.