scikit-hep / cabinetry

design and steer profile likelihood fits
https://cabinetry.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Additional performance improvements for `model_utils.yield_stdev` #415

Open alexander-held opened 1 year ago

alexander-held commented 1 year ago

I had some perf suggestions after this popped up in my notifications:

Pre-convert the parameters to float once, and use `np.ufunc.at` to avoid a temporary array

# calculate the model distribution for every parameter varied up and down
# within the respective uncertainties
# ensure float for correct addition
float_parameters = parameters.astype(float)
for i_par in range(model.config.npars):
    # central parameter values, but one parameter varied within uncertainties
    up_pars = float_parameters.copy()
    np.add.at(up_pars, i_par, uncertainty[i_par])
    down_pars = float_parameters.copy()
    np.subtract.at(down_pars, i_par, uncertainty[i_par])
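
As a runnable sketch of that idea (shapes and values invented for illustration; in the real code `parameters` and `uncertainty` come from the fit results and `n_pars` from `model.config.npars`):

```python
import numpy as np

# hypothetical stand-ins for the fit results
n_pars = 3
parameters = np.asarray([1, 0, 2])          # note: integer dtype
uncertainty = np.asarray([0.1, 0.2, 0.3])

# convert to float once, so the per-parameter copies need no further casting
float_parameters = parameters.astype(float)

up_variations = []
down_variations = []
for i_par in range(n_pars):
    # central values with a single parameter shifted in place
    up_pars = float_parameters.copy()
    np.add.at(up_pars, i_par, uncertainty[i_par])
    down_pars = float_parameters.copy()
    np.subtract.at(down_pars, i_par, uncertainty[i_par])
    up_variations.append(up_pars)
    down_variations.append(down_pars)
```

For a single scalar index, a plain `up_pars[i_par] += uncertainty[i_par]` is equivalent; `np.ufunc.at` mainly pays off when shifting many indices at once without intermediate arrays.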

Pre-allocate stacked arrays and use in-place assignment (unsure of shapes here, so it's pseudo-code)

up_comb_next = np.empty(...)
up_comb_next[...] = up_comb
np.sum(up_comb, axis=0, out=up_comb_next[...])
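
A minimal concrete version of the `out=` idea (shapes invented; `up_comb` stands in for the combined yields array):

```python
import numpy as np

# invented shape: 5 per-parameter variations of 4 bins each
rng = np.random.default_rng(0)
up_comb = rng.normal(size=(5, 4))

# allocate the reduction target once, then let np.sum write into it
# directly instead of allocating a fresh result array on every call
channel_sum = np.empty(4)
np.sum(up_comb, axis=0, out=channel_sum)
```

The `out=` array must already have the shape and dtype of the reduction result, so this only works once the output shapes are known up front.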

Pre-allocate the `up_variations` and `down_variations` arrays, and assign in-place

up_variations = np.empty(..., dtype=...)

for i_par in range(model.config.npars):
    ...
    up_variations[i_par] = up_yields
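
Filled in as a runnable sketch, with invented shapes and a random stand-in for the per-parameter yield evaluation:

```python
import numpy as np

n_pars, n_bins = 3, 4
rng = np.random.default_rng(1)

# allocate the full (n_pars, n_bins) stack up front instead of
# collecting per-parameter arrays in a list and stacking at the end
up_variations = np.empty((n_pars, n_bins), dtype=np.float64)

for i_par in range(n_pars):
    # stand-in for evaluating the model at the varied parameters
    up_yields = rng.normal(size=n_bins)
    # in-place row assignment; no np.stack copy at the end
    up_variations[i_par] = up_yields
```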

It might also be possible to do the above without the

up_yields = np.concatenate((up_comb, up_yields_channel_sum), axis=1)

step, i.e. directly assign the parts. I'm not sure.
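
For illustration, skipping the concatenate could look like this (shapes invented; whether the surrounding code actually permits writing the parts directly is exactly the open question):

```python
import numpy as np

# invented shapes: 3 variations, 4 bins per channel, 1 appended channel sum
n_vars, n_bins = 3, 4
up_comb = np.arange(n_vars * n_bins, dtype=float).reshape(n_vars, n_bins)
up_yields_channel_sum = up_comb.sum(axis=1, keepdims=True)

# assign both parts into slices of one pre-allocated array
# instead of building a temporary via np.concatenate
up_yields = np.empty((n_vars, n_bins + 1))
up_yields[:, :n_bins] = up_comb
up_yields[:, n_bins:] = up_yields_channel_sum
```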

Originally posted by @agoose77 in https://github.com/scikit-hep/cabinetry/issues/408#issuecomment-1614454884