icecube / flarestack

Unbinned likelihood analysis code for astroparticle physics datasets
https://flarestack.readthedocs.io/en/latest/
MIT License

Determination of `scale_range` in minimisation.py fails for certain inputs #286

Open mlincett opened 1 year ago

mlincett commented 1 year ago

Under certain circumstances, flarestack is unable to build the scale_range array. For example, with n_steps=2 this happens:

File /afs/ifh.de/group/amanda/scratch/mlincett/software/flarestack/flarestack/core/minimisation.py:268, in MinimisationHandler.trial_params(mh_dict)
    266     steps = int(mh_dict["n_steps"])
    267     background_ntrials_factor = mh_dict.get("background_ntrials_factor", 10)
--> 268     scale_range = np.array(
    269         [0.0 for _ in range(background_ntrials_factor)]
    270         + list(np.linspace(0.0, scale, steps)[1:])
    271     )
    273 n_trials = int(mh_dict["n_trials"])
    275 return scale_range, n_trials

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (11,) + inhomogeneous part.

I suspect that when linspace returns a single element, the construction breaks. It may be argued that n_steps=2 is not a very useful setting, but the UX here is far from optimal. I will try to add some sanity checks here.

mlincett commented 1 year ago

OK, the actual problem arises when something other than a scalar (for example, a numpy array) is passed as the scale value.
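
For the record, a minimal sketch of how the failure can be reproduced (the values below are made up, not taken from the notebook; with a recent numpy this raises the same ValueError as above):

```python
import numpy as np

# Hypothetical inputs: an array (rather than a scalar) ends up being passed as `scale`.
scale = np.array([3.0, 5.0])
steps = 2
background_ntrials_factor = 10

# Mirrors the construction in MinimisationHandler.trial_params: ten scalar zeros
# plus a list containing one array element. np.array cannot build a homogeneous
# array from that mix and (on numpy >= 1.24) raises the ValueError quoted above.
scale_range = np.array(
    [0.0 for _ in range(background_ntrials_factor)]
    + list(np.linspace(0.0, scale, steps)[1:])
)
```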

I guess nobody would do that on purpose, but somehow this behaviour is triggered in our "LLH workshop" notebook, which is very much in need of a revision.

I guess this is now less urgent as a "bug", but adding a sanity check on the input would be beneficial.

P.S.: also, the construction of scale_range should use np.zeros and np.hstack rather than converting arrays to lists and back.
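
A minimal sketch of what that could look like (this is not a committed fix; the function name and the exact checks are only a suggestion):

```python
import numpy as np


def build_scale_range(scale, steps, background_ntrials_factor=10):
    """Possible scale_range construction using np.zeros/np.hstack,
    with sanity checks on the inputs (sketch only)."""
    if not np.isscalar(scale):
        raise TypeError(f"scale must be a scalar, got {type(scale)}: {scale!r}")
    if steps < 2:
        raise ValueError(f"n_steps must be at least 2, got {steps}")
    return np.hstack(
        (
            np.zeros(background_ntrials_factor),        # background-only trials
            np.linspace(0.0, float(scale), steps)[1:],  # non-zero injection scales
        )
    )
```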

robertdstein commented 1 year ago

I'm not sure how useful this is as a comment, but if I were starting flarestack from scratch, I would definitely replace the mh_dict/rh_dicts with pydantic models (https://docs.pydantic.dev/latest/). That would be a great way to enforce typing, validate the fields carefully to avoid cases like this, and generally ensure that the user is warned before executing code if the config is bad.

I also agree that the back-and-forth conversion between lists and arrays is not good.
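
To illustrate the idea, a minimal sketch of what a validated config could look like (the field names are only a guess at a subset of the mh_dict keys, not an actual flarestack interface):

```python
from pydantic import BaseModel, Field


class TrialConfig(BaseModel):
    """Hypothetical, partial replacement for the mh_dict, validated up front."""

    scale: float                        # a multi-element numpy array is rejected here
    n_steps: int = Field(ge=2)          # need the zero scale plus at least one non-zero scale
    n_trials: int = Field(gt=0)
    background_ntrials_factor: int = Field(default=10, gt=0)


# e.g. TrialConfig(**mh_dict) raises a clear ValidationError before any trials run,
# instead of failing deep inside trial_params.
```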

mlincett commented 1 year ago

I had the idea of converting the mh_dict to something like a YAML configuration and handling the generation of the analysis name/path transparently.
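
Roughly along these lines (the keys and the naming scheme below are made up for illustration; PyYAML is assumed to be available):

```python
import yaml  # PyYAML, assumed to be installed

# Illustrative configuration; the keys mirror a subset of a typical mh_dict.
config_yaml = """
name: example/my_source/fixed_weights
scale: 5.0
n_steps: 10
n_trials: 100
"""

mh_dict = yaml.safe_load(config_yaml)

# The analysis path could then be derived transparently from the name,
# rather than being assembled by hand in every script.
analysis_path = f"analyses/{mh_dict['name']}/"
```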

I had heard about pydantic (but had forgotten about it in the meantime); it seems really cool, so thanks for the heads-up. Not sure if/when it will come to flarestack, but it's definitely something to have in the toolbox.