icecube / pisa

Monte Carlo-based data analysis
http://icecube.github.io/pisa/
Apache License 2.0

Implement KDE bootstrapped error estimates and more #689

Closed · atrettin closed this 2 years ago

atrettin commented 2 years ago

This PR adds the option to estimate errors on KDE'd histograms using the bootstrap_kde class from the KDE module that we already use. It is of course very slow and not recommended for use in a fit, but it is very useful for fitting hypersurfaces with statistically sound errors (a rough sketch of the bootstrap idea is shown below). Other changes are:
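For readers unfamiliar with the approach, here is a rough, self-contained sketch of bootstrapped KDE error estimation using plain NumPy/SciPy. It only illustrates the idea and is not the interface of PISA's bootstrap_kde class, whose signature may differ:

```python
# Rough sketch of bootstrapped KDE error estimation (plain NumPy/SciPy;
# not the PISA bootstrap_kde class, whose interface may differ).
import numpy as np
from scipy.stats import gaussian_kde

def bootstrap_kde_hist(samples, weights, bin_edges, n_boot=100, rng=None):
    """Estimate a KDE-smoothed histogram and its per-bin standard error
    by bootstrapping the event sample."""
    rng = np.random.default_rng(rng)
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    widths = np.diff(bin_edges)
    norm = weights.sum()

    def kde_hist(s, w):
        kde = gaussian_kde(s, weights=w)
        # Expected counts per bin: density at bin center * bin width * total weight
        return kde(centers) * widths * norm

    nominal = kde_hist(samples, weights)
    boots = np.empty((n_boot, len(centers)))
    for i in range(n_boot):
        # Resample events with replacement and re-evaluate the KDE
        idx = rng.integers(0, len(samples), size=len(samples))
        boots[i] = kde_hist(samples[idx], weights[idx])
    errors = boots.std(axis=0)
    return nominal, errors
```

The cost is roughly n_boot full KDE evaluations per call, which is why this is impractical inside a fit but acceptable for one-off hypersurface fits.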

philippeller commented 2 years ago

I think it looks mostly good! The only thing I am a little hesitant about is the stashing mechanism... just wondering if this will bite us at some point. Do the parameters in earlier stages actually not change, or do they still change and the changes are simply ignored?

Back in the day we made, for example, Param sets hashable... so one can quite easily detect whether any changes were made. So maybe we could think of a more global stashing mechanism there... maybe?
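For illustration, a minimal sketch of change detection via hashing; the attribute names used here (e.g. `values_hash`) are assumptions for the sketch, not necessarily the actual PISA API:

```python
# Hedged sketch: re-run an expensive computation only when the hash of the
# upstream (hashable) parameters has changed since the last call.
class StashingMixin:
    _stashed_hash = None
    _stashed_result = None

    def apply_with_stash(self, upstream_params, compute):
        # `upstream_params` is assumed to be an iterable of hashable Param-like
        # objects (in PISA one might instead use something like a values_hash).
        current_hash = hash(tuple(upstream_params))
        if self._stashed_result is None or current_hash != self._stashed_hash:
            self._stashed_result = compute()  # e.g. the expensive KDE evaluation
            self._stashed_hash = current_hash
        return self._stashed_result
```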

atrettin commented 2 years ago

@philippeller Yes, the stashing is a bit of a use-at-your-own-risk feature. All changes in previous stages after the first call to apply are ignored. I'm not aware of a way for a stage to know that the weights have changed and that the KDE needs to be re-run... It's tricky because it would need to know whether any parameter in an earlier stage has changed. If there is a way to do it, I would gladly implement it; otherwise we can emit a warning about this behavior when the setting is enabled.

atrettin commented 2 years ago

Or, better, we could run a check on the pipeline after instantiation to look for configuration errors and raise an error if any stage before the KDE stage has a free parameter while stashing is enabled (see the sketch below). Errors are better than warnings; we still have tons of warnings in PISA that just get ignored (actually an issue of its own: too many warnings).
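Something along these lines, as a hedged sketch; the attribute names (`stages`, `params.free`, `stash_hists`, `stage_name`) are illustrative assumptions rather than PISA's actual API:

```python
# Illustrative post-instantiation configuration check.
def check_stash_config(pipeline):
    """Raise if any stage before a stashing KDE stage has free parameters,
    since changes to those parameters would be silently ignored."""
    for i, stage in enumerate(pipeline.stages):
        if getattr(stage, "stash_hists", False):
            offending = [
                p.name
                for earlier in pipeline.stages[:i]
                for p in earlier.params.free
            ]
            if offending:
                raise ValueError(
                    f"Stage '{stage.stage_name}' stashes its KDE histograms, but "
                    f"earlier stages have free parameters {offending}; their "
                    "changes would be ignored after the first apply."
                )
```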

philippeller commented 2 years ago

I think the cleanest approach would be to implement stashing in general at the pipeline level.

So you would say: stash at stage X, and then the pipeline will not re-run anything up to stage X, except when a parameter has changed. What do you think?

It probably should then keep an internal copy of the weights and apply those, just to avoid side effects, but that is low cost.
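A hedged sketch of what such pipeline-level stashing could look like; the class and attribute names here are assumptions for illustration and not PISA's actual Pipeline API:

```python
# Sketch: cache the outputs of all stages before `stash_at`, invalidate the
# cache when any upstream free parameter changes, and always hand downstream
# stages a fresh copy to avoid in-place side effects.
import copy

class StashedPipeline:
    def __init__(self, stages, stash_at):
        self.stages = stages
        self.stash_at = stash_at          # index of the first stage NOT covered by the stash
        self._stashed_outputs = None      # deep copy of e.g. the container weights
        self._stashed_hash = None

    def run(self):
        upstream = self.stages[: self.stash_at]
        params_hash = hash(tuple(p for s in upstream for p in s.params.free))
        if self._stashed_outputs is None or params_hash != self._stashed_hash:
            outputs = None
            for stage in upstream:
                outputs = stage.run(outputs)
            self._stashed_outputs = copy.deepcopy(outputs)
            self._stashed_hash = params_hash
        # Re-apply a fresh copy so downstream in-place manipulation cannot corrupt the stash
        outputs = copy.deepcopy(self._stashed_outputs)
        for stage in self.stages[self.stash_at:]:
            outputs = stage.run(outputs)
        return outputs
```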

atrettin commented 2 years ago

That sounds like a good idea! Though I'm wondering how exactly we'd go about making it safe in general. The weights are often manipulated in place in a pipeline, so as you said it would have to store an internal copy of the weights and re-apply them. But what if there is more than just "weights" that needs to be applied? I guess we could say that we store everything that's in apply_mode... I need to think about this.

philippeller commented 2 years ago

We could make the user specify which keys should be cached! That way it would be explicit.

E.g. in the cfg, specify: at stage X, cache keys x and y.

philippeller commented 2 years ago

We could do that most naturally in the [pipeline] section of the cfg. Add an optional stash = {5: ['weights', 'blah', ...]} or so.
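As a hypothetical illustration of how such an option could be read, assuming the literal-dict syntax proposed above (the `stash` key is only this proposal, not an existing PISA config option):

```python
# Hypothetical parsing of a per-stage stash spec from a [pipeline] section.
from ast import literal_eval
from configparser import ConfigParser

cfg = ConfigParser()
cfg.read_string("""
[pipeline]
stash = {5: ['weights']}
""")

# Maps stage index -> list of container keys to cache and re-apply at that stage
stash_spec = literal_eval(cfg.get("pipeline", "stash", fallback="{}"))
assert all(isinstance(k, int) and isinstance(v, list) for k, v in stash_spec.items())
```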

atrettin commented 2 years ago

Hey, @philippeller!

Sorry, I was too busy with other stuff; I'm now coming back to this. Could we please find a compromise here? I understand that it would be nice to implement caching globally, but right now it's only used in the KDE stage and for a very specific reason. I see the point that it could lead to unexpected behavior if used inadvertently, and I'm prepared to make changes that make it harder to misuse, such as renaming the caching flag to ignore_free_parameters_in_earlier_stages and also issuing a warning. But I just don't have time right now to implement caching consistently throughout PISA, in a generally applicable way, only to end up using it in this one stage.

Could we please just give this one a pass this time?

philippeller commented 2 years ago

Sure, I do not want to be an unreasonable pain. Can you create an issue documenting our discussion and noting that we should fix this soon? Then we will merge the PR as is for now.

philippeller commented 2 years ago

OK, just did that. Will merge the PR now.

atrettin commented 2 years ago

Thank you!