icecube / pisa

Monte Carlo-based data analysis
http://icecube.github.io/pisa/
Apache License 2.0

Expanding Hyper-Hypersurfaces to N dimensions #663

Closed · atrettin closed this 3 years ago

atrettin commented 3 years ago

This is a major rework of hypersurfaces to facilitate large-scale interpolation in N dimensions, which became necessary because I found that I need to interpolate in at least 3D for the sterile neutrino analysis. The previous code supported at most 2D, and the grid points were stored as loose files in a special folder structure that would have become unwieldy in higher dimensions. The goal of this rework is also to make the logistics of fitting hypersurfaces on a grid as simple as possible for the user and to store all of the results in one large JSON file that can be read in one go.

There are basically three changes:

The code for standard non-interpolating hypersurfaces remains unchanged.

The new procedure for fitting hypersurfaces on a grid is demonstrated in the fitting script for the sterile analysis. It only requires running three functions (a sketch of the workflow follows the list):

  1. prepare_interpolated_fit takes in all of the information about nominal and systematic sets and a fit directory. It dumps all of the necessary information to run each fit into JSON files in the given directory.
  2. run_interpolated_fit(fit_dir, job_idx) is run by each job on the cluster. All that is needed is the fit directory and the job index, and thanks to the prepared JSON files all of the other information to run the fit can be loaded automatically.
  3. assemble_interpolated_fits(fit_directory, output_file): When all jobs are complete, this function automatically scans the fit_directory for completed fits and assembles them all into one large JSON named output_file. This is then the file that is passed to the PISA stage as fit_results_file.
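For orientation, here is a minimal sketch of that workflow. The three function names and the `(fit_dir, job_idx)` and `(fit_directory, output_file)` signatures come from the description above; the import path, the arguments of `prepare_interpolated_fit`, the output file name, and the number of jobs are assumptions for illustration only.

```python
# Sketch of the three-step grid-fit workflow; see the hedges in the text above.
from pisa.utils.hypersurface import (  # assumed import path
    prepare_interpolated_fit,
    run_interpolated_fit,
    assemble_interpolated_fits,
)

fit_dir = "/path/to/fit_directory"

# 1. Dump one JSON per grid point containing everything a single fit needs
#    (nominal/systematic set definitions, binning, interpolation grid, ...).
#    The actual keyword arguments are not spelled out here.
prepare_interpolated_fit(
    # ... nominal and systematic set information ...
    fit_directory=fit_dir,  # assumed keyword name
)

# 2. Each cluster job only needs the directory and its own index; everything
#    else is read back from the prepared JSON files. Shown as a local loop
#    purely for illustration.
for job_idx in range(210):  # assumed number of grid points
    run_interpolated_fit(fit_dir, job_idx)

# 3. Once all jobs are done, scan the directory for completed fits and collect
#    them into one JSON file, which is then passed to the PISA stage as
#    fit_results_file.
assemble_interpolated_fits(fit_dir, "interpolated_hypersurfaces.json")
```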

The plotting function plot_fits_in_bin of the HypersurfaceInterpolator is also improved and can now show 1D and 2D slices of interpolated fits, as long as values are chosen for the remaining dimensions. The function is demonstrated in a notebook in the fridge.
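Purely as an illustration of how such a slice plot might be produced: only the names `HypersurfaceInterpolator` and `plot_fits_in_bin` are taken from this PR, while the import path, the constructor argument, and the keyword names below are guesses rather than the actual signature.

```python
import matplotlib.pyplot as plt
from pisa.utils.hypersurface import HypersurfaceInterpolator  # assumed import path

# Assumed: the interpolator is built from the assembled fit results file.
interp = HypersurfaceInterpolator("interpolated_hypersurfaces.json")

# To draw a 2D slice of a 3D interpolation, the remaining parameter has to be
# fixed, e.g. deltam31 = 2.5e-3 eV^2 as in the example in the next comment.
# The keyword names are illustrative assumptions.
interp.plot_fits_in_bin(bin_idx=(8, 8, 2), deltam31=2.5e-3)
plt.show()
```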

atrettin commented 3 years ago

Here is an example of a 2D slice taken from a 3D interpolated hypersurface from one bin in the sterile analysis. The interpolation runs over the parameters deltam31, deltam41 and theta24, and deltam31 was set to 2.5e-3 eV^2 for this slice.

[Image: hs_slice_deltam31_2.5e-3eV2_bin_8-8-2]

philippeller commented 3 years ago

I have a more high-level question. With the changes to the hypersurface stage, are we still able to run this with existing hypersurfaces? I.e. does it preserve backwards compatibility?

atrettin commented 3 years ago

Thank you for the in-depth review, @marialiubarska ! I know it's a lot of stuff... I haven't had much time to work through it all yet, but there are a few things I'm going to patch.

Concerning the questions:

  1. The problem here is that the xi are already floats, so there is no way to tell the order at this point. Maybe I should make the method private, since I can't think of many reasons one would want to call it from the outside...
  2. Good point, it would be better to make the check more explicit like that!
  3. Yes, that is valid syntax for creating an OrderedDict. It ensures that the keys appear in the order in which they are entered, though I don't think it matters at that point.
  4. One problem I found myself is that, if you fit again with a different grid in the same directory, this function would get confused. It would be better to explicitly calculate the expected number of grid points from the grid specification (see the sketch after this list).
  5. Good point, the assertion should be made as early as possible.
  6. I forgot that one... Good catch! :)
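To make point 4 concrete, here is a minimal sketch of computing the expected number of grid points; the grid specification format and the parameter values are assumptions for illustration only, with parameter names borrowed from the sterile-analysis example above.

```python
import numpy as np

# Hypothetical grid specification: one array of sampled values per
# interpolation parameter.
grid_spec = {
    "deltam31": np.linspace(2.0e-3, 3.0e-3, 5),
    "deltam41": np.geomspace(0.1, 10.0, 7),
    "theta24": np.linspace(0.0, 0.5, 6),
}

# The expected number of grid points is the product of the per-parameter
# lengths, so leftover fits from an older grid in the same directory can be
# detected instead of silently confusing the assembly step.
expected_n_points = int(np.prod([len(vals) for vals in grid_spec.values()]))
print(expected_n_points)  # 5 * 7 * 6 = 210
```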

@philippeller For old, non-interpolated hypersurfaces the answer is yes. For interpolated ones, no, because the storage structure has fundamentally changed. Basically, the entire management of how the files are arranged used to live in the fridge and is now in PISA. But it's not a big deal in my mind; re-fitting is pretty quick and we don't have legacy data to take care of.

atrettin commented 3 years ago

@marialiubarska I made some changes to address the concerns you raised. Thanks again for checking!