icecube / pisa

Monte Carlo-based data analysis
http://icecube.github.io/pisa/
Apache License 2.0
19 stars 47 forks source link

hypo_testing stopped working #198

Closed steven-j-wren closed 7 years ago

steven-j-wren commented 7 years ago

I was trying to run hypo_testing.py using the current cake master and I got some errors because of various ways the code has evolved since I first started using it. I tracked down the issues and "fixed" them but I thought they were worth discussing so I don't break anyone else's operations. I uploaded them to this branch:

https://github.com/steven-j-wren/pisa/tree/cake_NMOFix

First issue worth mentioning is that the get_outputs function of distribution_maker now takes an argument to return the "total" outputs; a functionality that was in a separate function get_total_outputs when Philipp's branch was distinct from the master. This was a good change, but the function what exactly the "total" outputs is has now changed. Previously it returned a MapSet that was just the sum of the maps from multiple pipelines and was achieved with:

outputs = reduce(lambda x,y: x + y, outputs)

This made sense when PID was done as two different maps as opposed to a binning dimension. With the definition above one ends up with a MapSet of a single map - which is probably a redundancy. So, it was changed to this:

outputs = reduce(lambda x,y: sum(x) + sum(y), outputs)

And so this returns one Map that is everything. This is how it is in data, so that makes sense. One issue that came about here is that the argument to return this was named sum and so I got an error on my python where this overwrote sum() as it tried to be used in this function above, and so I changed sum=False in the arguments to return_sum=False.

However, this broke some functionality in analysis.py as the object data_dist is expected to be a MapSet and so the function metric_total is used to compare two MapSet objects, one for the data and one for the hypothesis. Since this function is not defined for a Map this breaks and so I removed the sum() around the x and y objects for the get_outputs.

So I guess my question is this, are my changes in my NMO_Fix branch fine or does hypo_testing want to be re-written to work with Maps instead of MapSets now that PID is a binning dimension?

jllanfranchi commented 7 years ago

Here're my thoughts:

  1. your fix for my bug in overloading the name sum seems reasonable
  2. we don't have a generic metric or metric_total (or whatever) method for Map class, but we should. Seems like loads of code would like to not have to call a method by name to get e.g. llh vs. chi2, but instead could call metric( ... , method='llh') similar to how we've implemented in MapSet. If this method has the same name as the method for MapSet, then it's somewhat irrelevant what kind of object (Map or MapSet) is being operated on.
  3. @philippeller convinced me that outputting a single map is the right thing to do (I think you got the logic, that this is what we'll have when we actually get data out of the experiment). I could be convinced otherwise, though. (and am I forgetting something, @philippeller?)
steven-j-wren commented 7 years ago

So I restored it to returning a single Map and I added a metric_total function to the Map. When one requests detailed fit info then you also needed a metric_by_maps function which I guess was intended to have be able to get the portion of the metric from both muons and neutrinos separately. I wasn't quite sure what the logic was. I implemented a metric_by_maps function in the Map class as to get past the error, but it's probably not the best way to do it...

steven-j-wren commented 7 years ago

Did you sort this @jllanfranchi?

jllanfranchi commented 7 years ago

Yep, this appears to be working in the latest cake once again.