snoplusuk / echidna

MIT License
4 stars 12 forks source link

Added poisson energy smearing in smear class #122

Closed EdLeming closed 8 years ago

EdLeming commented 8 years ago

Only added poisson smearing for energy spectra.. Not sure how to do so for radial as poisson is inherently statistical i.e. probability of no of events occurring. In energy this is obvious as events = Nhits, in radius I'm not sure how to make it work....

Also interesting is that using a poisson smear has increased the contribution of tails see attached... One to think about.

smear_results

EdLeming commented 8 years ago

I'm not sure if you two get automatic notifications for pull requests? @ashleyrback @jwaterfield

ashleyrback commented 8 years ago

Yes, I got a notification already. I was just having a look. Looks nice Ed. Thanks for having a look into this.

ashleyrback commented 8 years ago

Two questions straight-off:

  1. How does it compare in terms of execution time to the Gaussian smearing - I'm assuming virtually no difference as you are only changing the one method.
  2. Just out of interest, do you still see the increased contribution of the tails for a finer binning - I think @jwaterfield usually uses 500 eV bins for smearing.
EdLeming commented 8 years ago

Hmmmm, ok. I probably should have checked this earlier....

The attached has 10keV bins. Execution time:

Pois = 520 s Gaus = 13 s

I think this is because of the factorial in the denominator of the poisson function. Once we get up to energies of 2 MeV = 400 photons. It's therefore calculating 400! at each function call. Not only is 400! an outrageously large number, it requires 400 multiplications (unless scipy is doing something clever). smear_results_5sig_0 01mevbin

ashleyrback commented 8 years ago

That's odd, scipy is usually quite good at doing clever stuff to speed things up, but maybe not so much here.

The factorial in the denominator could become a major problem for say B8 where we want to go up like 10 MeV (well we might be able to get away with 5 MeV).

One solution might be to make a new class each for Gaussian and Poisson smearing, that both inherit from a common energy smearing class. You could then recombine them in a sort of combo smear class:

class ComboSmearEnergyLy(GaussianSmearEnergyLy, PoissonSmearEnergyLy):
    # Do stuff

that calls the Poisson method for any bins below say 1 MeV, otherwise uses the Gaussian method. The only thing we will have to be careful of is making sure there are no problems in the transition from one method to the other.

EdLeming commented 8 years ago

That is an option. I would be hesitant to do that before we understand the discrepancy in tails.

Writing the poisson function myself I get identical results / speeds to scipy when I apply the Stirling approximation for the factorial calculation. However, this does require using the decimal module to round numbers to 10 decimal places - if you don't do this numbers become too large and overload the python floating point container. It may be that the tail anomaly is simply a result of the rounding error... Particularly as gaus should be equivalent to poisson at these high statistics levels.

I'll have a fresh look on Monday.

EdLeming commented 8 years ago

So I was having a look at this again today. I wanted to convince myself that, in our case, the gaussian approximation of the poission distribution at high NHit is a reasonable one.

I tested the gaussian-ness of smeared distributions by smearing delta functions (i.e. a load of events all of one energy) with both the methods. I then fitted gaussians to the resulting dists and took a chi2. The results are in the plot below. Interesting to note, the chi2 of the gaussian smeared distribution is ~3 orders of magnetude better than the poisson one at 5MeV (I can send on the root file with all the histos/fits etc if anyone's interested). I think this is strong motivation to use a poisson smear across the full energy range. After all, it's tails smearing to high energy that is our biggest worry for 2nuBB limit setting and it appears the gauss approx is underestimating this.

Considering this our best option is to find a way to speed up the computation time of the calc_poisson_energy method. After an afternoon of testing it appears to be limited by the multiplication of extemely large numbers - lambda**Nhits can be as large as 1e6500 for the largest energy bins. Does anyone have any tricks for speeds up that they can recommend @mjmottram? The gmpy2 module looks promising, but I'm having issues installing it.

chi2_plot

mjmottram commented 8 years ago

I don't have anything to input on this I'm afraid, other than to mention that I can install gmpy2 (with some warnings) via pip.

EdLeming commented 8 years ago

Ok, I got it up to speed. It's now equivalent (even slightly faster... ) than the gaussian smear. @ashleyrback @jwaterfield

ashleyrback commented 8 years ago

@EdLeming can you merge snoplusuk/master to resolve any conflicts please, and then I'll get this PR merged as well. I think @mjmottram is quite keen to start using it.

EdLeming commented 8 years ago

Should be good to go!

ashleyrback commented 8 years ago

A few unittest problems:

======================================================================
ERROR: test_weight_energy_ly (test_smear.TestSmear)
Tests the weighted gaussian smearing method for energy
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ashley/snoplus/software/echidna/echidna-pr/echidna/test/test_smear.py", line 102, in test_weight_energy_ly
    par="energy_mc")
  File "echidna/core/smear.py", line 269, in weighted_smear
    mean))
  File "echidna/core/smear.py", line 144, in calc_poisson_energy
    self._log_factorial[photons] = numpy.sum(numpy.log(np.arange(1,(photons+1))))
NameError: global name 'np' is not defined

======================================================================
ERROR: test_weight_energy_res (test_smear.TestSmear)
Tests the weighted gaussian smearing method for energy resolution.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ashley/snoplus/software/echidna/echidna-pr/echidna/test/test_smear.py", line 188, in test_weight_energy_res
    par="energy_mc")
  File "echidna/core/smear.py", line 508, in weighted_smear
    for energy in np.arange(low, high, widths[i]):
NameError: global name 'np' is not defined

======================================================================
FAIL: test_random_energy_ly (test_smear.TestSmear)
Tests the random gaussian smearing method for energy light yield.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ashley/snoplus/software/echidna/echidna-pr/echidna/test/test_smear.py", line 154, in test_random_energy_ly
    % (expected_sigma, sigma))
AssertionError: Expected sigma 0.111803398875, fitted sigma 0.10008393979

======================================================================
FAIL: test_random_energy_res (test_smear.TestSmear)
Tests the random gaussian smearing method for energy resolution.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ashley/snoplus/software/echidna/echidna-pr/echidna/test/test_smear.py", line 240, in test_random_energy_res
    % (expected_sigma, sigma))
AssertionError: Expected sigma 0.0790569415042, fitted sigma 1.00681369374

----------------------------------------------------------------------

I think the np Exceptions are my fault, these just need to be changed to numpy - I thought I'd changed all of them.

The other failures are probably because poisson is True by default, so you'll need to set poisson as False for the expected numbers.

Should there be some additional tests to test the Poisson case explicitly?

ashleyrback commented 8 years ago

Sorry to nag but there are also some pep8 errors, see below, it is important that we try and maintain the pep8 standard throughout echidna.

echidna/core/smear.py:113:53: W291 trailing whitespace
echidna/core/smear.py:118:80: E501 line too long (80 > 79 characters)
echidna/core/smear.py:119:44: W291 trailing whitespace
echidna/core/smear.py:138:17: W291 trailing whitespace
echidna/core/smear.py:143:31: W601 .has_key() is deprecated, use 'in'
echidna/core/smear.py:143:49: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
echidna/core/smear.py:144:75: E231 missing whitespace after ','
echidna/core/smear.py:144:80: E501 line too long (89 > 79 characters)
echidna/core/smear.py:145:80: E501 line too long (101 > 79 characters)
echidna/core/smear.py:145:88: E261 at least two spaces before inline comment
echidna/core/smear.py:145:88: E262 inline comment should start with '# '
echidna/core/smear.py:147:1: W293 blank line contains whitespace
echidna/core/smear.py:267:52: E712 comparison to True should be 'if cond is True:' or 'if cond:'
echidna/core/smear.py:333:44: E712 comparison to True should be 'if cond is True:' or 'if cond:'
echidna/core/smear.py:336:80: E501 line too long (82 > 79 characters)
echidna/core/smear.py:352:53: W291 trailing whitespace
echidna/core/smear.py:357:80: E501 line too long (80 > 79 characters)
echidna/core/smear.py:358:44: W291 trailing whitespace
echidna/core/smear.py:377:17: W291 trailing whitespace
echidna/core/smear.py:382:31: W601 .has_key() is deprecated, use 'in'
echidna/core/smear.py:382:49: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
echidna/core/smear.py:383:69: E231 missing whitespace after ','
echidna/core/smear.py:383:80: E501 line too long (83 > 79 characters)
echidna/core/smear.py:384:80: E501 line too long (98 > 79 characters)
echidna/core/smear.py:384:85: E261 at least two spaces before inline comment
echidna/core/smear.py:384:85: E262 inline comment should start with '# '
echidna/core/smear.py:509:52: E712 comparison to True should be 'if cond is True:' or 'if cond:'
echidna/core/smear.py:575:44: E712 comparison to True should be 'if cond is True:' or 'if cond:'
echidna/core/smear.py:578:80: E501 line too long (82 > 79 characters)
echidna/core/smear.py:592:53: W291 trailing whitespace
echidna/core/smear.py:596:80: E501 line too long (80 > 79 characters)
echidna/core/smear.py:597:44: W291 trailing whitespace
ashleyrback commented 8 years ago

Once the unittests pass and pep8 errors are fixed, though it is good to go.

EdLeming commented 8 years ago

OK, should be good to go now. When writing the new unit tests I noticed a couple of issues with the random_smear poisson functionality. All works as expected now.

ashleyrback commented 8 years ago

poisson smearing

mjmottram commented 8 years ago

Is that a link to anywhere in particular @ashleyrback ? looks broken to me. Merge? :+1:

ashleyrback commented 8 years ago

Looks good to me, merging!