snoplusuk / echidna

MIT License

Adds chi squared module #14

Closed: ashleyrback closed this 9 years ago

ashleyrback commented 9 years ago

This is the class described in #4, and provides the functionality required by #2, #5 and #6, although the structure is modified. As such this pull request fixes #2, closes #4, resolves #5 and fixes #6.

The module contains three functions that calculate chi squared given a number of observed events and a number of expected events. The forms of chi squared calculation included are Pearson's, Neyman's and the Poisson likelihood chi squared.
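For reference, the three forms can be sketched per bin as below. This is a minimal illustration using the standard textbook definitions, not the code from this pull request: the function names, the error handling and the treatment of empty bins are all assumptions.

```python
import math


def pearson_chi_squared(observed, expected):
    """Pearson's form for a single bin: (O - E)^2 / E."""
    if expected == 0:
        raise ValueError("Pearson's form is undefined for expected == 0")
    return (observed - expected) ** 2 / float(expected)


def neyman_chi_squared(observed, expected):
    """Neyman's form for a single bin: (O - E)^2 / O."""
    if observed == 0:
        raise ValueError("Neyman's form is undefined for observed == 0")
    return (observed - expected) ** 2 / float(observed)


def poisson_likelihood_chi_squared(observed, expected):
    """Poisson likelihood form for a single bin:
    2 * (E - O + O * ln(O / E)).

    Assumption: the O * ln(O / E) term is taken as 0 when O == 0.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed == 0:
        return 2.0 * expected
    return 2.0 * (expected - observed
                  + observed * math.log(observed / float(expected)))
```

All three vanish when observed equals expected, and they differ only in how a deviation is weighted: by the expected count (Pearson), by the observed count (Neyman), or through the Poisson likelihood ratio.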

The module also adds the ChiSquared class, which acts as a calculator for a given form of chi squared calculation. It can be initialised with penalty term information via the keyword argument penalty_term, or this can be specified when calling the get_chi_squared method. The method takes two 1D numpy arrays as arguments, representing the observed (or "data") spectrum and the expected (or "MC") spectrum. It iterates over the bins, calculating the chi squared for each bin using the appropriate function and adding a penalty term if required. The method returns the total chi squared comparing the two spectra.
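The calling pattern described above might look roughly like the following sketch. It is hypothetical throughout: the class name ChiSquaredSketch, the constructor signature, the dictionary keys used for the penalty term and the quadratic form of the penalty are all assumptions made for illustration, since the description does not say what information penalty_term carries.

```python
import numpy


class ChiSquaredSketch(object):
    """Hypothetical stand-in for the calculator described above."""

    def __init__(self, per_bin_function, penalty_term=None):
        # per_bin_function(observed, expected) returns chi squared for one bin
        self._per_bin_function = per_bin_function
        self._penalty_term = penalty_term

    def get_chi_squared(self, observed, expected, penalty_term=None):
        if observed.shape != expected.shape:
            raise ValueError("spectra must have the same shape")
        total = 0.0
        for obs, exp in zip(observed, expected):  # one term per bin
            total += self._per_bin_function(obs, exp)
        # A penalty term given at call time overrides the one from __init__
        term = penalty_term if penalty_term is not None else self._penalty_term
        if term is not None:
            # Assumed form: ((value - central) / sigma)^2
            total += ((term["value"] - term["central"]) / term["sigma"]) ** 2
        return total


def pearson(obs, exp):
    """Pearson's chi squared for one bin."""
    return (obs - exp) ** 2 / exp


calculator = ChiSquaredSketch(pearson)
data = numpy.array([12.0, 8.0, 10.0])   # observed ("data") spectrum
mc = numpy.array([10.0, 10.0, 10.0])    # expected ("MC") spectrum
total = calculator.get_chi_squared(data, mc)
with_penalty = calculator.get_chi_squared(
    data, mc, penalty_term={"value": 1.1, "central": 1.0, "sigma": 0.1})
```

Passing the per-bin function into the constructor is one way to let a single class serve all three forms of chi squared; the real class may select the form differently.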

pgjones commented 9 years ago

Also, more generally, did you consider something like:

numpy.sum((x - y)**2/y)

as it might be much quicker?
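To make the suggestion concrete, the sketch below compares a per-bin loop with the numpy.sum expression for Pearson's form, and adds a vectorised Poisson likelihood form. The array values and the function name poisson_likelihood_vectorised are made up for illustration, and treating the log term as 0 in empty bins is an assumption.

```python
import numpy

x = numpy.array([12.0, 8.0, 10.0, 15.0])   # observed
y = numpy.array([10.0, 10.0, 10.0, 12.0])  # expected

# Per-bin loop over the two spectra
loop_total = 0.0
for obs, exp in zip(x, y):
    loop_total += (obs - exp) ** 2 / exp

# Same quantity as a single array expression
vector_total = numpy.sum((x - y) ** 2 / y)


def poisson_likelihood_vectorised(observed, expected):
    """2 * sum(E - O + O * ln(O / E)), skipping the log term where O == 0."""
    log_term = numpy.zeros(observed.shape, dtype=float)
    mask = observed > 0
    log_term[mask] = observed[mask] * numpy.log(observed[mask] / expected[mask])
    return 2.0 * numpy.sum(expected - observed + log_term)
```

The mask is needed because 0 * log(0) evaluates to nan in numpy, which would otherwise propagate into the total.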

ashleyrback commented 9 years ago

I ran a little test with two versions of the ChiSquared class, the original version and one that uses the numpy.sum expressions to calculate the chi squared values. I tested the two classes with a short script that is set up in the same way as the unittest for the class, but just calculates Pearson's chi squared followed by the Poisson Likelihood chi squared, for both versions of the class.

I then timed the two versions. For 10 bins with 1000 entries it makes no difference as it takes less than a millisecond in both cases, but for 1000 bins and 1e6 entries the results are:

9083.52727273
9372.02412335

Initial method
==============
0:00:02.330
Elapsed time: 0:00:00.016

9083.52727273
9372.02412335

Numpy method
============
0:00:02.331
Elapsed time: 0:00:00.000

The chi squared values are the same in both cases, which means that the numpy.sum method is working as expected. The original method takes 16 ms to perform the two calculations, whereas the numpy.sum method still takes less than a millisecond, so it is indeed faster.
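A comparison along these lines could be reproduced with a short script such as the one below. It is not the script used for the numbers above: the spectra are randomly generated stand-ins (1000 bins with a mean of 1000 entries each), only Pearson's form is timed, and all names are assumptions.

```python
import timeit

import numpy


def pearson_loop(observed, expected):
    """Pearson's chi squared summed bin by bin in a Python loop."""
    total = 0.0
    for obs, exp in zip(observed, expected):
        total += (obs - exp) ** 2 / exp
    return total


def pearson_numpy(observed, expected):
    """Pearson's chi squared as a single vectorised expression."""
    return numpy.sum((observed - expected) ** 2 / expected)


rng = numpy.random.RandomState(0)            # fixed seed for repeatability
expected = numpy.full(1000, 1000.0)          # 1000 bins, ~1e6 entries in total
observed = rng.poisson(expected).astype(float)

loop_value = pearson_loop(observed, expected)
vector_value = pearson_numpy(observed, expected)

repeats = 20
loop_time = timeit.timeit(lambda: pearson_loop(observed, expected),
                          number=repeats) / repeats
vector_time = timeit.timeit(lambda: pearson_numpy(observed, expected),
                            number=repeats) / repeats

print("loop:  %.6g in %.3e s per call" % (loop_value, loop_time))
print("numpy: %.6g in %.3e s per call" % (vector_value, vector_time))
```

Averaging over several calls with timeit gives finer resolution than a single wall-clock measurement, which matters when one of the two methods finishes in well under a millisecond.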

jwaterfield commented 9 years ago

Following output from pep8 checker:

chi_squared.py:5:1: E302 expected 2 blank lines, found 1
chi_squared.py:7:1: W293 blank line contains whitespace
chi_squared.py:9:74: W291 trailing whitespace
chi_squared.py:38:80: E501 line too long (80 characters)
chi_squared.py:85:26: E261 at least two spaces before inline comment
chi_squared.py:102:79: E225 missing whitespace around operator

jwaterfield commented 9 years ago

pep8 output for test_chi_squared:

test_chi_squared.py:7:1: E302 expected 2 blank lines, found 1
test_chi_squared.py:14:80: E501 line too long (86 characters)
test_chi_squared.py:62:27: E225 missing whitespace around operator
test_chi_squared.py:69:1: W293 blank line contains whitespace
test_chi_squared.py:75:77: W291 trailing whitespace
test_chi_squared.py:75:68: E201 whitespace after '{'

jwaterfield commented 9 years ago

A few lines of trailing whitespace remain. Happy to merge once these are changed:

chi_squared.py:138:59: W291 trailing whitespace
chi_squared.py:167:59: W291 trailing whitespace
chi_squared.py:200:59: W291 trailing whitespace