Closed msimet closed 10 years ago
The SingleBin-type objects in binning.py used to return Boolean masks and now just return the already-indexed original array. (That is, you used to have to data = data[bin(data)] and now you just data = bin(data). I can see use cases for both, but I think the second thing is simpler enough (and the gains you might get from and-ing the masks rather than masking sequentially small enough) to leave it this way.
I agree that the second thing is simpler enough. I found that SingleFunctionBin
keeps both features. If we decide to use only the second one, do we want to make it the same as SingleBin
?
I found that
SingleFunctionBin
keeps both features. If we decide to use only the second one, do we want to make it the same asSingleBin
?
SingleFunctionBin
doesn't keep both features: it's always binned_data = SingleFunctionBin(data)
. The difference is the function that's passed as an argument to SingleFunctionBin
: it can return either the mask of bools or an array of bin numbers, but the SingleFunctionBin
itself always just returns the binned data.
I see. I misunderstood!
And a conversation @HironaoMiyatake had earlier today: at the moment, Stile uses formatted arrays that are not formally numpy.recarray
s. I don't have a particular principled reason for this--it's just that I can make formatted numpy arrays, but always run into trouble trying to make recarrays. If anyone proposes a switch, I'm happy to either let you do it or figure out why myself.
For those who haven't used one or both of those things before, as far as I can tell, the main difference is that numpy.recarrays let you have multiple aliases for the same column, while formatted numpy arrays only allow one. There are some pieces of the code right now that assume there's a 1:1 correspondence between field name and field order, so we'd have to make sure that didn't break if we allow recarrays in.
That should read, "figure out how"
I do not have a strong opinion here. I thought it might confuse those who are familiar with numpy.recarray
, but FormatArray
is close enough, so there would be no problem.
Hi Melanie - All tests pass for me on coma (no surprise; I would guess you've checked there). I am having some issues on my Mac:
test_corr2_utils.py
, I get these errors (not failures):ERROR: test_MakeCorr2FileKwargs (__main__.TestCorr2Utils)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_corr2_utils.py", line 266, in test_MakeCorr2FileKwargs
result = stile.MakeCorr2FileKwargs(data)
File "../stile/corr2_utils.py", line 813, in MakeCorr2FileKwargs
new_data_list.append(OSFile(data_list,fields=fields))
File "../stile/corr2_utils.py", line 586, in __init__
file_io.WriteTable(self.file_name,self.data,fields=self.fields)
File "../stile/file_io.py", line 228, in WriteTable
WriteFITSTable(file_name,data_array,fields)
File "../stile/file_io.py", line 194, in WriteFITSTable
table.data = numpy.array(table.data).view(table._data_type)
AttributeError: 'BinTableHDU' object has no attribute '_data_type'
======================================================================
ERROR: test_OSFile (__main__.TestCorr2Utils)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_corr2_utils.py", line 227, in test_OSFile
OSFile2 = stile.corr2_utils.OSFile(arr2)
File "../stile/corr2_utils.py", line 586, in __init__
file_io.WriteTable(self.file_name,self.data,fields=self.fields)
File "../stile/file_io.py", line 228, in WriteTable
WriteFITSTable(file_name,data_array,fields)
File "../stile/file_io.py", line 194, in WriteFITSTable
table.data = numpy.array(table.data).view(table._data_type)
AttributeError: 'BinTableHDU' object has no attribute '_data_type'
test_corr2_utils.py
.test_correlation_functions.py
:test_getCorrelationFunctionSysTest (test_correlation_functions.TestCorrelationFunctions) ... Error: Required parameter ra_units not found
But then the test passes, so I'm not sure what gives.
The second one's not an error--or rather it's an expected error: we send some wrong stuff to corr2 and make sure it fails (to check it's not caching earlier results basically); the error message you see is what corr2 prints before it exits, and the exit is caught by the testing script which expects it. I can try to redirect the output so you don't see that. (There's a note if you run test_corr2_utils.py
directly, but not if you do the unittest discover thing.)
As for the _table_view
error, that's related to pyfits. I'll have to think about that a little more--that line of code is a workaround required for table writing to work on the version of pyfits I was using on Coma. Can I ask which version of pyfits you've got on your Mac?
Pyfits v3.0.3.
If you want to just come over to my office sometime and we can try to sort it out directly instead of you sending suggestions that I have to try out and reply to, that's fine. :)
BTW, I noticed that the README is lacking info like required dependencies. I think it would be good to include those as part of this PR if you don't mind.
No, that sounds good--the list is pretty short, after all. :) I can make that edit too in a bit--it's just numpy, right, with the recommended dependencies of corr2 greater than whatever, matplotlib, pyfits/astropy?
Re: the dependency list - we should be clear just how much of the functionality depends on those optional dependencies. Actually is pyfits/astropy recommended or required? I don’t see how we can do very much without it.
Okay, I think I've addressed everything we talked about so far except the README, which I'm still putting together...
And I just pushed a quick first draft of a README too.
Looks good!
Hi Melanie - I just put some comments on bits of the code that you had mentioned would be good to check out. Do we need another person to go over the whole thing, or do you think we're okay with Hironao's and my comments?
Hmm. I guess I'd feel more comfortable if someone else went over it, but I'm not sure it's strictly necessary at this point.
I'll leave this open for now and plan to merge Thursday evening, unless somebody lets me know they're in the middle of a code review.
Oh--and I addressed all of your comments, I think.
The latest set of commits look good to me.
Re: another person to do code review, I think that if you would like one then your best bet is to choose a victim and e-mail them directly to ask if they are able to do so on a reasonable timescale.
Hello all,
Here's a PR for issue #8, which was intended to finish up the unfinished unit test suite, as well as act as a place for people to review the code on master that hasn't already been reviewed. As part of getting the unit test suite working, in addition to some other concerns driven by branches #7 and #14, there's actually a lot of updated code in here, so going through this PR may take some time.
Also, since this is intended to be a code review of master, you might not want to use the "files changed" interface as usual, but instead look at the whole code as of the latest commit--right now that should be this link. The main code is in the
stile/
directory, tests intests/
, and example code inexamples/
. Or just pull branch #8. That'll be a little annoying for line-by-line comments, I'm afraid, though I suspect most of the code is going to show up in the "files changed" pane anyway since there were a lot of small tweaks.A tour of the code in
stile/
as-is, roughly in order of importance/amount of review needed:sys_tests.py
- the main systematics test code file. Right now this has some correlation function code that operates via corr2 (CorrelationFunctionSysTest
and children), as well as a Stats sys test (StatSysTest
) that was part of branch #6/PR #15--so that's already been reviewed.corr2_utils.py
- all the things thatCorrelationFunctionSysTest
needs in order to talk to corr2. This includes some validation of the parameters to write to a config file (CheckArguments
); some functions to write config files for corr2 (WriteCorr2ConfigurationFile
) and read the results of a run of corr2 (ReadCorr2OutputFile
); to harvest the corr2-related arguments from a dict containing other parameters as well (AddCorr2Dict
--see caveats below); and figure out which data sets need to be written to files on disk and make sure they all have the same columns corresponding to ra/dec/g1/g2/etc (MakeCorr2FileKwargs
), plus a bunch of helper functions and variables to make those things easier.file_io.py
- wrapper functions for reading/writing ASCII and FITS files in various ways.stile_utils.py
- other utility functions. TheStats
class was already checked in #6/#15, so the other main block of code to look at is theFormatArray
function, which turns arrays which are not NumPy formatted arrays into formatted arrays (ie things which allow you to ask forarr['ra']
to get all the RA values as well asarr[0]
to get all the data for object 0). These are not formally numpy.recarrays, but they're similar.binning.py
- describes someBin*
objects which set up a binning scheme and, when called, return a list ofSingleBin
objects, which contain the bounds of the bin, each one of which can be called on a data array to return only the data within the bin.data_handler.py
- this is mostly abstract, but defines the kind of interface we'd want for non-specialized (ie non-LSST/HSC) users of Stile to get their data into the program.__init__.py
- loads stuff into thestile
namespace.In addition to those changes, per @HironaoMiyatake's suggestion on #2, everything in
tests/
(including the new stuff) has been ported over to theunittest
framework. We still need to use thenumpy.testing
functions for arrays--numpy.array_equal
seems to not work in quite the same way asnumpy.testing.assert_array_equal
--which means that failing tests will produce a combination of Errors (fromnumpy.testing
) and Failures (fromunittest
). Still, it seems to have some nice summary functionality that doesn't require external packages. For those who haven't usedunittest
before, you can run all the tests in the directory at once viapython -m unittest discover
, or run the individual test files like normal Python scripts.This also means all the testing files are going to show up in near-entirety in the changed files pane, for those who want line-by-line commenting
Things I'm explicitly still unsure about:
corr2_utils.AddCorr2Dict()
takes a dict of parameters, pulls out the ones that can be used by corr2, and creates a new copy of the input dict with an addedcorr2_kwargs
key that contains a dict with only the parameters useful for corr2. Better to have it just return the thing that's innew_dict['corr2_kwargs']
instead? I can't decide.SingleBin
-type objects inbinning.py
used to return Boolean masks and now just return the already-indexed original array. (That is, you used to have todata = data[bin(data)]
and now you justdata = bin(data)
. I can see use cases for both, but I think the second thing is simpler enough (and the gains you might get fromand
-ing the masks rather than masking sequentially small enough) to leave it this way.Other than that, happy for any thoughts anyone has as well.