msimet / Stile

Stile: the Systematics Tests In Lensing pipeline
BSD 3-Clause "New" or "Revised" License
9 stars 6 forks source link

#8 Code review of master + finishing unit tests #16

Closed msimet closed 10 years ago

msimet commented 10 years ago

Hello all,

Here's a PR for issue #8, which was intended to finish up the unfinished unit test suite, as well as act as a place for people to review the code on master that hasn't already been reviewed. As part of getting the unit test suite working, in addition to some other concerns driven by branches #7 and #14, there's actually a lot of updated code in here, so going through this PR may take some time.

Also, since this is intended to be a code review of master, you might not want to use the "files changed" interface as usual, but instead look at the whole code as of the latest commit--right now that should be this link. The main code is in the stile/ directory, tests in tests/, and example code in examples/. Or just pull branch #8. That'll be a little annoying for line-by-line comments, I'm afraid, though I suspect most of the code is going to show up in the "files changed" pane anyway since there were a lot of small tweaks.

A tour of the code in stile/ as-is, roughly in order of importance/amount of review needed:

In addition to those changes, per @HironaoMiyatake's suggestion on #2, everything in tests/ (including the new stuff) has been ported over to the unittest framework. We still need to use the numpy.testing functions for arrays--numpy.array_equal seems to not work in quite the same way as numpy.testing.assert_array_equal--which means that failing tests will produce a combination of Errors (from numpy.testing) and Failures (from unittest). Still, it seems to have some nice summary functionality that doesn't require external packages. For those who haven't used unittest before, you can run all the tests in the directory at once via python -m unittest discover, or run the individual test files like normal Python scripts.

This also means all the testing files are going to show up in near-entirety in the changed files pane, for those who want line-by-line commenting

Things I'm explicitly still unsure about:

Other than that, happy for any thoughts anyone has as well.

HironaoMiyatake commented 10 years ago

The SingleBin-type objects in binning.py used to return Boolean masks and now just return the already-indexed original array. (That is, you used to have to data = data[bin(data)] and now you just data = bin(data). I can see use cases for both, but I think the second thing is simpler enough (and the gains you might get from and-ing the masks rather than masking sequentially small enough) to leave it this way.

I agree that the second thing is simpler enough. I found that SingleFunctionBin keeps both features. If we decide to use only the second one, do we want to make it the same as SingleBin?

msimet commented 10 years ago

I found that SingleFunctionBin keeps both features. If we decide to use only the second one, do we want to make it the same as SingleBin?

SingleFunctionBin doesn't keep both features: it's always binned_data = SingleFunctionBin(data). The difference is the function that's passed as an argument to SingleFunctionBin: it can return either the mask of bools or an array of bin numbers, but the SingleFunctionBin itself always just returns the binned data.

HironaoMiyatake commented 10 years ago

I see. I misunderstood!

msimet commented 10 years ago

And a conversation @HironaoMiyatake had earlier today: at the moment, Stile uses formatted arrays that are not formally numpy.recarrays. I don't have a particular principled reason for this--it's just that I can make formatted numpy arrays, but always run into trouble trying to make recarrays. If anyone proposes a switch, I'm happy to either let you do it or figure out why myself.

For those who haven't used one or both of those things before, as far as I can tell, the main difference is that numpy.recarrays let you have multiple aliases for the same column, while formatted numpy arrays only allow one. There are some pieces of the code right now that assume there's a 1:1 correspondence between field name and field order, so we'd have to make sure that didn't break if we allow recarrays in.

msimet commented 10 years ago

That should read, "figure out how"

HironaoMiyatake commented 10 years ago

I do not have a strong opinion here. I thought it might confuse those who are familiar with numpy.recarray, but FormatArray is close enough, so there would be no problem.

rmandelb commented 10 years ago

Hi Melanie - All tests pass for me on coma (no surprise; I would guess you've checked there). I am having some issues on my Mac:

ERROR: test_MakeCorr2FileKwargs (__main__.TestCorr2Utils)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_corr2_utils.py", line 266, in test_MakeCorr2FileKwargs
    result = stile.MakeCorr2FileKwargs(data)
  File "../stile/corr2_utils.py", line 813, in MakeCorr2FileKwargs
    new_data_list.append(OSFile(data_list,fields=fields))
  File "../stile/corr2_utils.py", line 586, in __init__
    file_io.WriteTable(self.file_name,self.data,fields=self.fields)
  File "../stile/file_io.py", line 228, in WriteTable
    WriteFITSTable(file_name,data_array,fields)
  File "../stile/file_io.py", line 194, in WriteFITSTable
    table.data = numpy.array(table.data).view(table._data_type)
AttributeError: 'BinTableHDU' object has no attribute '_data_type'

======================================================================
ERROR: test_OSFile (__main__.TestCorr2Utils)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_corr2_utils.py", line 227, in test_OSFile
    OSFile2 = stile.corr2_utils.OSFile(arr2)
  File "../stile/corr2_utils.py", line 586, in __init__
    file_io.WriteTable(self.file_name,self.data,fields=self.fields)
  File "../stile/file_io.py", line 228, in WriteTable
    WriteFITSTable(file_name,data_array,fields)
  File "../stile/file_io.py", line 194, in WriteFITSTable
    table.data = numpy.array(table.data).view(table._data_type)
AttributeError: 'BinTableHDU' object has no attribute '_data_type'
test_getCorrelationFunctionSysTest (test_correlation_functions.TestCorrelationFunctions) ... Error: Required parameter ra_units not found

But then the test passes, so I'm not sure what gives.

msimet commented 10 years ago

The second one's not an error--or rather it's an expected error: we send some wrong stuff to corr2 and make sure it fails (to check it's not caching earlier results basically); the error message you see is what corr2 prints before it exits, and the exit is caught by the testing script which expects it. I can try to redirect the output so you don't see that. (There's a note if you run test_corr2_utils.py directly, but not if you do the unittest discover thing.)

As for the _table_view error, that's related to pyfits. I'll have to think about that a little more--that line of code is a workaround required for table writing to work on the version of pyfits I was using on Coma. Can I ask which version of pyfits you've got on your Mac?

rmandelb commented 10 years ago

Pyfits v3.0.3.

If you want to just come over to my office sometime and we can try to sort it out directly instead of you sending suggestions that I have to try out and reply to, that's fine. :)

rmandelb commented 10 years ago

BTW, I noticed that the README is lacking info like required dependencies. I think it would be good to include those as part of this PR if you don't mind.

msimet commented 10 years ago

No, that sounds good--the list is pretty short, after all. :) I can make that edit too in a bit--it's just numpy, right, with the recommended dependencies of corr2 greater than whatever, matplotlib, pyfits/astropy?

rmandelb commented 10 years ago

Re: the dependency list - we should be clear just how much of the functionality depends on those optional dependencies. Actually is pyfits/astropy recommended or required? I don’t see how we can do very much without it.

msimet commented 10 years ago

Okay, I think I've addressed everything we talked about so far except the README, which I'm still putting together...

msimet commented 10 years ago

And I just pushed a quick first draft of a README too.

rmandelb commented 10 years ago

Looks good!

rmandelb commented 10 years ago

Hi Melanie - I just put some comments on bits of the code that you had mentioned would be good to check out. Do we need another person to go over the whole thing, or do you think we're okay with Hironao's and my comments?

msimet commented 10 years ago

Hmm. I guess I'd feel more comfortable if someone else went over it, but I'm not sure it's strictly necessary at this point.

I'll leave this open for now and plan to merge Thursday evening, unless somebody lets me know they're in the middle of a code review.

msimet commented 10 years ago

Oh--and I addressed all of your comments, I think.

rmandelb commented 10 years ago

The latest set of commits look good to me.

Re: another person to do code review, I think that if you would like one then your best bet is to choose a victim and e-mail them directly to ask if they are able to do so on a reasonable timescale.