#14 Config interface #58

Hi folks, I'm sure what you wanted from a Monday night was a several-thousand-line pull request, and so I'm here to provide one! :)

Before I get into this PR, I wanted to mention that I updated master with 1) some whitespace/line-length fixes and 2) a large ASCII text file used in the tests for this PR. That way none of that shows up in the already-very-long diff here. :)

The best place to start is probably the file doc/Config.md, which describes how to specify input files and sys_tests. (The file with formatting is here: https://github.com/msimet/Stile/blob/%2314/doc/Config.md -- the diff doesn't look as nice.) In brief, you can do lists of dicts that describe what all the files are, or you can do nested dicts (possibly to mimic a file structure tree); same thing for the sys_tests. The files have to have a full description of what data they contain (spatial extent, epoch, data format--which I've been calling a "format", and if you have a better term please suggest it--and object type). The sys_tests don't--each one is matched to whichever files agree with the format keys it specifies.
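To make the two input styles concrete, here is a rough sketch written as plain Python dicts (the same shapes a YAML config would load into). Every key name, value, and file name below is an assumption for illustration, as is the `matches` helper; doc/Config.md is the authority on the real schema.

```python
# Hypothetical illustration only: the key names, values, and file names here
# are assumptions, not the schema documented in doc/Config.md.

# Style 1: a flat list of dicts, each one fully describing a file.
files_as_list = [
    {"name": "stars.dat", "epoch": "single", "extent": "field",
     "data_format": "catalog", "object_type": "star"},
    {"name": "galaxies.dat", "epoch": "single", "extent": "field",
     "data_format": "catalog", "object_type": "galaxy"},
]

# Style 2: the same information as nested dicts, mimicking a directory tree.
files_as_tree = {
    "single": {
        "field": {
            "catalog": {
                "star": "stars.dat",
                "galaxy": "galaxies.dat",
            }
        }
    }
}

FORMAT_KEYS = ("epoch", "extent", "data_format", "object_type")


def matches(test, file_desc, keys=FORMAT_KEYS):
    """True if the file agrees with every format key the test specifies."""
    return all(test[k] == file_desc[k] for k in keys if k in test)


# A sys_test gives only a partial description and matches any agreeing file.
sys_test = {"name": "example_test", "epoch": "single", "data_format": "catalog"}
matched = [f["name"] for f in files_as_list if matches(sys_test, f)]
```

Here the test names no extent or object type, so it picks up both files.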

The input processing happens in stile/data_handler.py. Basically, the input list-of-dicts/nested-dict is parsed into a list of dicts, with every dict retaining knowledge of its parent dict levels if it was part of a nested dict. These lists of dicts are reformatted back into a single dict, not quite as nested as before: the keys are now strings of the form epoch-extent-data_format. Then they're grouped if requested (for stuff like star-galaxy cross-correlations, which need to know which files go together). The systematics tests are processed in the same way--into a list of dicts--and then we copy the structure of the files dict and populate it by matching with the sys_tests dicts. Then those dict descriptions are turned into SysTest objects (plus some other information used in calls to the object).
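The flatten-then-regroup step can be sketched roughly as follows. This is a minimal illustration, not the code in stile/data_handler.py: the nesting order in `LEVELS`, the function names `flatten` and `regroup`, and the example file names are all assumptions.

```python
from collections import defaultdict

# Assumed nesting order for the input tree (hypothetical).
LEVELS = ("epoch", "extent", "data_format", "object_type")


def flatten(tree, levels=LEVELS, inherited=None):
    """Turn a nested dict into a list of dicts that remember their parent keys."""
    inherited = dict(inherited or {})
    if not levels:
        # Leaf: a single file name or a list of file names.
        names = tree if isinstance(tree, list) else [tree]
        return [dict(inherited, name=n) for n in names]
    flat = []
    for key, subtree in tree.items():
        here = dict(inherited)
        here[levels[0]] = key
        flat.extend(flatten(subtree, levels[1:], here))
    return flat


def regroup(flat):
    """Rebuild a shallower dict keyed by 'epoch-extent-data_format' strings."""
    grouped = defaultdict(lambda: defaultdict(list))
    for d in flat:
        fmt = "-".join(d[k] for k in ("epoch", "extent", "data_format"))
        grouped[fmt][d["object_type"]].append(d["name"])
    return {fmt: dict(by_type) for fmt, by_type in grouped.items()}


tree = {"single": {"field": {"catalog": {"star": "s.dat",
                                         "galaxy": ["g1.dat", "g2.dat"]}}}}
flat = flatten(tree)
by_format = regroup(flat)
```

After this, `by_format` has the single key `"single-field-catalog"`, with the star and galaxy files listed under it by object type.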

The driver (in stile/drivers.py) then queries the data handler for what kinds of data it contains and builds up a list of all the files and a dict of which tests should be run on them. It then loops through the files and runs all the tests. It has two modes. With save_memory=True, it will just do all the files linearly and not save data that's read in for one cross-correlation function if it's not needed for another. With save_memory=False, it will start doing the files linearly, but then take side jaunts to run tests on other files it's had to read in, saving all the other files it's read in until it can do those too.
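The memory trade-off between the two modes can be illustrated with a deliberately simplified sketch. It is not the ConfigDriver logic: `run_all`, its arguments, and the `(test_name, partner)` tuples are hypothetical, there is no reordering or "side jaunt" behavior, and the save_memory=True branch here never keeps partner data at all, which is cruder than what is described above.

```python
def run_all(files, tests, read, save_memory=True):
    """Loop over files and run their tests (hypothetical, simplified).

    files: list of file names
    tests: dict mapping file name -> list of (test_name, partner_file_or_None)
    read:  callable that loads a file by name
    """
    cache = {}
    results = []

    def load(name):
        if name in cache:
            return cache[name]
        data = read(name)
        if not save_memory:
            # Keep everything we have read so later tests can reuse it.
            cache[name] = data
        return data

    for name in files:
        load(name)
        for test_name, partner in tests.get(name, []):
            if partner is not None:
                # A cross-correlation needs a second file.
                load(partner)
            # A real driver would call the SysTest object here.
            results.append((test_name, name, partner))
    return results


reads = []


def fake_read(name):
    """Stand-in reader that records every disk read."""
    reads.append(name)
    return "data for " + name


files = ["a", "b"]
tests = {"a": [("auto", None), ("cross", "b")], "b": [("auto", None)]}

low_memory = run_all(files, tests, fake_read, save_memory=True)
reads_low_memory = list(reads)

del reads[:]
cached = run_all(files, tests, fake_read, save_memory=False)
reads_cached = list(reads)
```

Both calls produce the same results, but the low-memory run reads file `b` twice while the cached run reads it once and holds it in memory.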

Finally, there's a new executable bin/StileConfig.py which you just call with a config file or files on the command line to get Stile to run the tests.

Other things added or changed in this PR:

- There's now a file examples/example_run.yaml that does the same tests as example_run.py, albeit with longer and more annoying output filenames.
- Some of the objects in binning.py now have __eq__ defined to make unit testing easier. Once this PR is merged I'll go rewrite the binning tests to use this functionality too.
- The getOutputPath() method of the base DataHandler, which already existed, has been significantly updated to be actually useful.
- I discovered that the ancient version of numpy on my personal laptop doesn't like printing headers with savetxt, so I've updated some things in file_io.py with a workaround, plus a quick ReadImage dummy function to parallel ReadTable.
- Added variables with the default extents, epochs, object_types, data_formats, and fields to stile_utils.py, plus a couple of utilities for parsing inputs.
- Added a read of the corr2_aliases dict from the TreeCorr stuff into treecorr_utils.py so we can check those keys too.
- Some extremely long and obnoxious unit testing for the ConfigDataHandler and ConfigDriver.
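
As an illustration of why __eq__ on the binning objects helps with unit testing, here is a minimal sketch. `SimpleBin` and its attributes are hypothetical stand-ins, not classes from binning.py.

```python
class SimpleBin(object):
    """Hypothetical stand-in for a binning object (not from binning.py)."""

    def __init__(self, field, low, high):
        self.field = field
        self.low = low
        self.high = high

    def __eq__(self, other):
        # Two bins are equal if they cut the same field at the same edges.
        if not isinstance(other, SimpleBin):
            return NotImplemented
        return ((self.field, self.low, self.high) ==
                (other.field, other.low, other.high))

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


expected = [SimpleBin("ra", 0.0, 1.0), SimpleBin("ra", 1.0, 2.0)]
built = [SimpleBin("ra", float(i), float(i + 1)) for i in range(2)]
same = (built == expected)
```

With __eq__ defined, a test can compare whole lists of bins in one assertion instead of checking each attribute by hand.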

Whew!
