msimet / Stile

Stile: the Systematics Tests In Lensing pipeline
BSD 3-Clause "New" or "Revised" License

#14 Config interface #58

Closed msimet closed 6 years ago

msimet commented 9 years ago

Hi folks, I'm sure what you wanted from a Monday night was a several-thousand-line pull request, and so I'm here to provide one! :)

Before I get into this PR, I wanted to mention that I updated master with 1) some whitespace/line-length fixes and 2) a large ASCII text file used in the tests for this PR. That way none of that shows up in the already-very-long diff here. :)

The best place to start is probably the file doc/Config.md, which describes how you specify input files and sys_tests. (The rendered version is here: https://github.com/msimet/Stile/blob/%2314/doc/Config.md -- the diff doesn't look as nice.) In brief, you can give a list of dicts, each describing one or more files, or you can use nested dicts (possibly to mimic a file structure tree); the same goes for the sys_tests. Every file needs a full description of the data it contains (spatial extent, epoch, data format--which I've been calling a "format", and if you have a better term please suggest it--and object type). The sys_tests don't need a full description--they'll be matched with whichever files match the format keys they specify.
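To make the two input styles concrete, here is a rough sketch written as Python literals. The key names (`epoch`, `extent`, `data_format`, `object_type`, `name`), the values, the file names, and the `matches` helper are all illustrative assumptions; doc/Config.md is the authority on the actual schema.

```python
# Illustrative only: key names and values are assumptions, not the
# schema defined in doc/Config.md.
FORMAT_KEYS = ("epoch", "extent", "data_format", "object_type")

# Style 1: a flat list of dicts, one full description per file.
files_as_list = [
    {"name": "s1.dat", "epoch": "single", "extent": "field",
     "data_format": "catalog", "object_type": "star"},
    {"name": "g1.dat", "epoch": "single", "extent": "field",
     "data_format": "catalog", "object_type": "galaxy"},
]

# Style 2: the same information as nested dicts, like a directory tree.
files_as_nested = {
    "single": {"field": {"catalog": {"star": "s1.dat", "galaxy": "g1.dat"}}}
}

# A sys_test may give only some of the keys (hypothetical test name).
sys_test = {"name": "example_test", "epoch": "single", "object_type": "galaxy"}


def matches(sys_test, file_desc):
    """True if every format key the test specifies agrees with the file."""
    return all(sys_test[k] == file_desc[k] for k in FORMAT_KEYS if k in sys_test)


matched = [f["name"] for f in files_as_list if matches(sys_test, f)]
```

Here the test leaves `extent` and `data_format` unspecified, so it is matched against any file that agrees on `epoch` and `object_type`.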

The input processing happens in stile/data_handler.py. Basically, the input list-of-dicts/nested-dict is parsed into a list of dicts, with every dict retaining knowledge of its parent dict levels if it was part of a nested dict. These lists of dicts are reformatted back into a single dict, not quite as nested as before: the keys are now strings of the form epoch-extent-data_format. Then they're grouped if requested (for stuff like star-galaxy cross-correlations, which need to know which files go together). The systematics tests are processed in the same way--into a list of dicts--and then we copy the structure of the files dict and populate it by matching with the sys_tests dicts. Then those dict descriptions are turned into SysTest objects (plus some other information used in calls to the object).
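The flatten-then-regroup idea can be sketched roughly as follows. This is not the code in stile/data_handler.py: the function names, the key names, and the fixed nesting order are assumptions made for illustration, and the grouping and SysTest construction steps are left out.

```python
# A simplified sketch, assuming a fixed nesting order; the real parser
# is more flexible. All names here are hypothetical.
LEVELS = ("epoch", "extent", "data_format", "object_type")


def flatten(nested, levels=LEVELS, parents=None):
    """Turn nested dicts into a list of dicts that remember their parent keys."""
    parents = parents or {}
    if not levels:
        # Leaf: a single file name or a list of file names.
        names = nested if isinstance(nested, list) else [nested]
        return [dict(parents, name=n) for n in names]
    out = []
    for key, child in nested.items():
        out.extend(flatten(child, levels[1:], {**parents, levels[0]: key}))
    return out


def regroup(flat):
    """Rebuild a shallower dict keyed by 'epoch-extent-data_format' strings."""
    grouped = {}
    for d in flat:
        fmt = "-".join((d["epoch"], d["extent"], d["data_format"]))
        grouped.setdefault(fmt, {}).setdefault(d["object_type"], []).append(d["name"])
    return grouped


nested = {"single": {"field": {"catalog": {"star": "s1.dat", "galaxy": "g1.dat"}}}}
flat = flatten(nested)
by_format = regroup(flat)
```

In this sketch `by_format` has a single key, `"single-field-catalog"`, holding the star and galaxy file lists side by side, which is the shape a later matching step could copy and fill with tests.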

The driver (in stile/drivers.py) then queries the data handler for what kinds of data it contains and builds up a list of all the files and a dict of which tests should be run on them. It then loops through the files and runs all the tests. It has two modes. With save_memory=True, it just processes the files linearly and doesn't keep data that was read in for one cross-correlation function unless it's needed for another. With save_memory=False, it starts processing the files linearly, but then takes side jaunts to run tests on other files it's had to read in, keeping all the other files it's read in until it can run their tests too.
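The memory trade-off between the two modes can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the driver in stile/drivers.py: `run_all`, the `tests` mapping (tuples of file names to lists of callables), and the `load` callback are hypothetical, and the sketch models only what stays in memory, not the reordering of tests that the save_memory=False mode performs.

```python
def run_all(files, tests, load, save_memory=True):
    """Run every test whose first file is the current file, in file order.

    tests maps a tuple of file names to a list of callables; a tuple with
    more than one name stands in for a cross-correlation between files.
    """
    results = []
    kept = {}  # data retained between files (only when save_memory is False)
    for name in files:
        local = dict(kept)  # data available while handling this file
        for group, funcs in tests.items():
            if group[0] != name:
                continue
            for n in group:
                if n not in local:
                    local[n] = load(n)
            results.extend(f(*(local[n] for n in group)) for f in funcs)
        if not save_memory:
            kept = local  # keep everything read so far for later files
    return results


# Toy data and a loader that counts how often each file is read.
DATA = {"a": [1, 2, 3], "b": [4, 5]}
reads = []


def load(name):
    reads.append(name)
    return DATA[name]


tests = {
    ("a",): [len],
    ("a", "b"): [lambda x, y: len(x) + len(y)],
    ("b",): [len],
}

low_memory = run_all(["a", "b"], tests, load, save_memory=True)
reads_low_memory = list(reads)
reads.clear()
cached = run_all(["a", "b"], tests, load, save_memory=False)
reads_cached = list(reads)
```

Both calls produce the same results; the difference is that the save_memory=True run reads file `b` a second time, while the save_memory=False run holds it in memory after the cross-correlation and reuses it.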

Finally, there's a new executable bin/StileConfig.py which you just call with a config file or files on the command line to get Stile to run the tests.

Other things added or changed in this PR:

Whew!

msimet commented 6 years ago

This is really not the right way to do this task, I think. I'm going to close this PR and try a simpler version later.