spacetelescope / hstcal

Calibration for HST/WFC3, HST/ACS, and HST/STIS
BSD 3-Clause "New" or "Revised" License

Allow batch processing of HSTCAL #179

Open jamienoss opened 7 years ago

jamienoss commented 7 years ago

It would be great if it were possible to process multiple raw files from a single call, for example, to calacs.

Global variables and memory leaks may be a roadblock.

Subtasks:

pllim commented 7 years ago

Technically, you already can, but you need to provide the ASN file. And the files will be processed serially, not in parallel.

jamienoss commented 7 years ago

That is not quite what I had in mind. What about CR rejection? Wouldn't the ASN method cause the DQ array to differ from the one produced when each file is run individually?

pllim commented 7 years ago

That depends on how the ASN is defined. I don't think it runs ACSREJ if not requested. The ACS Data Handbook (particularly Chapter 3) should tell you for sure.

jamienoss commented 7 years ago

Ok, cool, thanks, I'll take a look. If it's easy to construct an ASN file/table from a list of file names, perhaps the only thing to do would be to have the child files of an ASN processed in parallel. Though it would be nice to just be able to do something like calacs *raw.fits. Maybe add -b for batch processing, to make the intent explicit, and perhaps even -r to recurse, though the latter may have hurdles.
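To make the proposal above concrete, here is a minimal Python sketch of how a front end might parse the suggested -b and -r flags and expand wildcard patterns. The flags exist only as a proposal in this thread, and `parse_args` and `expand_inputs` are hypothetical helper names, so treat this as an illustration rather than anything calacs currently does.

```python
import argparse
import glob
import os


def parse_args(argv=None):
    """Parse the hypothetical -b/-r flags proposed in this thread."""
    parser = argparse.ArgumentParser(description="Hypothetical batch front end")
    parser.add_argument("patterns", nargs="+", help="file names or wildcards")
    parser.add_argument("-b", "--batch", action="store_true",
                        help="process every matching file")
    parser.add_argument("-r", "--recurse", action="store_true",
                        help="also search subdirectories")
    return parser.parse_args(argv)


def expand_inputs(patterns, recurse=False):
    """Expand wildcard patterns into a sorted, de-duplicated list of files."""
    found = set()
    for pattern in patterns:
        if recurse:
            # Insert "**" before the file part so that, with recursive=True,
            # glob also descends into subdirectories of the given directory.
            directory, name = os.path.split(pattern)
            pattern = os.path.join(directory, "**", name)
        found.update(glob.glob(pattern, recursive=recurse))
    return sorted(found)
```

Doing the expansion in the wrapper rather than relying on the shell keeps the behaviour the same when the pattern is quoted or passed in from another program.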

pllim commented 7 years ago

It is not that hard to do this in Python using multiprocessing and subprocess modules. I would recommend implementing this as a Python wrapper under ACSTOOLS rather than going pure C. It will be much easier to maintain.
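As a rough illustration of the wrapper idea, the sketch below runs one external process per raw file through a small worker pool. It is not part of ACSTOOLS. The executable name `calacs.e`, the placement of `--nthreads` before the file name, and the helper names `build_command`, `run_one` and `run_batch` are all assumptions made for the example. A thread pool from `multiprocessing.pool` is used because each job is already its own process via `subprocess`.

```python
import subprocess
from functools import partial
from multiprocessing.pool import ThreadPool


def build_command(raw_file, executable="calacs.e", nthreads=1):
    """Return the argument list for one calibration job.

    The executable name and option order are assumptions; adjust them to
    match the installed HSTCAL build.
    """
    return [executable, "--nthreads", str(nthreads), raw_file]


def run_one(raw_file, executable="calacs.e", nthreads=1):
    """Run a single job and return (file name, exit code)."""
    cmd = build_command(raw_file, executable, nthreads)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return raw_file, result.returncode


def run_batch(raw_files, jobs=2, executable="calacs.e", nthreads=1):
    """Run up to `jobs` calibration processes at a time, in input order."""
    worker = partial(run_one, executable=executable, nthreads=nthreads)
    with ThreadPool(processes=jobs) as pool:
        return pool.map(worker, raw_files)
```

Because every file is handled by a fresh process, leftover global state or memory leaks in the C code cannot carry over from one file to the next, which sidesteps the concern raised at the top of this issue.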

pllim commented 7 years ago

Although... Does multiprocessing play well with the OpenMP stuff in CTE? It seems overly complicated to multiprocess something that already generates multiple threads (albeit for only one of the steps).

jamienoss commented 7 years ago

Yeah, that is an issue, but one that could at least be worked around the same way operations does: by using only a single thread (-1). For calacs you can also use --nthreads <n>. I should add this to calwf3 independently of the new implementation, in case it is not taken.

There are alternative methods too, which would let you specify both the number of jobs (instances of calxxx) to spawn and the number of child (nested) threads that each job can itself spawn. That may be something for later.
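One simple way to think about the two levels is as a fixed core budget shared between jobs and their nested threads. The sketch below is only an illustration of that arithmetic. The function name `plan_threads` and its parameters are hypothetical, and it assumes the goal is to keep jobs × threads per job at or below the number of available cores.

```python
import os


def plan_threads(n_files, total_cores=None, threads_per_job=1):
    """Return (jobs, threads_per_job) with jobs * threads_per_job <= total_cores."""
    if total_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        total_cores = os.cpu_count() or 1
    # A single job can never use more threads than there are cores.
    threads_per_job = max(1, min(threads_per_job, total_cores))
    # Never start more jobs than there are files, and always at least one.
    jobs = max(1, min(n_files, total_cores // threads_per_job))
    return jobs, threads_per_job
```

For example, with 8 cores and 2 threads per job, 10 files would be processed 4 at a time. Setting `threads_per_job=1` corresponds to the single-thread workaround mentioned above.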

jamienoss commented 7 years ago

cf. #181

Whilst it is already possible (assuming this functionality works) to give a comma-delimited list of input files to 'batch' process, I doubt that this is used, perhaps at all. I think, though, that wildcard ('*') filename expansion would be. My greatest concern with this feature is iterating the code at the root level: is this safe?! I feel that a fair bit of attention would be required to fulfil this feature request, e.g. fixing all memory leaks, cleaning up/resetting all global variables, etc.