Allow passing in list of files to coadd and add generic coaddition method

jotaylor commented 11 months ago

See https://jira.stsci.edu/browse/ULLYSES-1644 Right now, to coadd/abut data, you must pass in an input directory and all files of a given instrument/grating in that dir will be coadded/abutted. Being able to pass in a specific list of files to coadd/abut will vastly improve the HLSP creation scripts, since we sometimes only coadd certain files of a given target, and will also work much better with the new data storage format in ALL_DATA.

Additionally, a generic coaddition method was created.

Some significant changes were made:

all __init__'s in coadd & wrapper were updated to include an optional list of infiles. If infiles is not specified, files are determined in the old method
when infiles is specified at runtime for wrapper.py, the same instrument+grating sorting is performed, i.e. only identical modes will be coadded even if multiple modes are present in infiles
in wrapper.py, split the main function into 2 separate functions, one to determine a list of files (find_files) when passing in a directory, the other to do routine coadd and abut for that list of files (coadd_and_abut_files). This allows for creating HLSPs via the directory or infiles method.
A generic coadd wrapper was created in generic_coadd_wrapper.py. This takes as input a list of files and an output directory and will create a coadd for the specified files. This works with non-ULLYSES data and I have tested that it will coadd random COS data.
To differentiate the generic wrapper from the ULLYSES wrapper, wrapper.py was renamed to ullyses_coadd_abut_wrapper.py.
a few bookkeeping things were updated in coadd.py: the path var was changed to inpath to not conflict with the os.path. Switched to using super in the subclasses to SegmentList.

jotaylor commented 11 months ago

I have tested this version of the code against main and found 0 differences in resulting HLSPs. I also tested the generic wrapper on non-ULLYSES data and it worked perfectly.

stscirij commented 11 months ago

Hi @jotaylor I had the same issue for HASP, I'd put the hook in the coadd code to allow for just instantiating a SegmentList without trying to read the data by setting the instrument and grating parameters to None. I then wrote a separate member function for the hasp base class to import the data files. It was probably a mistake to originally combine the class instantiation with inputting the data, but it was very convenient in the early stages of ullyses when you just wanted to create and populate a segmentlist with a minimum of code

stscirij commented 11 months ago

Also will need to ensure that the changes to the coadd function doesn't impact hasp - I don't think it will but the fingers will be crossed until the test is done...

stscirij commented 11 months ago

Will also need to update the README.md to reflect the different modules for creating regular ullyses products and for creating products from non-ullyses specific data

stscirij commented 11 months ago

Lol we are playing tennis with the write function: was originally in coadd, then I took it out to put it into the wrapper, now you've put it back into coadd... Personally, I think it belongs in the wrapper (since it is writing ullyses-specific products) but as long as the hasp classes correctly override this it's probably not worth fighting over. Although it might be good coding practice to have the ullyses write function in the ullyses wrapper, and then you have the opportunity to write a generic simple write function in the generic wrapper...

jotaylor commented 11 months ago

Hi @jotaylor I had the same issue for HASP, I'd put the hook in the coadd code to allow for just instantiating a SegmentList without trying to read the data by setting the instrument and grating parameters to None. I then wrote a separate member function for the hasp base class to import the data files. It was probably a mistake to originally combine the class instantiation with inputting the data, but it was very convenient in the early stages of ullyses when you just wanted to create and populate a segmentlist with a minimum of code

The HASP implementation sounds interesting. Can you point me to the code for it? I wonder if it might make more sense to just put it directly in the SegmentList here? And yes, completely agree with the initial implementation- we didn't know back then all the wacky data we'd get.

jotaylor commented 11 months ago

Also will need to ensure that the changes to the coadd function doesn't impact hasp - I don't think it will but the fingers will be crossed until the test is done...

The API to coadd hasn't changed at all, there's just an additional optional argument to __init__ that allows directly passing in a list of files rather than a directory string. So I think it should be good? The more substantial updates were to wrapper.

jotaylor commented 11 months ago

Lol we are playing tennis with the write function: was originally in coadd, then I took it out to put it into the wrapper, now you've put it back into coadd... Personally, I think it belongs in the wrapper (since it is writing ullyses-specific products) but as long as the hasp classes correctly override this it's probably not worth fighting over. Although it might be good coding practice to have the ullyses write function in the ullyses wrapper, and then you have the opportunity to write a generic simple write function in the generic wrapper...

Indeed 🙈 ... It made sense to move the write method into wrapper because we had so much ULLYSES-specific code in it. The write method I've put in the coadd code here is supposed to be a "generic" method- there's nothing that depends on using ULLYSES input data. The write method in wrapper has been left untouched, and that still has all the ULLYSES-specific code. I'll admit that the reason I added this was somewhat personal- I am trying to use the ULLYSES coadd method for my own personal project and found that while I could coadd my data, I couldn't write the products out without using the full ULLYSES wrapper, and then things became tedious. My thinking was that having the generic write in the coadd script would allow users to just import coadd and go from there (and specific projects can make new write methods in their own subclasses, like ULLYSES does). But if that causes trouble and/or isn't best coding practice, happy to move the generic write to the generic wrapper. If I did that, would I have to make a new subclass (e.g. Generic_SegmentList) and put the write method there?

stscirij commented 11 months ago

Lol we are playing tennis with the write function: was originally in coadd, then I took it out to put it into the wrapper, now you've put it back into coadd... Personally, I think it belongs in the wrapper (since it is writing ullyses-specific products) but as long as the hasp classes correctly override this it's probably not worth fighting over. Although it might be good coding practice to have the ullyses write function in the ullyses wrapper, and then you have the opportunity to write a generic simple write function in the generic wrapper...

Indeed 🙈 ... It made sense to move the write method into wrapper because we had so much ULLYSES-specific code in it. The write method I've put in the coadd code here is supposed to be a "generic" method- there's nothing that depends on using ULLYSES input data. The write method in wrapper has been left untouched, and that still has all the ULLYSES-specific code. I'll admit that the reason I added this was somewhat personal- I am trying to use the ULLYSES coadd method for my own personal project and found that while I could coadd my data, I couldn't write the products out without using the full ULLYSES wrapper, and then things became tedious. My thinking was that having the generic write in the coadd script would allow users to just import coadd and go from there (and specific projects can make new write methods in their own subclasses, like ULLYSES does). But if that causes trouble and/or isn't best coding practice, happy to move the generic write to the generic wrapper. If I did that, would I have to make a new subclass (e.g. Generic_SegmentList) and put the write method there?

Ahh - I didn't notice that the write method you'd put in to coadd was generic, in that case, I think that's perfectly appropriate!

stscirij commented 11 months ago

Hi @jotaylor I had the same issue for HASP, I'd put the hook in the coadd code to allow for just instantiating a SegmentList without trying to read the data by setting the instrument and grating parameters to None. I then wrote a separate member function for the hasp base class to import the data files. It was probably a mistake to originally combine the class instantiation with inputting the data, but it was very convenient in the early stages of ullyses when you just wanted to create and populate a segmentlist with a minimum of code

The HASP implementation sounds interesting. Can you point me to the code for it? I wonder if it might make more sense to just put it directly in the SegmentList here? And yes, completely agree with the initial implementation- we didn't know back then all the wacky data we'd get.

The hasp repo should be public now: https://github.com/spacetelescope/hasp - let me know if you can't access it

jotaylor commented 11 months ago

Okay, so it sounds like the question left now is whether or not the supplying of a list of files should be done in SegmentList.__init__ like I've implemented here, or with some kind of standalone method as in HASP. To reiterate, the way I did it here is backwards-compatible and will not affect any subclasses of the original SegmentList. Kind of up to preference here, I think.

stscirij commented 11 months ago

Okay, so it sounds like the question left now is whether or not the supplying of a list of files should be done in SegmentList.__init__ like I've implemented here, or with some kind of standalone method as in HASP. To reiterate, the way I did it here is backwards-compatible and will not affect any subclasses of the original SegmentList. Kind of up to preference here, I think.

Yes, I think they have the same end result so it seems reasonable. I think the HASP and ullyses implementations are different too - the hasp code makes the product without checking whether the instruments and gratings match - it assumes that checking has been done in the creation of the list of filenames, so perhaps it's good to have 2 separate implementations.

jotaylor commented 10 months ago

sorry @stscirij can you re-approve? I had to resolve some merge conflicts from main. Also, tests are failing because it's looking for the new _aspec suffix for level3 files. I think this can be fixed if you upload new versions of the "truth" files on Box?

stscirij commented 10 months ago

@jotaylor before you merge, can you put back the call to main in the run_wrapper function in tests/test_wrapper.py? Thanks!

jotaylor commented 10 months ago

@jotaylor before you merge, can you put back the call to main in the run_wrapper function in tests/test_wrapper.py? Thanks!

good catch, done!

stscirij commented 10 months ago

@jotaylor before you merge, can you put back the call to main in the run_wrapper function in tests/test_wrapper.py? Thanks!

good catch, done!

Yay tests!

spacetelescope / ullyses

Allow passing in list of files to coadd and add generic coaddition method #65