spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
558 stars 164 forks source link

Running skymatch with `minimize_memory` opens 2x files for each input association member #8479

Closed stscijgbot-jp closed 1 week ago

stscijgbot-jp commented 4 months ago

Issue JP-3620 was created on JIRA by Brett Graham:

When skymatch is run with minimize_memory it will write out and keep open 2x temporary files for every member of the input association. For large associations (hundreds of members) this can lead to failed runs where the step exceeds the allowed number of open files (using the default value on some systems).

See relevant code here: https://github.com/spacetelescope/jwst/blob/8b254ae2b69fed42c413f4ff7163ff2f983dfa59/jwst/skymatch/skyimage.py#L77

stscijgbot-jp commented 1 month ago

Comment by Ned Molter on JIRA:

This should be fixed as part of implementing ModelLibrary (see JP-3690).  I will post memory profiling for resample here as well as on that ticket as I continue to develop and test the changes.

stscijgbot-jp commented 1 month ago

Comment by Ned Molter on JIRA:

Here is a small summary of results from profiling memory for this step on its own in my PR branch, using as input an association containing a 46-cal-file subset of the data in 

███████████████████████████████████████████ 

The total size of the input _cal files on disk is roughly 5 GiB.

Setting in_memory=True, the peak memory usage is 5.5 GiB, and I'm attaching a graph of usage over time to the ticket (skymatch_in_memory.png).

Setting in=memory=False, the peak memory usage is 1.5 GiB, and a graph is again attached (skymatch_on_disk.png).

So it appears that the switchover to ModelLibrary comes with substantial memory savings in this step, at least for my test dataset.

This memory profiling doesn't directly address the issue in the ticket, but the changes are related:  A new input parameter "in_memory" (so named for consistency with resample and outlier_detection) is added, and the hidden (i.e., not in the spec) keyword argument "minimize_memory" is removed. The reduce_memory_usage parameter to SkyImage is then hardcoded to False when instantiating SkyImage objects within the SkyMatchStep; this is consistent with what was done for Roman.

This change resolves the ticket, since no attempt is made to write these temporary files at all, although it might be worth considering whether the reduce_memory_usage option and associated "if" statements in the SkyImage class should be removed entirely - Brett Graham was this discussed in the context of the Roman changes? 

stscijgbot-jp commented 1 month ago

Comment by Brett Graham on JIRA:

Thanks for the profiling and looking into the "minimize_memory" usage.

As you noted romancal disables this option (and doesn't provide a way to enable it). I think removing the if statements (etc) in SkyImage makes sense and likely would fit well into a PR that moves this class to stcal (so a separate ticket and PR).