spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

Implement a ModelLibrary subclass for jwst #8649

Closed stscijgbot-jp closed 1 week ago

stscijgbot-jp commented 2 months ago

Issue JP-3690 was created on JIRA by Brett Graham:

Once the ModelLibrary container class is available in stpipe, implement a subclass (and tests) for jwst. The current target is to update the steps in calwebb_image3 to use the new container class.

The main goal of the new container class is to provide a memory-efficient mode for processing large associations. This mode might not always be preferred: small associations are likely processed more efficiently by loading the entire association into memory. The container provides an "on_disk" setting to control whether models are kept in memory or "on disk", and it may make sense to expose this setting in the pipeline (and likely in all steps that support the library).

To achieve the above goal, steps must not load all models from the library at once, which might involve updating some step code. Once the scope of these step updates is determined, additional tickets might be opened or the work included in this ticket.
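
As a rough illustration of the idea (not the stpipe or jwst implementation), the sketch below uses a hypothetical ToyLibrary class whose on_disk flag decides whether models stay in a list or are pickled to a temporary directory. The class name, the borrow/shelve method names, and the dict-based "models" are all assumptions made for this example. The point is only that a step written to touch one model at a time behaves the same in both modes.

```python
import pickle
import tempfile
from pathlib import Path


class ToyLibrary:
    """Hypothetical stand-in for a model container; NOT the stpipe API."""

    def __init__(self, models, on_disk=False):
        models = list(models)
        self._on_disk = on_disk
        self._size = len(models)
        if on_disk:
            # Write each model to its own temporary file and drop the reference.
            self._tmp = tempfile.TemporaryDirectory()
            self._paths = []
            for i, model in enumerate(models):
                path = Path(self._tmp.name) / f"model_{i}.pkl"
                path.write_bytes(pickle.dumps(model))
                self._paths.append(path)
        else:
            # Keep every model in memory.
            self._models = models

    def __len__(self):
        return self._size

    def borrow(self, index):
        """Return one model, reading it from disk if needed."""
        if self._on_disk:
            return pickle.loads(self._paths[index].read_bytes())
        return self._models[index]

    def shelve(self, model, index):
        """Hand a (possibly modified) model back to the library."""
        if self._on_disk:
            self._paths[index].write_bytes(pickle.dumps(model))
        else:
            self._models[index] = model


def scale_all(library, factor):
    """A 'step' that only ever holds one model at a time."""
    for i in range(len(library)):
        model = library.borrow(i)
        model["data"] = [value * factor for value in model["data"]]
        library.shelve(model, i)


def total(library):
    """Sum all values, again touching one model at a time."""
    result = 0
    for i in range(len(library)):
        result += sum(library.borrow(i)["data"])
    return result
```

With this shape, the step code does not need to know which mode is active; only the container does. A step that instead collected every model into a list before processing would defeat the on-disk mode, which is why some step code might need updating.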

stscijgbot-jp commented 3 weeks ago

Comment by Brett Graham on JIRA:

I added 2 attachments:

These are from a run of a 972-member association (~100GB of input data) through calwebb_image3 using #8683, at a slightly older commit (26e5436) and with an "on disk" library as input. The pipeline succeeded, and the attachments show the memory usage recorded with memray. Peak memory usage was 50GB, largely due to the context array generated during resample. Importantly for this PR, at no point does the pipeline load all input data into memory.

The association and data were shared with us for https://jira.stsci.edu/browse/JP-3498, and this result indicates that that ticket can also be closed when the linked PR is merged.