spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

Memory usage with Detector1 pipeline #8454

Open stscijgbot-jp opened 2 months ago

stscijgbot-jp commented 2 months ago

Issue JP-3610 was created on JIRA by Maria Pena-Guerrero:

Several Help Desk tickets have either stated directly, or investigation has found, that the Detector1 pipeline uses too much memory.

Both the jwst and stdatamodels repositories will probably need to be modified. In the stdatamodels repository, the culprit seems to be the open function: when given a datamodel to open, it creates a copy or "clone", but the original is still referenced, so the previous object cannot be garbage collected. As a result, the number of copies and the memory used grow through the different steps of Detector1 in the jwst repo. For example, calling `model2 = RampModel(model)` (as is done at the beginning of Detector1, during charge_migration, etc.) creates a new model (`model2 is not model`) that nevertheless references the same data (`model2.data is model.data`). This links the primary memory load of both models and, probably more importantly, the underlying asdf objects are the same (`model2._asdf is model._asdf`). Because the `_asdf` tree references effectively everything in the model, the lifetimes of the two models are linked, which may contribute to the growing memory load.
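The linked-lifetime behavior described above can be illustrated with a minimal, self-contained sketch. `ToyModel` and `Tree` are hypothetical stand-ins written for this illustration. They are not the stdatamodels classes, and the sketch only assumes the sharing pattern described in this issue (a new wrapper object that reuses the original's tree).

```python
import gc
import weakref


class Tree(dict):
    """Hypothetical stand-in for the shared tree (a dict subclass so it can be weakly referenced)."""


class ToyModel:
    """Hypothetical stand-in for a datamodel; NOT the stdatamodels implementation."""

    def __init__(self, init=None):
        if isinstance(init, ToyModel):
            # "Clone": a new wrapper object that shares the original's tree.
            self._tree = init._tree
        else:
            # Allocate a zero-filled buffer of `init` bytes (empty if init is None).
            self._tree = Tree(data=bytearray(init or 0))

    @property
    def data(self):
        return self._tree["data"]


model = ToyModel(1_000_000)
model2 = ToyModel(model)

distinct_wrappers = model2 is not model      # two separate wrapper objects
shared_data = model2.data is model.data      # one underlying buffer
shared_tree = model2._tree is model._tree    # one underlying tree

# Track the tree without keeping it alive.
tree_ref = weakref.ref(model._tree)

del model
gc.collect()
# The tree survives because model2 still references it.
alive_after_deleting_original = tree_ref() is not None

del model2
gc.collect()
# Only once every wrapper is gone can the tree (and its buffer) be freed.
alive_after_deleting_both = tree_ref() is not None
```

In this toy version, deleting the original wrapper frees nothing: the buffer is released only after every model that shares the tree has been dropped. If the real models behave the same way, any step that keeps one of these wrappers alive also keeps the full data in memory.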

Here are some of the tickets in question (more may be added as they show up):

stscijgbot-jp commented 1 week ago

Comment by Maria Pena-Guerrero on JIRA:

While working to speed up emicorr for large files, I ended up writing a workaround that reduces the running time and memory usage of the Detector1 pipeline. I suggest we put it in place while we work on the permanent fix in the datamodels repo. Here is the jwst branch: #8588