Long runtime and memory errors for mrs_imatch

stscijgbot-jp commented 2 years ago

Issue JP-2607 was created on JIRA by David Law:

As reported in https://jira.stsci.edu/browse/DMSOPS-492, the mrs_imatch pipeline step can have problems when passed too many input files. When attempting to process 17 observations (17x4 = 68 exposures) from JW1024 it ran for 60 hours and eventually ran out of memory.

I've been testing this offline by attempting to combine progressively larger quantities of data. While cube_build can handle all 17 observations and produces a composite data cube in a reasonable amount of time, the mrs_imatch step seems to have a runtime that grows roughly quadratically with the number of exposures. The attached plot shows runtimes measured up to 7 observations (28 exposures), with a model extrapolation to the 1000-minute runtime expected for 68 exposures.

Presumably the memory demands similarly grow rapidly, which is why the process eventually died rather than simply taking a long time.

Filing this ticket to log this as a known issue, and to see if it can be worked on at some point. Priority is given as mir_med as it does potentially have some impact on programs that execute large mosaics, but priority during commissioning is low.

stscijgbot-jp commented 2 years ago

Comment by Mihai Cara on JIRA:

That is correct: the algorithm that guarantees optimal matching of the background across all images is quadratic in nature and requires computing N * (N - 1) / 2 differences between pairs of images (or cubes). The memory should not grow quadratically but should be something like 5-6x the memory used for inputs, I believe (I have not looked at this code for 5 years; if I am wrong, I will update here). Also, unlike the algorithm used in 'skymatch', which also does optimal matching, 'mrs_imatch' performs clipping iterations, which increases run time as well.

How large was the output grid (size of resampled cube) for those 17x4=68 input cubes?

stscijgbot-jp commented 2 years ago

Comment by Mihai Cara on JIRA:

The current default value of 'nclip' is 5. So, by default there are 5 N (N - 1) / 2 computations of weighted means (or medians) of image differences. Depending on the image size and polynomial order, this could be quite a lot. For example, for 68 input cubes, it would need to compute 11,390 differences between cubes plus other computations (such as solving the system, evaluating the model, ...). Disabling clipping should result in a 5x speedup.
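
To make the arithmetic above easy to check, here is a minimal sketch that counts pairwise differences for a given number of input cubes. The function names are hypothetical (they are not part of the jwst package), and the sketch assumes one full pass over all pairs per clipping iteration, which is how the 5 N (N - 1) / 2 figure above reads.

```python
def n_pairs(n_inputs: int) -> int:
    """Number of unique pairs among n_inputs cubes: N * (N - 1) / 2."""
    return n_inputs * (n_inputs - 1) // 2


def n_difference_evaluations(n_inputs: int, nclip: int = 5) -> int:
    """Pairwise differences evaluated in total.

    Assumption: every clipping iteration revisits every pair once.
    """
    return nclip * n_pairs(n_inputs)


# 28 exposures is the largest case timed so far; 68 is the failing case.
for n in (28, 68):
    print(n, n_pairs(n), n_difference_evaluations(n))
```

With 68 inputs there are 2,278 unique pairs, so 5 clipping iterations give the 11,390 differences quoted above; with 28 inputs the same count is 1,890.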

Q: What are the ideas on improving the algorithm in order to speed it up?

Memory usage should be linear in N, something like a(poly_order)*N + b, so it should depend on the polynomial order too. In a manner similar to #6874, it could be possible to minimize the number of numpy arrays loaded into memory at any given time. Maybe this could improve speed if the slowdown is caused by frequent memory swapping.
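
As a rough illustration of the "fewer arrays in memory at once" idea, the sketch below walks over all pairs while holding at most two inputs at a time. This is not how mrs_imatch is implemented; `pairwise_stat_lazy`, `load`, and `stat` are hypothetical names, and plain Python lists stand in for the cubes.

```python
def pairwise_stat_lazy(paths, load, stat):
    """Compute stat(a, b) for every unique pair of inputs.

    Hypothetical helper: `load` turns a path into data and `stat`
    reduces two loaded inputs to a number. At most two inputs are
    held in memory at any time.
    """
    results = {}
    for i, path_i in enumerate(paths[:-1]):
        cube_i = load(path_i)          # kept for the whole inner loop
        for path_j in paths[i + 1:]:
            cube_j = load(path_j)      # loaded on demand
            results[(path_i, path_j)] = stat(cube_i, cube_j)
            del cube_j                 # drop the reference before the next load
    return results


def mean_diff(x, y):
    """Mean of the element-wise difference x - y."""
    return sum(p - q for p, q in zip(x, y)) / len(x)


# Stand-in for files on disk; real inputs would be cubes read from storage.
fake_disk = {"a": [1.0, 2.0], "b": [2.0, 4.0], "c": [0.0, 0.0]}
diffs = pairwise_stat_lazy(list(fake_disk), fake_disk.__getitem__, mean_diff)
```

The trade-off is repeated I/O: each input is reloaded for most of the pairs it takes part in, so this only helps if the slowdown really is dominated by swapping rather than by reading data.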

stscijgbot-jp commented 2 years ago

Comment by Mihai Cara on JIRA:

Howard Bushouse and David Law I do not have access to /ifs/... Could you please put the data that cause the memory crash in a location that I can access?

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

David Law It's been a while since any of us have looked at this issue. I notice that the param ref files in CRDS have the mrs_imatch step set to be skipped during normal DMS processing. So I'm wondering whether it's worth the time and effort to continue to look at this. How much, if any, use does this step get by folks doing off-line reprocessing of MRS data?

stscijgbot-jp commented 1 year ago

Comment by David Law on JIRA:

Howard Bushouse I'd say that this ticket isn't worth putting effort into right now; too much has changed since (so it may no longer be relevant) and we aren't seeing any need to use the step in any science cases that I'm aware of (the IFU field is just too tiny). I don't want to withdraw it entirely as it may still prove necessary once we start digging further into improving our background subtraction, but for the time being I'll move it to a miri low priority.

stscijgbot-jp commented 1 year ago

Comment by Howard Bushouse on JIRA:

David Law Thanks for the info. We'll consider it as being in a state of "On Hold" until we hear otherwise.

stscijgbot-jp commented 10 months ago

Comment by David Law on JIRA:

Given that this ticket still does not appear to be relevant, I've set it to withdrawn for bookkeeping. We can always reopen it later if it proves necessary.

spacetelescope / jwst

Long runtime and memory errors for mrs_imatch #7082