spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

Improve NIRSpec spec2 runtime for BOTS/MOS #8559

Open stscijgbot-jp opened 3 weeks ago

stscijgbot-jp commented 3 weeks ago

Issue JP-3657 was created on JIRA by Christian Hayes:

The spec2 runtimes for NIRSpec BOTS and MOS can be relatively long because they have many separate integrations and objects to reduce, respectively.  Example runtimes below have been taken from two relatively typical BOTS and MOS observations.

PID 2512 obs 17 (BOTS):  one rateints segment file (out of eight) takes ~28 minutes to run through spec2.  Essentially all of that time is spent in extract_1d.

PID 2561 obs 2 (MOS): each rate file (in a group of 12) takes ~4 minutes to run in spec2.  The longest steps for this processing are extract_2d (89s) and resample_spec (58s).  These do process much faster than the long times for BOTS, but we wanted to include a note about MOS because the time adds up when there are multiple nods and observations.

We would be interested in exploring whether there is room for improvement in the runtime for these steps, or if it might be possible to implement parallel processing for the slower steps that loop through individual objects/integrations.  
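As an illustration of the parallel-processing idea, a per-integration loop like the one in extract_1d can be farmed out to a worker pool with concurrent.futures. This is only a sketch, not jwst pipeline code; `extract_one_integration` is a hypothetical stand-in for the real per-integration work:

```python
# Illustrative sketch (not jwst pipeline code) of farming a per-integration
# loop out to a pool of workers with concurrent.futures.
# `extract_one_integration` is a hypothetical stand-in for the real work.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def extract_one_integration(integration):
    """Placeholder for the per-integration extraction work."""
    return float(np.asarray(integration, dtype=float).sum())

def extract_all(integrations, max_workers=4):
    """The serial loop, replaced by a worker pool over integrations.

    Threads help when the heavy numpy work releases the GIL; for
    CPU-bound pure-Python loops, ProcessPoolExecutor (behind an
    `if __name__ == "__main__"` guard) is the usual choice instead.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so spectra line up with integrations
        return list(pool.map(extract_one_integration, integrations))

integrations = [np.full((8, 8), i, dtype=float) for i in range(10)]
spectra = extract_all(integrations)
```

Whether the worker granularity should be one integration or one object/slit is a tuning question; the pattern is the same either way.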

David Law, Timothy Brandt: could these steps also be impacted by the discussion about the use of deepcopy for NIRSpec in the assign_wcs module in https://jira.stsci.edu/browse/JP-3564?

stscijgbot-jp commented 3 weeks ago

Comment by Timothy Brandt on JIRA:

Christian Hayes, it sounds like you are not applying NSClean here?  It looks like NSClean in subarray mode for BOTS is very expensive because of how the linear algebra is set up.  Assuming you are not applying NSClean, I think the time is just from many, many calls to extract1d.  Within this, it looks to me like all of the time is going into _extract_src_flux within a big loop.  I can take a look at this; I suspect that this loop can be implemented a lot more efficiently.

stscijgbot-jp commented 3 weeks ago

Comment by Christian Hayes on JIRA:

Timothy Brandt that's correct, these numbers were for the baseline spec2 settings (where NSClean is off by default), and all but a few seconds is spent looping through extract1d for all of the BOTS integrations.

stscijgbot-jp commented 3 weeks ago

Comment by Timothy Brandt on JIRA:

Christian Hayes, the MOS routines may be sped up by a fix to the deepcopy problem.  I have a proposal for that that I will discuss with David Law and Nadia Dencheva next Friday.  I will look at _extract_src_flux in the meantime.  

stscijgbot-jp commented 3 weeks ago

Comment by Timothy Brandt on JIRA:

A question for Christian Hayes: suppose that extract_1d can be sped up by a factor of at least 10 (from a quick glance at the code, I suspect that it can with a modest amount of effort).  Do you have a sense of what volume of data this would represent, or how high a priority it should be?  There is also a background calculation within that same loop.  If backgrounds are typically fit in this mode, then this part needs to be optimized in addition to _extract_src_flux.

Also, regarding the MOS observations, I will share a fork of jwst in the coming days/week that proposes a fix for the deepcopy issue, removing 95% of the unneeded effort.  I would appreciate you trying out this fork on your MOS data to (a) confirm that it doesn't break anything and (b) tell me whether it improves performance, and if so, by how much.

stscijgbot-jp commented 3 weeks ago

Comment by Christian Hayes on JIRA:

Timothy Brandt thanks for taking a look.  For extract_1d, I would say this is a moderate priority.  In principle this would affect all of the NIRSpec data/modes (not just BOTS), but the runtime for this step is currently only a limiting factor for BOTS, which is probably about 1/3 of NIRSpec data (at least by observing time).  Backgrounds are not currently fit in this step (at least with the baseline pipeline processing), but that is one of the options we want to explore.

That sounds good about the deepcopy issue.  I'm happy to give that a try once it's ready.

stscijgbot-jp commented 3 weeks ago

Comment by Timothy Brandt on JIRA:

Christian Hayes, try this branch:

https://github.com/t-brandt/jwst/tree/nirspec_deepcopy_fix

There is a deepcopy in extract_2d that I didn't get.  I'm curious what speedups you see in various modes.  I don't expect much in BOTS, but I think it should help significantly in the MOS and IFU modes.  And of course, please tell me if anything breaks.

stscijgbot-jp commented 3 weeks ago

Comment by Timothy Brandt on JIRA:

David Law Christian Hayes would STScI consider an implementation with numba?  It could offer a very large (factor >10) improvement in this case if the extract source flux routine is restructured such that numba can interpret it.  

stscijgbot-jp commented 2 weeks ago

Comment by Timothy Brandt on JIRA:

David Law Christian Hayes Ok, an update.  An implementation of _extract_src_flux that is not sufficiently general for all cases, but does handle the case in an example BOTS notebook and is numba compatible, runs about 50 times faster than the current extract1d implementation once I add the @jit decorator.  I expect this factor of 50 to hold, more or less, in the general case, with the following caveats:

1. For the background, rather than models.Polynomial, for example, I would probably have to call a numpy routine (polyfit, linalg.solve, linalg.lstsq, etc.) to implement a polynomial fit.

2. The weights option is currently passed as a function.  I don't think this would play nicely with a numba-compatible implementation.  Options that would play nicely: (a) a weight array, computed once and used for all integrations, or (b) parameters for a Gaussian weight template rather than a general weight function.  Also, with weights, to do something like optimal extraction you would take sum(data*weights)/sum(weights**2), which is pretty different from what is in there now.

From my understanding, BOTS generally does not use a weighted extraction, so point (2) may not be such a big deal.  I may be able to achieve a similar speedup without numba if numba is a nonstarter.

Given the python/pipeline overheads I am currently seeing in each call to extract1d, I suspect that the speedup you would get from a numba implementation is a factor of ~3-4 in the data set you referenced, as the cost of _extract_src_flux becomes negligible.  That may be reason enough to pursue what I just mentioned.  Another possible reason: I would guess that background fitting would make things a lot more costly, and if everything is numba compatible, I would expect python/pipeline overheads to dominate and the additional cost of background fitting to be negligible.

If this is a priority for the mission/team, I can prototype something in 1-2 days of solid work (figure a week of actual time with other things going on).  Please let me know.
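For concreteness, the weighted-extraction formula mentioned above, sum(data*weights)/sum(weights**2), vectorizes over a whole 2D cutout in plain numpy.  This is only an illustrative sketch, not the pipeline's _extract_src_flux:

```python
# Illustrative sketch of a fully vectorized weighted extraction
# (not the actual _extract_src_flux implementation).
import numpy as np

def extract_weighted(data, weights):
    """Weighted extraction per wavelength column:

        flux[j] = sum_i(data[i, j] * weights[i, j]) / sum_i(weights[i, j]**2)

    data, weights: 2D arrays (cross-dispersion x dispersion).
    Returns a 1D spectrum of length data.shape[1].
    """
    num = (data * weights).sum(axis=0)
    den = (weights ** 2).sum(axis=0)
    return num / den

data = np.arange(12.0).reshape(4, 3)   # 4 cross-dispersion pixels, 3 columns
weights = np.ones_like(data)           # unit weights: reduces to a column mean
spectrum = extract_weighted(data, weights)
```

Because the whole loop over columns collapses into two array reductions, the same code would compile cleanly under numba's @jit if that route were taken.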

stscijgbot-jp commented 2 weeks ago

Comment by Timothy Brandt on JIRA:

David Law Christian Hayes, I will amend my previous statement: I am confident that I can get this implemented effectively in pure numpy (no numba), with the functional call to a weight as a function of lambda and y kept intact, so long as the weighting function can take 2D inputs for both lambda and y.  If the latter can't be done there will be an efficiency hit, but the rest can still be done.  My caveat (1), about the fitting to the background, stands (I would re-implement this).  Let me know whenever you would like me to proceed.
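The requirement that the weight function accept 2D inputs is just numpy broadcasting.  A hedged sketch, with an illustrative Gaussian cross-dispersion weight (the names here are assumptions, not the pipeline's API):

```python
# Sketch of evaluating a user-supplied weight function weight_fn(lam, y)
# on full 2D arrays in one call, keeping the extraction vectorized.
# Function and array names are illustrative, not jwst API.
import numpy as np

def gaussian_weight(lam, y, center=0.0, sigma=1.5):
    """Example weight function of wavelength and cross-dispersion offset.

    It must accept 2D arrays for both lam and y; this particular example
    happens to depend only on y, but receives lam as well.
    """
    return np.exp(-0.5 * ((y - center) / sigma) ** 2)

ny, nx = 5, 4
lam2d = np.tile(np.linspace(1.0, 2.0, nx), (ny, 1))                # wavelength per pixel
y2d = np.tile(np.arange(ny, dtype=float)[:, None] - 2.0, (1, nx))  # offset from trace

weights = gaussian_weight(lam2d, y2d)  # one call for the whole 2D cutout
```

If a user's weight function only handles scalars, it would have to be called per pixel, which is the efficiency hit mentioned above.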

stscijgbot-jp commented 2 weeks ago

Comment by Christian Hayes on JIRA:

Thanks again for taking a look at this Timothy Brandt.  For the deepcopy fix I tested some MOS and IFU data.  MOS improved a little (~20s faster), but IFU data ran significantly faster through spec2 (times reduced from ~192s to ~76s), and from a couple of spot checks it didn't look like there were any significant changes to the resulting data for either.

Regarding the questions about changes to extract_1d, that might need a higher-level discussion, but David Law may be able to say more.

stscijgbot-jp commented 2 weeks ago

Comment by Timothy Brandt on JIRA:

Christian Hayes Thank you, and a quick question: would the capability to fit a two-dimensional background (e.g. a 2D polynomial as a function of position along and perpendicular to the dispersion direction) be useful?  It is not possible given the current implementation of extract1d but would be straightforward under an alternate implementation.
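To illustrate what such a 2D background fit could look like, here is a minimal sketch of a bilinear background B(x, y) fit by linear least squares.  It is not extract_1d code, just the kind of fit described above:

```python
# Illustrative sketch: fitting a low-order 2D polynomial background
#   B(x, y) = c0 + c1*x + c2*y + c3*x*y
# with a single linear least-squares solve (not jwst pipeline code).
import numpy as np

def fit_background_2d(x, y, values):
    """Fit B(x, y) = c0 + c1*x + c2*y + c3*x*y; return the coefficients.

    x, y, values: 1D arrays of pixel positions (along/perpendicular to
    the dispersion direction) and the pixel values to fit.
    """
    A = np.column_stack([np.ones_like(x), x, y, x * y])  # design matrix
    coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)
    return coeffs

# Noise-free synthetic background: the coefficients are recovered exactly.
xx, yy = np.meshgrid(np.arange(6.0), np.arange(5.0))
x, y = xx.ravel(), yy.ravel()
truth = 2.0 + 0.5 * x - 0.25 * y + 0.1 * x * y
coeffs = fit_background_2d(x, y, truth)
```

Higher polynomial orders just add columns to the design matrix; the solve stays a single lstsq call per cutout.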

stscijgbot-jp commented 2 weeks ago

Comment by Christian Hayes on JIRA:

Timothy Brandt I don't think that would be heavily used, at least for NIRSpec. It's possible that there could be some cases where it is useful to fit a 2D background (though the wavelength dependence of some backgrounds won't be easily fit by a polynomial), but there are several options for background subtraction before the extraction that are commonly used.  

stscijgbot-jp commented 2 weeks ago

Comment by Timothy Brandt on JIRA:

Christian Hayes David Law Ok, so I couldn't help myself.  I implemented a new version (not with a background fit yet, but that will be straightforward).  It's pure python/numpy, has fewer lines of code than the current implementation, is (to my admittedly biased eye) nice and clean, and it runs about 50 times faster.  My biggest issue with the background is that I disagree with the calculation currently in place, so I would want to discuss that before doing anything.

stscijgbot-jp commented 2 weeks ago

Comment by Timothy Brandt on JIRA:

Christian Hayes could you see if this branch works (i.e. runs with no errors and gives identical results in the x1d files up to the last decimal place)?  It did in my tests, but I would love another set of eyes.  And if it passes that test, could you tell me how much faster that 28 minute BOTS reduction goes?

https://github.com/t-brandt/jwst/tree/extract1d_numpy_speedup

stscijgbot-jp commented 2 weeks ago

Comment by Christian Hayes on JIRA:

Timothy Brandt thanks for putting this together, I'll take a look next week and let you know what I find.

stscijgbot-jp commented 1 week ago

Comment by Timothy Brandt on JIRA:

Ok David Law Christian Hayes Melanie Clarke, I am breaking this down into two things.

First, almost all of the time was actually going into creating 2D interpolating splines for use in the aperture correction.  These were created for each wavelength and for each integration (and they were all ones anyway).  As a first step, I have modified the aperture correction in extract1d to support the re-use of an aperture correction in integrations after the first one.  It is reused only if the RA, Dec, and wavelength are the same as in the previous integration, and a previous integration successfully tabulated the results.  This saves something like a factor of 10 in BOTS execution time.  The addition of support for reusing an aperture correction object is in this PR: #8609

Second, there is another factor of probably 2-3 to be had from rewriting extract1d (and more than that if a background is fit); I will work on that next.
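The caching pattern is simple to state in isolation.  A minimal sketch of the idea (not the code in #8609): recompute the expensive product only when its inputs change, and otherwise reuse the previous result:

```python
# Minimal sketch of the caching idea described above (not the PR's code):
# reuse an expensive per-integration product when its inputs repeat.
import numpy as np

class ApCorrCache:
    """Recompute the aperture correction only when (ra, dec, wavelengths)
    differ from the previous integration; otherwise return the cached one."""

    def __init__(self, compute):
        self._compute = compute   # the expensive function to wrap
        self._key = None
        self._value = None
        self.ncomputes = 0        # for demonstration: count real computations

    def get(self, ra, dec, wavelengths):
        key = (ra, dec, wavelengths.tobytes())
        if key != self._key:
            self._value = self._compute(wavelengths)
            self._key = key
            self.ncomputes += 1
        return self._value

def expensive_apcorr(wavelengths):
    # Stand-in for building 2D interpolating splines per wavelength.
    return np.ones_like(wavelengths)

cache = ApCorrCache(expensive_apcorr)
wl = np.linspace(1.0, 5.0, 100)
for _ in range(1000):                 # 1000 identical integrations
    corr = cache.get(10.0, -5.0, wl)  # computed once, reused 999 times
```

Keeping only the previous key, rather than a growing dictionary, matches the behavior described above and bounds the memory cost.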

stscijgbot-jp commented 2 days ago

Comment by David Law on JIRA:

I defer to Melanie Clarke on the detailed code review, but on a quick review I think #8609 looks entirely reasonable.

It's not changing any algorithms or approaches, just caching computations so that they don't need to be re-derived thousands of times.

In the BOTS example that I tested (jw04098002001_04102_00001-seg001_nrs1), the current pipeline code took about 1 second per integration to do the spectral extraction, resulting in a total spec2 runtime of 25 minutes.  With this PR the results were identical, but the spec2 runtime dropped to 4 minutes, more than a 5x gain in speed.

stscijgbot-jp commented 2 days ago

Comment by Christian Hayes on JIRA:

Thanks Timothy Brandt and David Law for taking a look.  Apologies for not updating the ticket after discussing with Tim offline, but I came to similar conclusions running on the data set listed in the ticket (PID 2512 obs 17).  It ran substantially faster and the results were identical for me as well.

stscijgbot-jp commented 2 days ago

Comment by Melanie Clarke on JIRA:

I have also tested these changes offline, with similar results.  

Timothy Brandt - let me know when the PR is ready for review and I'll check out the code details!

stscijgbot-jp commented 2 days ago

Comment by Timothy Brandt on JIRA:

Melanie Clarke I think it's ready for review.  The biggest thing I was unsure of was whether I should add an optional argument to apcorr.apply() to use the tabulated correction, or whether I should make a new method (I did the latter but could change it easily enough).  If I did change it there would need to be some checks within the routine.  Beyond that question, I think it is ready for you to take a look.

stscijgbot-jp commented 22 hours ago

Comment by Melanie Clarke on JIRA:

Okay, thanks. I'll review.