extract1d._coalesce_bounds can return unexpected results

spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope

https://jwst-pipeline.readthedocs.io/en/latest/

Other

570 stars 167 forks source link

extract1d._coalesce_bounds can return unexpected results #8564

Closed stscijgbot-jp closed 4 months ago

stscijgbot-jp commented 5 months ago

Issue JP-3660 was created on JIRA by Timothy Brandt:

extract1d._coalesce_bounds does not appear to behave as intended when the first segment contains the other ones, for example, if the input is [[1, 100], [5, 50], [3, 60]] the output will be [[1, 60]] where I am nearly certain that [[1, 100]] is intended, and for input of [[1, 100], [5, 27], [6, 28], [40, 50]] the output will be [[1, 28], [40, 50]] where again I am nearly certain that [[1, 100]] is the intended output. This issue seems to be present whenever any interval is contained within another. The routine is also opaque (to my eyes at least) and inefficient in using lists for everything; we could consider rewriting it more simply with arrays. See the attached Jupyter notebook for more examples of unexpected behavior and a proposed rewrite.

stscijgbot-jp commented 5 months ago

Comment by Timothy Brandt on JIRA:

6/16/24 @ 15:30: I updated the accompanying notebook with a few more example sets of limits, deleted the duplicate issue on GitHub, and made a slight update to the issue. It looks like if any set of limits is entirely contained within another set, the results may be incorrect. I do not know if results can be incorrect more generally. It is a bit difficult for me to follow the logic of the original routine, but it seems like it may assume that both the lower and the upper limits of the sequence of intervals are both in ascending order after sorting by the lower limits only, which is equivalent to assuming that no interval is entirely contained within any other interval. If this assumption ever breaks in the JWST pipeline then incorrect results, and incorrect extracted spectra, may follow.

stscijgbot-jp commented 5 months ago

Comment by Timothy Brandt on JIRA:

After a little more thought, I think that replacing

https://github.com/spacetelescope/jwst/blob/a0cdad8b9ce9b9e3835bb1893ace205f40623798/jwst/extract_1d/extract1d.py#L805],

with

pint[1] = max(interval[1], pint[1])

will fix the issue. That should avoid any concerns about breaking anything else, as it just removes an unacknowledged assumption from the function that could result in surprising behavior that would be hard to track down. David Law, I know you are on break for the next few days, but if this sounds good I will create a pull request to close the issue.

stscijgbot-jp commented 5 months ago

Comment by David Law on JIRA:

Timothy Brandt Please do create a PR.

It would be useful to hear from Howard Bushouse what the history of this routine is. It looks like a clear bug, but since I'm not sure offhand what kind of segments this routine typically gets called with it would be good to get a better picture of this before changing it.

stscijgbot-jp commented 5 months ago

Comment by Timothy Brandt on JIRA:

David Law I created the pull request. My sense is that this function is usually called with only one or two sets of limits, either for the extraction region for a slit (one set; two numbers) or for a background region (probably two sets; four numbers). In both cases I think this bug would be unlikely to cause problems, so my expectation is that this bug has never actually impacted data reductions. These limits are computed in the 100 or so lines following

https://github.com/spacetelescope/jwst/blob/10501f22def0460b15df65cd5261601c6a71d7cf/jwst/extract_1d/extract1d.py#L124

from polynomial functions defining slit boundaries and background regions, which I assume are defined and maintained by the instrument teams. It would be great for Howard Bushouse to weigh in.

stscijgbot-jp commented 5 months ago

Comment by Howard Bushouse on JIRA:

David Law Timothy Brandt Honestly I have no idea as to why the current code was written that way - most likely because the author never envisioned anyone using multiple regions that were contained within one another.

stscijgbot-jp commented 4 months ago

Comment by Howard Bushouse on JIRA:

Fixed by #8586

stscijgbot-jp commented 2 months ago

Comment by David Law on JIRA:

Working reasonably for me in tests, closing.