spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
570 stars 167 forks source link

Bug in extract1d._coalesce_bounds #8563

Closed t-brandt closed 5 months ago

t-brandt commented 5 months ago

extract1d._coalesce_bounds does not appear to behave as intended when the first segment contains the other ones, for example, if the input is [[1, 100], [5, 50], [3, 60]] the output will be [[1, 60]] and for input of [[1, 100], [5, 27], [6, 28], [40, 50]] the output will be [[1, 28], [40, 50]] The routine is also very opaque and inefficient in using lists for everything; we should consider rewriting it much more simply with arrays.

t-brandt commented 5 months ago

A proposed fix that is far from maximally efficient but is, to me, much easier to understand:

This seems to work for every test case I can throw at it, and the logic seems simple and solid. It should behave exactly as the current routine but with a bit more efficiency and (to me) much more readability.

def _coalesce_bounds(limits):
    x = np.sort(np.asarray(limits), axis=1)
    lower = []
    upper = []

    # To be a lower bound of a subinterval, there must be no 
    # interval with a *lower* lower bound *and* an upper bound
    # at least as high as this candidate lower bound.

    for i in np.unique(x[:, 0]): # np.unique sorts for us
        if not np.any((i <= x[:, 1])&(i > x[:, 0])):
            lower += [i]

    # To be an upper bound of a subinterval, there must be no 
    # interval with a *higher* upper bound *and* a lower bound
    # at least as low as this candidate higher bound.

    for i in np.unique(x[:, 1]):
        if not np.any((i >= x[:, 0])&(i < x[:, 1])):
            upper += [i]

    if len(upper) != len(lower):
        raise ValueError("I don't know how this could happen")

    # No need to do any more work than this because arrays are sorted
    return [[lower[i], upper[i]] for i in range(len(upper))]