Closed · t-brandt closed 5 months ago
A proposed fix that is far from maximally efficient but is, to me, much easier to understand:
The following seems to work for every test case I can throw at it, and the logic seems simple and solid. It should behave as the current routine is intended to, but with a bit more efficiency and (to me) much more readability.
```python
import numpy as np


def _coalesce_bounds(limits):
    """Merge a list of [lo, hi] pairs into disjoint, sorted subintervals."""
    x = np.sort(np.asarray(limits), axis=1)  # ensure lo <= hi within each pair
    lower = []
    upper = []
    # To be a lower bound of a subinterval, there must be no
    # interval with a *lower* lower bound *and* an upper bound
    # at least as high as this candidate lower bound.
    for i in np.unique(x[:, 0]):  # np.unique sorts for us
        if not np.any((i <= x[:, 1]) & (i > x[:, 0])):
            lower += [i]
    # To be an upper bound of a subinterval, there must be no
    # interval with a *higher* upper bound *and* a lower bound
    # at least as low as this candidate upper bound.
    for i in np.unique(x[:, 1]):
        if not np.any((i >= x[:, 0]) & (i < x[:, 1])):
            upper += [i]
    if len(upper) != len(lower):
        raise ValueError("I don't know how this could happen")
    # No more work needed: both lists are already sorted, so the
    # k-th lower bound pairs with the k-th upper bound.
    return [[lo, hi] for lo, hi in zip(lower, upper)]
```
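As a quick sanity check (assuming the definition above), here is what the proposed routine returns for the two failing inputs reported in the issue below; both should coalesce to a single interval, since `[1, 100]` contains every other segment:

```python
# Inputs from the issue report; the correct union in both cases is
# the single interval [1, 100].
print(_coalesce_bounds([[1, 100], [5, 50], [3, 60]]))
# -> the single interval [1, 100]
print(_coalesce_bounds([[1, 100], [5, 27], [6, 28], [40, 50]]))
# -> the single interval [1, 100]
```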
`extract1d._coalesce_bounds` does not appear to behave as intended when the first segment contains the other ones. For example, if the input is `[[1, 100], [5, 50], [3, 60]]`, the output will be `[[1, 60]]`, and for an input of `[[1, 100], [5, 27], [6, 28], [40, 50]]` the output will be `[[1, 28], [40, 50]]`; in both cases the expected output is `[[1, 100]]`, since the first segment contains all of the others. The routine is also very opaque, and it is inefficient in using lists for everything; we should consider rewriting it much more simply with arrays.
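For reference, a minimal sketch of such an array-based rewrite, using the standard sort-and-sweep merge; the name `coalesce_bounds_arrays` is hypothetical, and it assumes the same closed-interval semantics as the proposed fix above (segments that merely touch are merged):

```python
import numpy as np


def coalesce_bounds_arrays(limits):
    """Sketch: merge [lo, hi] pairs via a sort-and-sweep over arrays.

    Assumes closed intervals, so segments that merely touch are merged.
    """
    x = np.sort(np.asarray(limits), axis=1)  # lo <= hi within each pair
    x = x[np.argsort(x[:, 0])]               # sort pairs by lower bound
    # A new subinterval starts wherever a pair's lower bound exceeds the
    # running maximum of all earlier upper bounds.
    running_max = np.maximum.accumulate(x[:, 1])
    starts = np.concatenate(([True], x[1:, 0] > running_max[:-1]))
    ends = np.concatenate((starts[1:], [True]))  # last pair of each group
    return np.column_stack((x[starts, 0], running_max[ends])).tolist()
```

This does a single sort plus a vectorized sweep instead of scanning all intervals once per candidate bound, so it should scale better with the number of segments.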