Open zanieb opened 2 months ago
Yes, union
is linear in the length of the ranges being merged so doing it in a loop is quadratic.
The fundamental union
algorithm is:
Some optimizations can be done with the sort because we know the inputs are themselves sorted. Similarly we can duplicate the list as we constructe it for the classic 2 argument case.
A union_many
could use a binary heap to find the next segment to process.
Well summarized!
It’s been a long time since I looked at this. I suppose a rather simple adaptation of the current code would just need to adjust the smaller_interval
selection among a list of segments iterators (one per range). Instead of just checking the "left" and "right", it would check every iterator head.
That would be better than repeatedly computing union, but still not optimal. An optimal one I guess would also remember how the iterator heads are sorted among themselves. Is that what you meant with the binary heap @Eh2406 ?
Yes. smaller_interval
is "among all non-exhausted iterators pick the first segment from the one who's got the smallest start point". If we had a IterWithOrdBySmallestStart
then we could calculate smaller_interval
by popping a BinaryHeap<IterWithOrdBySmallestStart>
. If the IterWithOrdBySmallestStart
is not exhausted then re-add what's left to the BinaryHeap
. If the binary keep is empty then you're done.
I find myself constructing a range in a loop with
union
, which we worry is not performant. It'd be nice to have aunion_many
that takes an iterable of ranges and unions them in a single efficient operation.Some example usage
https://github.com/astral-sh/uv/blob/0335efd0a901ee2351e663d0488b1ce87fd4cc80/crates/pep508-rs/src/marker/algebra.rs#L638-L645
https://github.com/astral-sh/uv/blob/e1a8beb64b9f6eb573b3af616ffff680c282a316/crates/uv-resolver/src/pubgrub/report.rs#L1098-L1102
I started to take a look at this myself, but realized I'm out of my depth :)