xosmig opened this issue 8 years ago
CC @gereeter
Am I missing something? I don't see append() on BTreeSet
@alfiedotwtf what's wrong with this one? https://doc.rust-lang.org/std/collections/struct.BTreeSet.html#method.append
Gah! I don't know how I managed to land on old documentation:
https://doc.rust-lang.org/1.6.0/std/collections/struct.BTreeSet.html
Thanks @xosmig :)
I would love to see this fixed! I have recently developed a data structure [1], which is supposed to be faster than BTreeSet for a specific usage pattern, but I can't write a fair benchmark until BTreeSet supports an O(k + log n) delete-range operation.
[1] https://users.rust-lang.org/t/announcing-teardowntree/8259
I think perhaps the current `append` implementation is a little problematic. Arguably, one would expect a "batch" operation like `set.append(other_set)` to be generally the better choice when you have to insert one set into the other, because presumably it is able to take advantage of optimizations that cannot be made when inserting one item at a time. However, the O(n + m) complexity means that if, say, `n` is large and `m` is small, the cost of calling `append` is enormous.
I ran into this problem today: in a relatively large computational problem, in which I called `append` roughly 500k times with n ~ 100k and m ~ 10, my program would not finish within half an hour. After changing to `insert` in a loop, the program finished in 2 seconds.
There are two problems here, as I see it:

1. `append` sounds like a straightforward batch operation, so at least I personally wouldn't expect it to have such a severe drawback compared to repeated insertion for unbalanced set sizes.
2. The performance of `append` itself.

Of course, ideally `append` would work well in all cases, but deciding when to pick each strategy is surely not a simple problem...
cc @ssomers (due to their many improvements to BTreeMap/BTreeSet in the last couple of years)
Rest assured, I was subscribed... It's not hard to improve this in many cases, as Andlon proved, but it is hard to prove it doesn't become worse in some other situations. A lot of benchmarking to do.
I was bitten by this in the context of purposefully trying to merge a smaller `BTreeSet` into a larger one[^1]. Neither the function name nor the documentation helped to diagnose the issue. I think that improving the documentation for `append` would already help significantly.
[^1]: "Small to Large" (😅) optimization: https://soi.ch/wiki/smaller-to-larger/
When the largest key of one tree is less than the smallest key of the other tree, there is a way to merge the trees using only O(log n) operations. Maybe a logarithmic `merge` should be a separate method. I'm ready to work on it.

UPD: Also, sometimes it may be better to push the elements of one tree into the other with O(min_length * log(max_length)) operations instead of rebuilding with O(n + m) operations (like in #32526). But the problem is that it's hard to determine which approach is better in a particular case.