Closed ToMe25 closed 2 months ago
My motivation for this was that I'm a perfectionist and get annoyed by this kind of thing, but I still did A LOT of benchmarking.
The result of my benchmarking was that while the changes to collecting the various iterators to a `HashSet` were as expected, the changes to collecting to a `Vec` were kinda weird. For some reason this change improved the performance of collecting an `Intersection` to a `Vec`, even though this change doesn't affect `Intersection` at all.
In other cases this change also drastically decreases the performance of collecting to a `Vec`, which I believe is solely due to the fact that the inlined lower bound is no longer a constant.
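To illustrate why that matters, here is a rough sketch (not the actual standard-library code) of how collecting into a `Vec` typically uses the size hint's lower bound for its first allocation:

```rust
// Illustrative sketch only, not the real standard-library implementation:
// collecting into a Vec typically sizes its first allocation from the
// lower bound of the iterator's size hint.
fn collect_into_vec<I: Iterator>(iter: I) -> Vec<I::Item> {
    let (lower, _upper) = iter.size_hint();
    // With a constant lower bound of 0 the compiler can fold this away;
    // a computed lower bound has to be evaluated at runtime instead.
    let mut vec = Vec::with_capacity(lower);
    for item in iter {
        vec.push(item);
    }
    vec
}
```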
The summary of my results is this:

- Collecting to a `HashSet` gets faster by ~20% in the affected cases, and remains unchanged in the other cases.
- Collecting to a `Vec` gets faster or slower by ~20-50%, seemingly at random, but is on average slower overall.
That means that, from a performance point of view, whether this change is worth it depends on whether collecting to a `Vec` or a `HashSet` is more common.
One way to potentially get rid of most of the `collect::<Vec<T>>()` performance penalty of this change would be to implement a specialized version for `Vec`s. I'm not entirely sure how trait-based specialization works though, so I can't guarantee that this will work. Another disadvantage is that it would add a significant amount of extra code complexity. The advantage, though, is that doing so should allow making collecting into a `Vec` faster than without this PR, since the initial allocation could be larger. I'm not certain, however, whether this outweighs the performance penalty that trait-based specialization probably brings with it, nor am I sure whether it would be worth the added complexity.
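As a purely hypothetical illustration of that idea, the sketch below shows the general shape of trait-based specialization. All names are made up, it needs the nightly-only `min_specialization` feature, and a real version would have to live inside the standard library so it could specialize on the actual `Difference` iterator:

```rust
#![feature(min_specialization)]

// All names below are made up for illustration.
trait SpecCollectVec: Iterator + Sized {
    fn spec_collect_vec(self) -> Vec<Self::Item>;
}

impl<I: Iterator> SpecCollectVec for I {
    // Generic fallback: defer to the normal collect machinery.
    default fn spec_collect_vec(self) -> Vec<I::Item> {
        self.collect()
    }
}

// Stand-in for a set-difference iterator that knows the size of the larger
// set it was created from.
struct FakeDifference {
    items: std::vec::IntoIter<u32>,
    larger_set_len: usize,
}

impl Iterator for FakeDifference {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        self.items.next()
    }
}

impl SpecCollectVec for FakeDifference {
    // Specialized path: allocate for the larger set right away instead of
    // relying on the more conservative lower bound of the size hint.
    fn spec_collect_vec(self) -> Vec<u32> {
        let mut v = Vec::with_capacity(self.larger_set_len);
        v.extend(self);
        v
    }
}
```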
The overhead of a saturating add is tiny compared to memory allocation, so that is most likely not the direct cause of the regression. Finding the actual cause would require looking at the exact assembly code generated by rustc and comparing the two versions, along with profiling information.
However, I think it is still better to merge this for now, since it does provide a more accurate lower bound for the iterator size.
@bors r+
:pushpin: Commit f4361bcee301ca77469ad16e5d5b1436f4cb6934 has been approved by Amanieu
It is now in the queue for this repository.
:hourglass: Testing commit f4361bcee301ca77469ad16e5d5b1436f4cb6934 with merge 65c553ddd318e7b28cac0f8c922d64108b214ded...
:sunny: Test successful - checks-actions
Approved by: Amanieu
Pushing 65c553ddd318e7b28cac0f8c922d64108b214ded to master...
This PR makes the set `Difference` iterator generate a non-zero lower bound in some situations. Specifically, it now returns a non-zero lower bound if the `difference` method is called on a larger set with a smaller set. That is because in those cases the fact that sets can't really contain duplicates* guarantees that a minimum of `self.len() - other.len()` items will be returned by the iterator.

\* Well, they can, but that is already documented to cause a random mess.
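As a rough sketch of that reasoning (not necessarily the exact code in this PR): at most `other.len()` elements of the larger set can be filtered out, so at least the remainder must be yielded.

```rust
use std::collections::HashSet;

// Sketch of the idea behind the new lower bound (not necessarily the exact
// code in this PR): at most `smaller.len()` elements of `larger` can be
// removed by the difference, so at least the rest must be yielded.
fn difference_size_hint<T>(larger: &HashSet<T>, smaller: &HashSet<T>) -> (usize, Option<usize>) {
    (
        larger.len().saturating_sub(smaller.len()), // guaranteed minimum
        Some(larger.len()),                         // at most every element survives
    )
}

fn main() {
    let a: HashSet<u32> = (0..10).collect();
    let b: HashSet<u32> = (0..3).collect();
    // At least 10 - 3 = 7 elements of `a` must survive `a.difference(&b)`.
    assert_eq!(difference_size_hint(&a, &b), (7, Some(10)));
    assert_eq!(a.difference(&b).count(), 7);
}
```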
This implementation has the disadvantage that a single `next()` call may reduce the lower bound by more than one. Every size hint generated, even the first and largest one, is guaranteed to be correct, but it may be confusing if one `next()` call causes the lower bound to drop by more than one. This could be avoided by storing the minimum number of resulting elements in the iterator and subtracting one each time `next()` is called, but I don't think it's worth the added complexity.
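For reference, the alternative described above could look roughly like the following sketch (hypothetical names, not code from this PR):

```rust
// Hypothetical sketch of the alternative: wrap an iterator together with a
// stored minimum that shrinks by at most one per yielded item, so the
// reported lower bound never drops by more than one per `next()` call.
struct DifferenceWithStoredMin<I> {
    iter: I,
    remaining_min: usize,
}

impl<I: Iterator> Iterator for DifferenceWithStoredMin<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let item = self.iter.next();
        if item.is_some() {
            // Decrement the stored minimum by at most one per yielded item.
            self.remaining_min = self.remaining_min.saturating_sub(1);
        }
        item
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let (lower, upper) = self.iter.size_hint();
        // Report whichever lower bound is stronger.
        (lower.max(self.remaining_min), upper)
    }
}
```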