rust-lang / hashbrown

Rust port of Google's SwissTable hash map
https://rust-lang.github.io/hashbrown
Apache License 2.0

Improve Set Difference size_hint lower bound #530

Closed ToMe25 closed 2 months ago

ToMe25 commented 3 months ago

This PR makes the Set Difference iterator generate a non-zero lower bound in some situations.

Specifically, it now returns a non-zero lower bound if the difference method is called on a larger set with a smaller set as the argument.
That is because in those cases, the fact that sets can't really contain duplicates* guarantees that at least `self.len() - other.len()` items will be returned by the iterator.

* Well, they can, but that is already documented to cause a random mess

This implementation has the disadvantage that a single next() call may reduce the lower bound by more than one.
Every size hint generated, even the first and largest one, is guaranteed to be correct, but it may be confusing when a single `next()` call causes the lower bound to drop by more than one.

This could be avoided by storing the minimum number of resulting elements in the iterator and subtracting one each time `next()` is called, but I don't think it's worth the added complexity.
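
For illustration, here is a minimal, self-contained sketch of the idea (the struct and field names are simplified stand-ins, not the exact ones from hashbrown):

```rust
// Simplified stand-in for hashbrown's Difference iterator; the real type
// iterates over `self` and skips every element that is also in `other`.
struct DifferenceSketch<I> {
    iter: I,          // iterator over the set the difference is taken from
    other_len: usize, // number of elements in the set being subtracted
}

impl<I: Iterator> DifferenceSketch<I> {
    fn size_hint(&self) -> (usize, Option<usize>) {
        let (lower, upper) = self.iter.size_hint();
        // At most `other_len` of the remaining items can be filtered out, so
        // at least `lower - other_len` items must still be yielded.
        // saturating_sub keeps the bound at 0 when `other` is the larger set.
        (lower.saturating_sub(self.other_len), upper)
    }
}
```

This also shows why the lower bound can drop by more than one per `next()` call: skipping elements that are present in `other` consumes several items of the underlying iterator at once, while the subtracted `other_len` stays fixed.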

ToMe25 commented 3 months ago

My motivation for this was that I'm a perfectionist and get annoyed by this kind of thing, but I still did A LOT of benchmarking.

The result of my benchmarking was that while the changes to collecting the various iterators to a HashSet were as expected, the changes to collecting to Vec were kinda weird.
For some reason this change improved the performance of collecting an Intersection to a Vec, even though this change doesn't affect Intersection at all.
This change also drastically decreases the performance of collecting the affected iterators to a Vec, which I believe is solely due to the fact that the inlined lower bound is no longer a constant.

The summary of my results is this:
Collecting to a HashSet gets faster by ~20% in the affected cases, and remains unchanged in the other cases.
Collecting to a Vec gets faster or slower by ~20-50% seemingly at random, but is slower on average overall.

Here are my exact results:

| Name | Before | After | Diff (%) |
| -- | -- | -- | -- |
| set_iter_collect_difference_large_small_hashset | 55,917.77 | 44,172.59 | -21.00% |
| set_iter_collect_difference_large_small_vec | 22,996.61 | 34,235.16 | 48.87% |
| set_iter_collect_difference_small_large_hashset | 2,785.54 | 2,737.30 | -1.73% |
| set_iter_collect_difference_small_large_vec | 2,071.05 | 1,598.02 | -22.84% |
| set_iter_collect_intersection_large_small_hashset | 2,862.06 | 2,860.77 | -0.05% |
| set_iter_collect_intersection_large_small_vec | 2,403.75 | 1,630.63 | -32.16% |
| set_iter_collect_intersection_small_large_hashset | 2,866.35 | 2,885.31 | 0.66% |
| set_iter_collect_intersection_small_large_vec | 2,490.41 | 1,623.99 | -34.79% |
| set_iter_collect_symmetric_difference_large_small_hashset | 60,269.34 | 47,027.25 | -21.97% |
| set_iter_collect_symmetric_difference_large_small_vec | 24,543.87 | 34,968.29 | 42.47% |
| set_iter_collect_symmetric_difference_small_large_hashset | 60,668.83 | 45,848.87 | -24.43% |
| set_iter_collect_symmetric_difference_small_large_vec | 24,864.28 | 34,386.12 | 38.30% |
| set_iter_collect_union_large_small_hashset | 42,177.86 | 42,743.26 | 1.34% |
| set_iter_collect_union_large_small_vec | 31,568.72 | 30,323.12 | -3.95% |
| set_iter_collect_union_small_large_hashset | 42,909.41 | 42,364.49 | -1.27% |
| set_iter_collect_union_small_large_vec | 25,763.16 | 30,302.66 | 17.62% |

The tests were run with a "large" 1000-element set and a "small" 100-element set with an overlap of 50 elements. The `large_small`/`small_large` portion of the name indicates which set the operation was called on: `large_small` translates to `large.$op(small)`.
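
For reference, a rough reconstruction of one of the affected benchmark cases under the setup described above (the actual benchmark harness is not shown here, so names and values are illustrative):

```rust
use hashbrown::HashSet;

// Illustrative reconstruction of `set_iter_collect_difference_large_small_vec`:
// a "large" set of 1000 elements and a "small" set of 100 elements that share
// 50 elements, with the operation called on the large set.
fn difference_large_small_vec() -> Vec<u32> {
    let large: HashSet<u32> = (0..1000).collect();
    let small: HashSet<u32> = (950..1050).collect(); // 50 elements overlap with `large`
    large.difference(&small).copied().collect::<Vec<u32>>()
}
```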

That means that from a performance point of view, whether this change is worth it depends on whether collecting to a Vec or a HashSet is more common.

ToMe25 commented 2 months ago

One way to potentially get rid of most of the `collect::<Vec<T>>()` performance penalty of this change would be to implement a specialized version for Vecs.

I'm not entirely sure how trait-based specialization works though, so I can't guarantee that this will work.
Another disadvantage is that this would add a significant amount of extra code complexity.

The advantage, though, is that doing so should allow making collecting into a Vec faster than without this PR, since the initial allocation could be larger.
I'm not certain, however, whether this outweighs the performance penalty that trait-based specialization probably brings with it.
Nor am I sure whether this would be worth the added complexity.
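
To make the trade-off concrete, here is a hedged sketch of the mechanism such a specialization would build on (this is not hashbrown or std code, just an illustration): a generic collect into a `Vec` pre-allocates based on the lower bound of `size_hint`, whereas a path specialized for these set iterators could reserve the upper bound up front.

```rust
// Illustration only: how a Vec-specific collection path could use the upper
// bound of size_hint for its first allocation instead of the lower bound.
fn collect_reserving_upper_bound<I: Iterator>(iter: I) -> Vec<I::Item> {
    let (lower, upper) = iter.size_hint();
    // The generic path effectively reserves `lower`; a specialized path for
    // set iterators could reserve `upper` (falling back to `lower` if the
    // upper bound is unknown) so the Vec rarely needs to grow while pushing.
    let mut out = Vec::with_capacity(upper.unwrap_or(lower));
    for item in iter {
        out.push(item);
    }
    out
}
```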

Amanieu commented 2 months ago

The overhead of a saturating add is tiny compared to memory allocation, so that is most likely not the direct cause of the regression. Finding the actual cause would require looking at the exact assembly code generated by rustc and comparing the two versions, along with profiling information.

However, I think it is still better to merge this for now since it does provide a more accurate lower bound for the iterator size.

@bors r+

bors commented 2 months ago

:pushpin: Commit f4361bcee301ca77469ad16e5d5b1436f4cb6934 has been approved by Amanieu

It is now in the queue for this repository.

bors commented 2 months ago

:hourglass: Testing commit f4361bcee301ca77469ad16e5d5b1436f4cb6934 with merge 65c553ddd318e7b28cac0f8c922d64108b214ded...

bors commented 2 months ago

:sunny: Test successful - checks-actions
Approved by: Amanieu
Pushing 65c553ddd318e7b28cac0f8c922d64108b214ded to master...