vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.76k stars 196 forks source link

Fix toXWithExpected size overallocating in parallel stream. #242

Closed techsy730 closed 3 years ago

techsy730 commented 3 years ago

Fix toXWithExpected size overallocating in parallel stream.

This is accomplished by having the Supplier preallocate a Collection of full expected size for the first collection allocation in the first thread, 1/2 expected size for the second, 1/3 for the third, etc.

Fixes #211

This reduces the minimum memory allocated when using this in a parallel stream from Ω(threads * expectedSize) -> Ω(threads * H_expectedSize) = Ω(threads * ln(expectedSize)). (The worst case memory usage is still O(threads * expectedSize), but that requires some truly pathological splitting decisions by the backing Spliterator, and is a problem the non-sizing logic would have too)

Where Ω is the min asymptotically bounding function (Think O notation but for the min instead of max) threads is the number of threads the Stream will use expectedSize is the value of the expectedSize parameter given to toXWithExpectedSize and H_n is the nth harmonic number, 1 + 1/2 + 1/3 + ... + 1/n For a more familiar complexity comparison, Θ(H_n) = Θ(ln(n)) (Θ is like O notation but bounded above and below) as the difference between H_n and ln(n) approaches a constant as n->∞ (the Euler–Mascheroni constant for those interested)

...What are you talking about, I'm not just flexing math knowledge. <_<   >_>

It's certainly not perfect. The 4nd thread isn't assured to be the 4th to allocate for example. And not all Spliterators take a split-by-two approach. But it shouldn't be any worse in performance then not providing a size estimate at all (unless you radically overestimate)

techsy730 commented 3 years ago

Ok, I think I am done for now. Core logic seems ok, only Javadoc changes may be possible (but no need to block PR over that)

vigna commented 3 years ago

Where Ω is the min asymptotically bounding function (Think O notation but for the min instead of max)

Wow, never heard of it. 😂