vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.78k stars 196 forks source link

Spliterator and iterator use different ordering for sets #292

Closed pburka closed 1 year ago

pburka commented 1 year ago

With the introduction of optimized spliterators in 8.5.x, the streaming order diverged from the iteration order for unordered collections. I don't think this violates any specification or documentation, but it could have some unintended consequences and I wanted to confirm that the divergence is intentional.

For example (on 8.5.12):

jshell> import it.unimi.dsi.fastutil.ints.*

jshell> var s = new IntOpenHashSet(new int[] { 0, 1, 2, 3 })
s ==> {0, 1, 3, 2}

jshell> s.toIntArray()
$4 ==> int[4] { 0, 1, 3, 2 }

jshell> s.toArray()
$5 ==> Object[4] { 0, 1, 3, 2 }

jshell> s.intStream().toArray()
$6 ==> int[4] { 0, 2, 3, 1 }

jshell> s.stream().toArray()
$7 ==> Object[4] { 0, 2, 3, 1 }

In version, 8.4.4 these arrays all have the same order. But in 8.5. the order is reversed for streams (except for 0, the null value).

I don't think this is strictly wrong, but I do think it could be surprising. For example, a user might build one array with toIntArray() and another with intStream().map(...).toArray(), incorrectly expecting that the mapped values correspond to the values at the same indices in the first array.

vigna commented 1 year ago

It's not intentional, but the contributor who wrote the SplitIterator implementation decided to enumerate by scanning the backing map forward (which makes sense, in particular when you split), whereas SetIterator always iterated backward (a pet peeve of mine a few years ago). In principle one might rewrite SetIterator with a forward logic, but really it makes no sense for a super-stable, super-tested class. Consider that the streams can be split and used in parallel, so you shouldn't really have any expectation about the iteration order.