Closed FeldrinH closed 1 year ago
I just noticed that JqwikStreamSupport.concat
has a comment mentioning that using flatMap
breaks in "recursive scenarios, e.g. ArbitrariesRecursiveTests.ShrinkingTests.complexShrinking
", and this does appear to be true. With flatMap
ArbitrariesRecursiveTests.ShrinkingTests.complexShrinking
seems to get stuck shrinking indefinitely. However, I can't figure out why that is. What's the difference between flatMap
and concat
that causes flatMap
, but not concat
to break this test?
It's been a while that I went into the rabbit hole of this problem. What I think I remember is that the JDK (and this may depend on the exact version) is doing quite some optimization for flat mapping. One of this optimizations breaks the recursive use case - maybe because some state is being cached that shouldn't be.
As for the SOE: Does it show up independent of available memory?
Yes, it does show up regardless of available memory. I believe JVM stack size is fixed (the default is 1MB iirc), unless you manually configure your VM to increase stack size.
Assuming that the flatMap issue is too arcane to solve directly I propose something like this:
public static <T> Stream<T> concat(List<Stream<T>> streams) {
return concat(new ArrayList<>(streams), 0, streams.size());
}
private static <T> Stream<T> concat(List<Stream<T>> streams, int i, int j) {
if (j - i == 0) {
return Stream.empty();
}
if (j - i == 1) {
return streams.get(i);
}
int midpoint = (i + j) / 2;
return Stream.concat(concat(streams, i, midpoint), concat(streams, midpoint, j));
}
This splits the list approximately in half with every recursion, which should reduce recursion depth to approximately log2(n), which grows much slower than the current algorithm.
EDIT: Fixed bug with empty list not working.
For reference, there was a link clarifying the current implementation: https://github.com/jqwik-team/jqwik/issues/305#issuecomment-1041480252 I believe it makes sense to add the link to the sources
That linked article seems to recommend against the current implementation of JqwikStreamSupport.concat
because of the same stack overflow issue that I have encountered.
For example, it interacts poorly with infinite streams. Calling findAny on the concatenated stream may cause the program to enter an infinite loop
In other words, "balanced" implementation behaves badly with infinite streams, so it is not usable either.
The article recommends https://github.com/TechEmpower/misc/blob/master/concat/src/main/java/rnd/StreamConcatenation.java at the very end
It is a bit much to inline in a blog article, so take a look at StreamConcatenation.java for the source code. This implementation is similar to Stream.concat(a, b) in that it uses a custom Spliterator, except this implementation handles any number of input streams. It performs quite well. It does not outperform every other solution in every scenario (flatMap is generally better for very small input streams), but it never performs much worse and it scales nicely with the number and size of the input streams.
@jlink , WDYT of pulling StreamConcatenation.java?
It is not the first time users run into StackOverflowError triggered by concat
.
For example, it interacts poorly with infinite streams. Calling findAny on the concatenated stream may cause the program to enter an infinite loop
That quote is talking about flatMap, not the balanced approach.
I do agree that StreamConcatenation.java is probably the best option, provided that we trust the author of that article that it is actually correct.
Also fwiw I ran complexShrinking with the balanced approach and the test passed.
FYI, I have disabled shrinking because I am using stateful testing, and virtually all my operations modify state, so the shrinker does not work.
At the same time, when the shrinker was active it did fail with OOM and SOE from time to time. So improving concat seems helpful.
I wonder what the odds are for .concat
improvement to be accepted in OpenJDK.
WDYT of pulling StreamConcatenation.java?
Think I will do just that. License seems to allow it.
I wonder what the odds are for
.concat
improvement to be accepted in OpenJDK.
Have you ever tried to do a PR there? How's the process anyway?
Released in 1.8.2
I encountered a StackOverflowError while trying to minimize a failing input generated from the following arbitrary:
The relevant part of the stack trace is:
It seems like the problem should be fairly easy to fix by using
Stream.flatMap
or some other similar method inJqwikStreamSupport.concat
.