jqwik-team / jqwik

Property-Based Testing on the JUnit Platform
http://jqwik.net
Eclipse Public License 2.0
576 stars 64 forks source link

StackOverflowError when shrinking large arrays #526

Closed FeldrinH closed 1 year ago

FeldrinH commented 1 year ago

I encountered a StackOverflowError while trying to minimize a failing input generated from the following arbitrary:

Arbitraries.longs().array(long[].class).ofMinSize(1).ofMaxSize(100_000)

The relevant part of the stack trace is:

    ...
Caused by: java.lang.StackOverflowError
    at net.jqwik.engine.support.JqwikStreamSupport.concat(JqwikStreamSupport.java:69)
    at net.jqwik.engine.support.JqwikStreamSupport.concat(JqwikStreamSupport.java:69)
    at net.jqwik.engine.support.JqwikStreamSupport.concat(JqwikStreamSupport.java:69)
    at net.jqwik.engine.support.JqwikStreamSupport.concat(JqwikStreamSupport.java:69)
    at net.jqwik.engine.support.JqwikStreamSupport.concat(JqwikStreamSupport.java:69)
    at net.jqwik.engine.support.JqwikStreamSupport.concat(JqwikStreamSupport.java:69)
    at net.jqwik.engine.support.JqwikStreamSupport.concat(JqwikStreamSupport.java:69)
    ...

It seems like the problem should be fairly easy to fix by using Stream.flatMap or some other similar method in JqwikStreamSupport.concat.

FeldrinH commented 1 year ago

I just noticed that JqwikStreamSupport.concat has a comment mentioning that using flatMap breaks in "recursive scenarios, e.g. ArbitrariesRecursiveTests.ShrinkingTests.complexShrinking", and this does appear to be true. With flatMap ArbitrariesRecursiveTests.ShrinkingTests.complexShrinking seems to get stuck shrinking indefinitely. However, I can't figure out why that is. What's the difference between flatMap and concat that causes flatMap, but not concat to break this test?

jlink commented 1 year ago

It's been a while that I went into the rabbit hole of this problem. What I think I remember is that the JDK (and this may depend on the exact version) is doing quite some optimization for flat mapping. One of this optimizations breaks the recursive use case - maybe because some state is being cached that shouldn't be.

As for the SOE: Does it show up independent of available memory?

FeldrinH commented 1 year ago

Yes, it does show up regardless of available memory. I believe JVM stack size is fixed (the default is 1MB iirc), unless you manually configure your VM to increase stack size.

FeldrinH commented 1 year ago

Assuming that the flatMap issue is too arcane to solve directly I propose something like this:

    public static <T> Stream<T> concat(List<Stream<T>> streams) {
        return concat(new ArrayList<>(streams), 0, streams.size());
    }

    private static <T> Stream<T> concat(List<Stream<T>> streams, int i, int j) {
        if (j - i == 0) {
            return Stream.empty();
        }
        if (j - i == 1) {
            return streams.get(i);
        }
        int midpoint = (i + j) / 2;
        return Stream.concat(concat(streams, i, midpoint), concat(streams, midpoint, j));
    }

This splits the list approximately in half with every recursion, which should reduce recursion depth to approximately log2(n), which grows much slower than the current algorithm.

EDIT: Fixed bug with empty list not working.

vlsi commented 1 year ago

For reference, there was a link clarifying the current implementation: https://github.com/jqwik-team/jqwik/issues/305#issuecomment-1041480252 I believe it makes sense to add the link to the sources

FeldrinH commented 1 year ago

That linked article seems to recommend against the current implementation of JqwikStreamSupport.concat because of the same stack overflow issue that I have encountered.

vlsi commented 1 year ago

For example, it interacts poorly with infinite streams. Calling findAny on the concatenated stream may cause the program to enter an infinite loop

In other words, "balanced" implementation behaves badly with infinite streams, so it is not usable either.

The article recommends https://github.com/TechEmpower/misc/blob/master/concat/src/main/java/rnd/StreamConcatenation.java at the very end

It is a bit much to inline in a blog article, so take a look at StreamConcatenation.java for the source code. This implementation is similar to Stream.concat(a, b) in that it uses a custom Spliterator, except this implementation handles any number of input streams. It performs quite well. It does not outperform every other solution in every scenario (flatMap is generally better for very small input streams), but it never performs much worse and it scales nicely with the number and size of the input streams.

@jlink , WDYT of pulling StreamConcatenation.java? It is not the first time users run into StackOverflowError triggered by concat.

FeldrinH commented 1 year ago

For example, it interacts poorly with infinite streams. Calling findAny on the concatenated stream may cause the program to enter an infinite loop

That quote is talking about flatMap, not the balanced approach.

I do agree that StreamConcatenation.java is probably the best option, provided that we trust the author of that article that it is actually correct.

FeldrinH commented 1 year ago

Also fwiw I ran complexShrinking with the balanced approach and the test passed.

vlsi commented 1 year ago

FYI, I have disabled shrinking because I am using stateful testing, and virtually all my operations modify state, so the shrinker does not work.

At the same time, when the shrinker was active it did fail with OOM and SOE from time to time. So improving concat seems helpful.

I wonder what the odds are for .concat improvement to be accepted in OpenJDK.

jlink commented 1 year ago

WDYT of pulling StreamConcatenation.java?

Think I will do just that. License seems to allow it.

jlink commented 1 year ago

Fixed in https://github.com/jqwik-team/jqwik/commit/5a2a12a5e9ec8f6db81321625f0c0d261ae7e7cb

jlink commented 1 year ago

I wonder what the odds are for .concat improvement to be accepted in OpenJDK.

Have you ever tried to do a PR there? How's the process anyway?

jlink commented 1 year ago

Released in 1.8.2