Merger should only apply limits under certain conditions

julgus commented 1 year ago

Describe the bug JPAStreamer cannot optimize anonymous lambdas, i.e. f -> f.getLength() > 120. Hence the need for a JPAStreamer metamodel. This means if such an operation occurs early in the pipeline, the rest of the operations, regardless of whether they use the JPAStreamer metamodel or not, may also be executed in the JVM instead of the database.

Below is an example of a query that cannot be optimized at all since the length filter must be applied before the limit, otherwise the results will be different. Thus all JPAStreamer can do is issue a query that returns all Films in the DB and execute the anonymous lambda in the JVM. If the limit() did not exist, JPAStreamer could in fact reorder the operations:

            final List<String> actual = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .limit(10)
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .map(Film$.title)
                    .collect(Collectors.toList());

As of now, JPAStreamer seems to overlook the lambda and reorders the operations in a way that distorts the results.

Expected behavior This should generate the following query as the lambda cannot be moved according to reordering rules:

Hibernate: 
    /* <criteria> */ select
        f1_0.film_id,
        f1_0.description,
        f1_0.language_id,
        f1_0.last_update,
        f1_0.length,
        f1_0.rating,
        f1_0.rental_duration,
        f1_0.rental_rate,
        f1_0.replacement_cost,
        f1_0.special_features,
        f1_0.title 
    from
        film f1_0 limit ?

Actual behavior

The generated query includes the second filter and the sorting operation. This means the first filter is applied in the JVM after the limit has been applied. Such reordering changes the result set.

Hibernate: 
    /* <criteria> */ select
        f1_0.film_id,
        f1_0.description,
        f1_0.language_id,
        f1_0.last_update,
        f1_0.length,
        f1_0.rating,
        f1_0.rental_duration,
        f1_0.rental_rate,
        f1_0.replacement_cost,
        f1_0.special_features,
        f1_0.title 
    from
        film f1_0 
    where
        f1_0.title like ? 
    order by
        f1_0.length asc limit ?

How To Reproduce The following integration test fails:

try (StreamSupplier<Film> supplier = jpaStreamer.createStreamSupplier(Film.class)) {

            final List<String> actual = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .limit(10)
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .map(Film$.title)
                    .collect(Collectors.toList());

            final List<String> expected = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .limit(10)
                    .sorted(Comparator.comparing(Film::getLength))
                    .filter(f -> f.getTitle().startsWith("A"))
                    .map(Film::getTitle)
                    .collect(Collectors.toList());

            assertEquals(expected, actual);
}

Build tool e.g. Maven 3.9.0

JPAStreamer version At least JPAStreamer 3.0.0 and later

JPA Provider e.g. Hibernate 6.0.2.Final

Java Version e.g. Java 11.0.19

Context around operation ordering

            final List<String> actual = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .map(Film$.title)
                    .collect(Collectors.toList());

Can safely be reordered as the filters are commutative and it doesn't matter if the sorting is done prior to filtering:

            final List<String> actual = supplier.stream()
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .filter(f -> f.getLength() > 120)
                    .map(Film$.title)
                    .collect(Collectors.toList());

In this case, the first sort and title predicate can be applied on the DB side, before applying the length filter in the JVM.

julgus commented 1 year ago

This problem seems to stem from several places. Firstly, the criteria merger should not continue merging operations after encountering an anonymous lambda. At its current state, it merges any operations that contain a JPAStreamer predicate, regardless of the ordering in the pipeline.

After fixing that, a similar test still fails:

        try (StreamSupplier<Film> supplier = jpaStreamer.createStreamSupplier(Film.class)) {
            final List<String> actual = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .map(Film$.title)
                    .limit(10)
                    .collect(Collectors.toList());

            final List<String> expected = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .sorted(Comparator.comparing(Film::getLength))
                    .map(Film::getTitle)
                    .filter(title -> title.startsWith("A"))
                    .limit(10)
                    .collect(Collectors.toList());

            assertEquals(expected, actual);
        }

In this case, reordering rules allow the first Stream to be optimized as:

            final List<String> actual = supplier.stream()
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .filter(f -> f.getLength() > 120)
                    .map(Film$.title)
                    .limit(10)
                    .collect(Collectors.toList());

This means we can merge the sort and the filter on the title into the query without any issues. However, the limit is also applied on the DB side, limiting the results to 10 before the filter length > 120 is executed. Due to the sorting operation, the 10 shortest films are returned and none of them will match the criteria length > 120.

julgus commented 1 year ago

Another example is this test:

            final List<String> actual = supplier.stream()
                    .filter(Film$.title.startsWith("A"))
                    .skip(10)
                    .filter(f -> f.getLength() > 120)
                    .sorted(Film$.length)
                    .limit(10)
                    .map(Film$.title)
                    .collect(Collectors.toList());

Here the SORTED is merged even though the SKIP precedes it. As SQL always applies SKIPS and LIMITS to the end of all other operations, we cannot merge this sort as it will change what entities are skipped.

speedment / jpa-streamer

Merger should only apply limits under certain conditions #354