This problem seems to stem from several places. Firstly, the criteria merger should not continue merging operations after encountering an anonymous lambda. In its current state, it merges any operation that contains a JPAStreamer predicate, regardless of its position in the pipeline.
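A minimal sketch of the intended rule, using a hypothetical helper rather than JPAStreamer's actual merger internals: merge only the longest prefix of operations that can be expressed as a criteria, stopping at the first operation that cannot (for instance, a filter built from an anonymous lambda).

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not JPAStreamer's real merger: collect the operations that
// may be merged into the query, stopping at the first one that cannot be expressed
// as a criteria (e.g. a filter that uses an anonymous lambda).
final class CriteriaMergeSketch {
    static <OP> List<OP> mergeablePrefix(List<OP> pipeline, Predicate<OP> expressibleAsCriteria) {
        final List<OP> prefix = new ArrayList<>();
        for (OP op : pipeline) {
            if (!expressibleAsCriteria.test(op)) {
                break; // everything from this point on must be evaluated in the JVM
            }
            prefix.add(op);
        }
        return prefix;
    }
}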
After fixing that, a similar test still fails:
try (StreamSupplier<Film> supplier = jpaStreamer.createStreamSupplier(Film.class)) {
    final List<String> actual = supplier.stream()
            .filter(f -> f.getLength() > 120)
            .sorted(Film$.length)
            .filter(Film$.title.startsWith("A"))
            .map(Film$.title)
            .limit(10)
            .collect(Collectors.toList());

    final List<String> expected = supplier.stream()
            .filter(f -> f.getLength() > 120)
            .sorted(Comparator.comparing(Film::getLength))
            .map(Film::getTitle)
            .filter(title -> title.startsWith("A"))
            .limit(10)
            .collect(Collectors.toList());

    assertEquals(expected, actual);
}
In this case, reordering rules allow the first Stream to be optimized as:
final List<String> actual = supplier.stream()
        .sorted(Film$.length)
        .filter(Film$.title.startsWith("A"))
        .filter(f -> f.getLength() > 120)
        .map(Film$.title)
        .limit(10)
        .collect(Collectors.toList());
This means we can merge the sort and the filter on the title into the query without any issues. However, the limit is also applied on the DB side, restricting the results to 10 rows before the length > 120 filter is executed. Due to the sorting operation, the 10 shortest films are returned, and none of them match the length > 120 criterion.
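A standalone plain-stream illustration of that effect, using arbitrary integers instead of Film entities: applying the limit before the length filter returns the shortest elements, which the filter then discards.

import java.util.List;
import java.util.stream.Collectors;

List<Integer> lengths = List.of(60, 70, 80, 130, 140);

// Limit pushed below the filter (what the over-eager merge effectively does):
List<Integer> merged = lengths.stream()
        .sorted()
        .limit(3)                  // keeps the three shortest: 60, 70, 80
        .filter(l -> l > 120)
        .collect(Collectors.toList());   // []

// Filter applied before the limit (the pipeline's real semantics):
List<Integer> correct = lengths.stream()
        .sorted()
        .filter(l -> l > 120)
        .limit(3)
        .collect(Collectors.toList());   // [130, 140]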
Another example is this test:
final List<String> actual = supplier.stream()
        .filter(Film$.title.startsWith("A"))
        .skip(10)
        .filter(f -> f.getLength() > 120)
        .sorted(Film$.length)
        .limit(10)
        .map(Film$.title)
        .collect(Collectors.toList());
Here the sorted() is merged even though the skip() precedes it. As SQL always applies skips and limits after all other operations, we cannot merge this sort, as doing so would change which entities are skipped.
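A standalone plain-stream illustration, again with arbitrary integers, of why the sort cannot be hoisted past the skip: the two orderings skip different elements.

import java.util.List;
import java.util.stream.Collectors;

List<Integer> lengths = List.of(90, 180, 60, 150, 120);

// Skipping before sorting drops the first two elements in encounter order, then sorts the rest:
List<Integer> skipThenSort = lengths.stream()
        .skip(2)
        .sorted()
        .collect(Collectors.toList());   // [60, 120, 150]

// Merging the sort into the query effectively sorts first, so different rows are skipped:
List<Integer> sortThenSkip = lengths.stream()
        .sorted()
        .skip(2)
        .collect(Collectors.toList());   // [120, 150, 180]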
Describe the bug
JPAStreamer cannot optimize anonymous lambdas such as f -> f.getLength() > 120; hence the need for a JPAStreamer metamodel. This means that if such an operation occurs early in the pipeline, the rest of the operations, whether or not they use the JPAStreamer metamodel, may also be executed in the JVM instead of the database.
Below is an example of a query that cannot be optimized at all, since the length filter must be applied before the limit; otherwise the results would differ. Thus all JPAStreamer can do is issue a query that returns all Films in the DB and evaluate the anonymous lambda in the JVM. If the limit() did not exist, JPAStreamer could in fact reorder the operations. As of now, JPAStreamer seems to overlook the lambda and reorders the operations in a way that distorts the results.
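A sketch of the two situations, reusing the supplier and the Film$ metamodel from the tests in this issue (the issue's original snippets are not reproduced here, so the pipeline shape is illustrative):

// Illustrative sketch only. With the early lambda followed by the limit(),
// none of the operations can be merged into the query:
List<String> cannotOptimize = supplier.stream()
        .filter(f -> f.getLength() > 120)        // anonymous lambda, JVM only
        .limit(10)                               // cannot be moved ahead of the lambda filter
        .filter(Film$.title.startsWith("A"))     // cannot be moved ahead of the limit
        .sorted(Film$.length)                    // cannot be moved ahead of the limit
        .map(Film$.title)
        .collect(Collectors.toList());

// Without the limit(), the metamodel filter and sort could be moved ahead of
// the anonymous lambda and merged into the query, leaving only the lambda
// (and the mapping) for the JVM:
List<String> couldReorder = supplier.stream()
        .filter(Film$.title.startsWith("A"))     // mergeable into the WHERE clause
        .sorted(Film$.length)                    // mergeable into ORDER BY
        .filter(f -> f.getLength() > 120)        // evaluated in the JVM
        .map(Film$.title)
        .collect(Collectors.toList());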
Expected behavior
This should generate the following query, as the lambda cannot be moved according to the reordering rules:
Actual behavior
The generated query includes the second filter and the sorting operation. This means the first filter is applied in the JVM after the limit has been applied. Such reordering changes the result set.
How To Reproduce
The following integration test fails:
Build tool: Maven 3.9.0
JPAStreamer version: 3.0.0 and later
JPA Provider: Hibernate 6.0.2.Final
Java Version: 11.0.19
Context around operation ordering
Can safely be reordered, as the filters are commutative and it does not matter whether the sorting is done prior to the filtering:
In this case, the first sort and title predicate can be applied on the DB side, before applying the length filter in the JVM.
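As a standalone illustration of the commutativity point, using plain java.util.stream on arbitrary integers rather than the Film entity: swapping the commutative filters and moving the sort ahead of them yields the same result.

import java.util.List;
import java.util.stream.Collectors;

List<Integer> lengths = List.of(150, 90, 130, 60, 170);

// Filter, sort, filter in the written order:
List<Integer> written = lengths.stream()
        .filter(l -> l > 100)
        .sorted()
        .filter(l -> l < 160)
        .collect(Collectors.toList());   // [130, 150]

// Sort first, then apply the filters in swapped order; the result is identical:
List<Integer> reordered = lengths.stream()
        .sorted()
        .filter(l -> l < 160)
        .filter(l -> l > 100)
        .collect(Collectors.toList());   // [130, 150]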