Different number of elements in paged result of non-distinct query

bielitom commented 1 year ago

By default, all the results of the Repository query are returned as the distinct list. When pagination and joins are used the count of elements differs from the actually obtained results. In some cases it can totally break a pagination, mostly unpredictably.

This JPQL query: SELECT a FROM Author a INNER JOIN Book b ON b.author = a will return list of the unique Author entities independently from the number of the books per author (assuming, that author have non-zero number of books). On the other hand, the native query will return the list as should be, with the duplicates caused by the join.

This behavior basically does not cause the problem, but when it comes to the pagination, it have some consequences. Page.getTotalElements() method shows different number than returned number of the entities.

The problem looks like to be caused by the condition at org/springframework/data/support/PageableExecutionUtils.java:66 The condition works with the actual results and if the result list is smaller than pageSize it will not run countQuery supplier. This causes different getTotalElements() number on the pages where content.size()<getPageSize(). This can happen on every page because algorithm wrongly thinks, that it is the last page. On the affected pages the number of totalElements suddenly drops because it is calculated from the actual offset, not using the count query.

To get the acceptable results:

It should run the countQuery every time (you will get the wrong result, but the pagination will remain unchanged on all pages).
It should detect if the query is JPQL query and using joins. If yes, the distinct keyword in the countQuery should be used (to not to change the actual behavior). Native queries must be unaffected.

How to reproduce

Entities

@Entity Getter @Setter
public class Author {
    @Id @GeneratedValue(strategy= GenerationType.AUTO) private Long id;
    private String name;
    @OneToMany(mappedBy = "author") private List<Book> books = new ArrayList<>();

    public Author() {}
    public Author(String name) {
        this.name = name;
    }
}

@Entity Getter @Setter
public class Book {
    @Id @GeneratedValue(strategy= GenerationType.AUTO) private Long id;
    private String name;
    @ManyToOne(cascade = CascadeType.ALL) private Author author;

    public Book() { }
    public Book(String name, Author author) {
        this.name = name;
        this.author = author;
    }
}

Repository

public interface AuthorRepository extends JpaRepository<Author, Long> {
    @Query("SELECT a FROM Author a INNER JOIN Book b ON b.author = a")
    Page<Author> findAllWithBooks(Pageable pageable);

    @Query(value = "SELECT a.* FROM author a INNER JOIN book b ON b.author_id = a.id", nativeQuery = true)
    Page<Author> findAllWithBooksNative(Pageable pageable);

    @Query(value = "SELECT DISTINCT a.* FROM author a INNER JOIN book b ON b.author_id = a.id", nativeQuery = true)
    Page<Author> findAllWithBooksNativeDistinct(Pageable pageable);

    @Query("SELECT DISTINCT a FROM Author a INNER JOIN Book b ON b.author = a")
    Page<Author> findAllWithBooksDistinct(Pageable pageable);
}

Test Code

Author a;
a = authorRepository.save(new Author("Orson Scott Card"));
bookRepository.save(new Book("Ender's Game", a));

a = authorRepository.save(new Author("Douglas Adams"));
bookRepository.save(new Book("The Hitchhiker's Guide to the Galaxy", a));
bookRepository.save(new Book("The Restaurant at the End of the Universe", a));

a = authorRepository.save(new Author("Frank Herbert"));
bookRepository.save(new Book("Dune", a));
bookRepository.save(new Book("Destination: Void", a));

Assertions.assertEquals(3, authorRepository.count());
Assertions.assertEquals(5, bookRepository.count());

//Here we have 5 results presented as total results as expected, because of non-distinct select with the join
Assertions.assertEquals(5, authorRepository.findAllWithBooks(PageRequest.of(0, 1)).getTotalElements());

//If the page request is bigger than the actual number of Author entities we get getContent().size() == getTotalElements()
Assertions.assertEquals(3, authorRepository.findAllWithBooks(PageRequest.of(0, 100)).getContent().size());
Assertions.assertEquals(3, authorRepository.findAllWithBooks(PageRequest.of(0, 100)).getTotalElements());

//When we use distinct keyword in JPQL query (or in the Specification), the problem does not occur
Assertions.assertEquals(3, authorRepository.findAllWithBooksDistinct(PageRequest.of(0, 1)).getTotalElements());
Assertions.assertEquals(3, authorRepository.findAllWithBooksDistinct(PageRequest.of(0, 100)).getContent().size());
Assertions.assertEquals(3, authorRepository.findAllWithBooksDistinct(PageRequest.of(0, 100)).getTotalElements());

//Using the native query everything works as expected - non-distinct
Assertions.assertEquals(5, authorRepository.findAllWithBooksNative(PageRequest.of(0, 1)).getTotalElements());
Assertions.assertEquals(5, authorRepository.findAllWithBooksNative(PageRequest.of(0, 100)).getContent().size());
Assertions.assertEquals(5, authorRepository.findAllWithBooksNative(PageRequest.of(0, 100)).getTotalElements());

//Using the native query everything works as expected - distinct
Assertions.assertEquals(3, authorRepository.findAllWithBooksNativeDistinct(PageRequest.of(0, 1)).getTotalElements());
Assertions.assertEquals(3, authorRepository.findAllWithBooksNativeDistinct(PageRequest.of(0, 100)).getContent().size());
Assertions.assertEquals(3, authorRepository.findAllWithBooksNativeDistinct(PageRequest.of(0, 100)).getTotalElements());

mp911de commented 1 year ago

For this case, you must provide your own count query via @Query(countQuery = …). Spring Data attempts to derive a count query on a best effort basis. Because both, JPQL and SQL allow statements that affect the multiplicity, it is your code that needs to provide a count query if you provided a base query yielding non-distinct results.

As the examples above do not define a countQuery the natural consequence is that you yield invalid results.

bielitom commented 1 year ago

@mp911de I didn't mention @Query(countQuery = …) because it may be problematic to maintain with longer queries or using JPA Specification (same results, not included in the example).

When it comes to the getCountQuery method in /org/springframework/data/jpa/repository/support/SimpleJpaRepository.java:776 it can correctly recognize distinct and non-distinct query. But as mentioned before, the entities returned from the query itself are always unique when JPQL is used.

Anyway, the problem is not really in the count query alone. The problem lies in the getPage method in PageableExecutionUtils.java, where the algorithm always thinks, that it is on the last page and totally ignores countQuery. It happens when the query does not return the full page of the results, because JPA naturally removes duplicates from the results.

mp911de commented 1 year ago

because it may be problematic to maintain with longer queries

In such a case, the implementation should go in a custom implementation. Repositories and @Query have their limits, these aren't all purpose utilities.

Similar to

where the algorithm always thinks, that it is on the last page

We have certain assumptions that we meet. If your queries do not meet these assumptions, then you need to provide either more specific queries or follow the route of implementing a custom fragment.

That being said, we do not intend to introduce more complexity on our side.

spring-projects / spring-data-jpa

Different number of elements in paged result of non-distinct query #3220

How to reproduce