jakartaee / persistence

https://jakartaee.github.io/persistence/
Other
206 stars 59 forks source link

Query: limit value of startPosition param in setFirstResult because it's int #336

Closed joaofouyer closed 1 year ago

joaofouyer commented 2 years ago

The param startPosition of setFirstResult is of int type ref : https://github.com/eclipse-ee4j/jpa-api/blob/21f02c809fae624d49aba5cd0578a30cbfa775a0/api/src/main/java/jakarta/persistence/Query.java#L156 which limits the use of it when paginating a table with a large amount of rows. Is it viable to change it to Long type? Or maybe is it a workaround I can use to paginate a table with more than 2 billions or rows?

Thank you.

rbygrave commented 2 years ago

Or maybe is it a workaround I can use to paginate a table with more than 2 billions or rows?

Have you done this and it not completely hurt the database? (If so I am curious as to which database)

That is, firstRow/setFirstResult tends to result in the database doing work to skip those rows. When firstRow gets large the amount of work the database needs to do to skip those rows grows and gets large to the point of hurting us. Generally speaking skipping 2 billion rows hurts. So as I see it in general this is avoided and instead there are 2 paths to choose as workarounds:

  1. Design the UX of application to NOT allow deep paging at all. Instead use UX to encourage the user to refine their query, be more selective etc. That is, prevent deep paging at together.

  2. Use "continuous scrolling" where the application uses the unique values from the last row of the last page as predicates for the "next page". In this case there is no use of setFirstResult at all (no use of SQL OFFSET or SQL FIRST ROW in the actual sql) and instead we use predicates like for example last_name >= ? and first_name >= ? and id > ? (e.g. paging order by lastName, firstName, id and using all 3 to get a unique combination). We use these predicates to define the starting row of the "next page". This means the database doesn't have to do the work of skipping rows (we are effectively using the predicates to filter those rows out instead). This gives us the option of using an index on these predicates such that even very very deep scrolling is efficient for the database to implement.

Is that helpful?

gavinking commented 1 year ago

The response from @rbygrave seems pretty clear, and the original reporter has not responded, so I propose we close this.