spring-projects / spring-data-jpa

Simplifies the development of creating a JPA-based data access layer.
https://spring.io/projects/spring-data-jpa/
Apache License 2.0
3.01k stars 1.42k forks source link

KeysetScrollSpecification null properties support #3605

Open ivan-zaitsev opened 2 months ago

ivan-zaitsev commented 2 months ago

It is documented here that keyset-filtering requires all the keyset properties to be non-nullable.

But will it actually affect indexing/ordering by adding cb.isNull() condition for some of the properties to https://github.com/spring-projects/spring-data-jpa/blob/main/spring-data-jpa/src/main/java/org/springframework/data/jpa/repository/query/KeysetScrollSpecification.java#L120 ?

Generated query examples:

Map<String, Object> keys = new LinkedHashMap<>();
keys.put("createdAt", entity.getCreatedAt());
keys.put("id", entity.getId());

// Maybe somehow here allow to specify which keys can be null, but at least one key is requred to be not null
KeysetScrollPosition scrollPosition = ScrollPosition.of(keys, ScrollPosition.Direction.FORWARD);

Sort sort = Sort.by(Sort.Direction.ASC, "createdAt", "id")

Function<FluentQuery.FetchableFluentQuery<Entity>, Window<Entity>> queryFunction = query -> query
        .limit(10)
        .sortBy(sort)
        .scroll(scrollPosition);

Specification<Entity> specification = ...;

Window<Entity> result = repository.findBy(specification, queryFunction);
SELECT *
FROM table
WHERE (created_at IS NOT NULL AND created_at > ?) OR
      (created_at IS NULL AND id > ?)  -- use a fallback key like `id` for rows with NULL `created_at`
ORDER BY created_at ASC, id ASC
LIMIT 10;
mp911de commented 1 month ago

Not quite sure what you're asking for. The general issue is that various databases may have support for null precedence in ordering while others don't support null precedence. Also, JPA doesn't support the specification of null precedence. This is scheduled for the next JPA spec release.

It would help if you come up with a bit more of an example, how you envision things should work so that we can find out whether this ticket can become actionable.

ivan-zaitsev commented 1 month ago

@mp911de

I am talking about WHERE condition for keyset-filtering. Currently it is generated without IS NULL conditions.

Not having IS NULL conditions in WHERE for null properties usually leads to database not returning the records in result set (So the ordering ORDER BY is not even applied, because WHERE condition filters nulls firstly).

IS NULL conditions should solve problem with database not returning the records for null properties by having only one required property which should be not null, in this case it is id.

WHERE (created_at IS NOT NULL AND created_at > ?) OR (created_at IS NULL AND id > ?)

My proposal is to allow null keys for creating KeysetScrollPosition. But only one key is required to be not null.

To preserve backwards compatibility forScrollPosition.of(keys, ScrollPosition.Direction.FORWARD) each key can be specified if it should be nullable or not. For example:

class KeysetKey {
  String name;
  Object value;
  boolean nullable;  // nullable flag is not for `value` field, but rather database field
}

List<KeysetKey> keys = new LinkedList<>();
keys.add(KeysetKey.ofNullable("createdAt", entity.getCreatedAt()));
keys.add(KeysetKey.of("id", entity.getId()));

KeysetScrollPosition scrollPosition = ScrollPosition.of(keys, ScrollPosition.Direction.FORWARD);

It should produce such query:

SELECT * FROM table WHERE (created_at IS NOT NULL AND created_at > ?) OR (created_at IS NULL AND id > ?) ORDER BY created_at ASC, id ASC LIMIT 10;

schauder commented 1 month ago

If we support NULL values we need to support NULLS FIRST/LAST semantics.

This could actually be achieved with adding an additional virtual field to the where clause generation which is 0 for non-null values and 1 or -1 for null values. But that is probably really bad for the resulting explain plan. I assume the same can be achieved using different operators (<, <=, >=, >). And your query might actually be one of those solutions.

Still I'm afraid this might be bad for performance, so I think we should find a way to only apply this to fields which are nullable.

IF we get all this working, I don't think we need a "fallback" we only need the combinations of all fields in the key set to be unique.

ivan-zaitsev commented 1 month ago

If we support NULL values we need to support NULLS FIRST/LAST semantics.

Of course. It is another issue which is related to ORDER BY clause which comes after WHERE conditions, because WHERE conditions alone would not solve issue completely. I believe there is another task for ORDER BY in github issues.

Still I'm afraid this might be bad for performance, so I think we should find a way to only apply this to fields which are nullable.

IF we get all this working, I don't think we need a "fallback" we only need the combinations of all fields in the key set to be unique.

Yes, agree. The implementation code should be in control to make field nullable or not, maybe only by default all fields should not be nullabe. But at least one field should be not null to make pagination work.

schauder commented 1 month ago

Of course. It is another issue which is related to ORDER BY clause which comes after WHERE conditions, because WHERE conditions alone would not solve issue completely. I believe there is another task for ORDER BY in github issues.

The where clause is actually all about ordering. After all, we are trying to select the next n elements after the current one. So we need to consider if null comes before the first null elements, or after in the where clause.

quaff commented 1 month ago
Map<String, Object> keys = new LinkedHashMap<>();
keys.put("createdAt", entity.getCreatedAt());
keys.put("id", entity.getId());

@ivan-zaitsev FYI, id property is not required since Spring Data JPA will add it implicitly to make sure uniqueness, also I'm proposing to not add id if provided properties combination are unique, see #3013.