spring-projects / spring-data-jpa

Simplifies the development of creating a JPA-based data access layer.
https://spring.io/projects/spring-data-jpa/
Apache License 2.0
2.99k stars 1.41k forks source link

SimpleJpaRepository.save() upsert logic is wrong #3135

Closed slawekludwiczak closed 1 year ago

slawekludwiczak commented 1 year ago

SimpleJpaRepository.save() method has wrong upsert logic and is valid only for generated ids.

save method looks like this:

@Transactional
public <S extends T> S save(S entity) {
    Assert.notNull(entity, "Entity must not be null");
    if (this.entityInformation.isNew(entity)) {
        this.em.persist(entity);
        return entity;
    } else {
        return this.em.merge(entity);
    }
}

It checks whenever entity is new and if so, it invokes persist method, and if entity is not new, it invokes merge(). Checking if entity is new is done in JpaMetamodelEntityInformation and then delegated to AbstractEntityInformation.isNew():

public boolean isNew(T entity) {
    ID id = this.getId(entity);
    Class<ID> idType = this.getIdType();
    if (!idType.isPrimitive()) {
        return id == null;
    } else if (id instanceof Number) {
        return ((Number)id).longValue() == 0L;
    } else {
        throw new IllegalArgumentException(String.format("Unsupported primitive id type %s", idType));
    }
}

The problem is that this logic makes sense only if the id is generated. For primitive types there is also wrong assumption that entity with id = 0 is new. If we have entity like:

@Entity
public class ExampleEntity {
    @Id
    private long id;
    private String name;
    ...
}

and repository:

public interface ExampleEntityRepository extends CrudRepository<ExampleEntity, Long> {
}

When we save few instances of ExampleEntity we can see the inconsistency:

ExampleEntity entity0 = new ExampleEntity(0, "Entity0");
ExampleEntity entity1 = new ExampleEntity(1, "Entity1");
ExampleEntity entity2 = new ExampleEntity(2, "Entity2");
exampleEntityRepository.save(entity0); //persist
exampleEntityRepository.save(entity1); //merge
exampleEntityRepository.save(entity2); //merge
ExampleEntity entity0ToUpdate = exampleEntityRepository.findById(0L).orElseThrow();
entity0ToUpdate.setName("UpdatedEntity0");
//should merge, but trying to persist, thus throws JdbcSQLIntegrityConstraintViolationException
exampleEntityRepository.save(entity0ToUpdate);

There is unnecessary overhead of checking if entity is "new" or not. This verification is just fictitious. Since we cannot do any assumptions if the entity is new or not without checking persistence context, the benefit of this check is only for specific case with auto generated ids.

Possible solutions

1 - just merge

Instead of trying to check whenever entity is new or not, just use merge method and it will take care of that:

@Transactional
public <S extends T> S save(S entity) {
    Assert.notNull(entity, "Entity must not be null");
    return this.em.merge(entity);
}

2 - verify novelty of entity only for entities with generated ids

...

Inconsistency with Jakarta Persistence spec

There is also a problem with documentation of EntityInformation.isNew(). It states that

Returns whether the given entity is considered to be new.

but according to jakarta persistence specification:

A new entity instance has no persistent identity, 
and is not yet associated with a persistence context.

So simple id verification is not sufficient.

repository with example: https://github.com/slawekludwiczak/spring-data-upsert

gregturn commented 1 year ago

I assure you that simply picking "merge" each and every time isn't the solution to working with JPA. In your situation, if you are using a primitive long instead of a boxed Long, and you really do need a primary key value of 0, then your best bet is to probably pull in either Spring Data JPA's AbstractPersistable abstract entity base type, or look at having your entity classes implement Spring Data Commons' Persistence interface. Either one of these will give you the ability to write your own isNew() method, subject to your own criteria (-1 being the "new" value you use??)

BTW, I'd like to highlight that when you use long (primitive) while using Long in your repository definition, as required by Java's generic parameter rules, you'd taking on a slight misalignment. I'd prefer having the ID parameter identically match the ID class's Java type. Using the boxed type Long in your entity class would also empower you to use 0 as a valid id since null will become the "new" type in that case.

Nevertheless, if you have need of 0 values in long types, then the suggestion suggested above should help you out.

slawekludwiczak commented 1 year ago

Primitives and 0 is just a part of the problem. For all cases, where id is not auto generated there is actually a call of merge() instead of persist() because the check return id == null; is also not sufficient since it has no knowledge about persistence context.

@Entity
public class ExampleUUID {
    @Id
    private UUID id;
    private String name;
...
ExampleUUID entity1 = new ExampleUUID(UUID.randomUUID(), "Entity1");
exampleUUIDRepository.save(entity1); //merge instead of persist

I believe it is pretty common to use keys generated on application side or even client side instead of relying on auto generated keys from data store (i.e. UUIDs in microservices and all "clean architectures" are common).

I assure you that simply picking "merge" each and every time isn't the solution to working with JPA.

In all cases where keys are not auto generated, spring data repositories currently behaves like this, so actually it is hard to understand why. I understand that there is overhead of additional select to the database, but at least it would be consistent for all scenarios, and it gives real information about entity state. Current solution is based on assumption that applications should use auto generated keys and for all other cases this.entityInformation.isNew(entity) is just overhead.

quaff commented 1 year ago

Merge entity with assigned id generator (which id not auto generated) will trigger extra select query, it harms performance.

gregturn commented 1 year ago

I understand that there is overhead of additional select to the database, but at least it would be consistent for all scenarios, and it gives real information about entity state.

We can't switch to a paradigm of forcing through all changes using merge because as @quaff mentioned, this will degrade performance significantly for a large number of users.

This is the reason you can leverage AbstractPersistable for entity types that need a different definition of isNew(). In your example,

ExampleUUID entity1 = new ExampleUUID(UUID.randomUUID(), "Entity1");
exampleUUIDRepository.save(entity1); //merge instead of persist

...where you are supplying a brand new, never-before-seen value of UUID, then yes, we can't see the context of when that is new vs. already existing, and based on checking for null, it looks like an already existing id and hence the usage of JPA's merge operation. That approach implies that if you're going to take back responsibility from the JPA provider for handling key generation, you need to also take back responsibility for deciding when an id is new. I mean your solution can be as easy as:

public interface AlwaysMergePersistable extends Persistable {

    default boolean isNew() {
        return false;
    }
}

Apply this to every entity type, and you'll ensure that everything goes through a merge. I don't think the overhead to invoke this hard-coded method will be much of anything, at least not when compared to the cost of Hibernate itself.

However, if you feel there is a bigger design here than your current application, one in which an alternative strategy should be designed where the framework switches to an alternative strategy, then please elaborate on how we could do this?

Additionally, and I don't know how vital this is to your situation or if you were simply tapping into UUID to make your point, but Hibernate does support UUID for id values. In fact Thorben Janssen has a detailed article about doing JPA auto-generation with UUID values => https://thorben-janssen.com/generate-uuids-primary-keys-hibernate/

slawekludwiczak commented 1 year ago

However, if you feel there is a bigger design here than your current application, one in which an alternative strategy should be designed where the framework switches to an alternative strategy, then please elaborate on how we could do this?

I've raised this issue not because this is a bug, but it is surprising / inconsistent that new entity for spring and for jpa are not the same. In all so-called clean/onion/hexagonal architectures where framework is treated as a detail and should be replaceable, the id is often generated in application layer and even if there is the database involved it looks more like this:

Long createNewExample(CreateExampleCommand cmd) {
    Long nextId = exampleRepository.nextId();
    Example newExample = Example.withId(nextId).settingOtherFields...
    //...
    exampleRepository.save(newExample);
    return nextId;

Since this is not a bug and is established over the years, I think the issue can be closed.