jakartaee / data

Data-API
Apache License 2.0
87 stars 27 forks source link

[Use Case]: Annotations for use with Jakarta Persistence #470

Open gavinking opened 6 months ago

gavinking commented 6 months ago

As a ...

I need to be able to ...

write a portable repository backed by JPA.

Which enables me to ...

be more independent of the Jakarta Data provider.

Additional information

So I know we all said we didn't want to do this in this release. But with all the machinery we've defined, it's now frankly trivial.

We would add:

in new package jakarta.data.orm.

The semantics of the lifecycle annotations are completely defined by the JPA spec. Because they're specific to JPA/ORM, we have zero work to do, it's just a @see annotation that points to the corresponding operation of EntityManager.

It's similarly clear what @JPQL does. On the other hand, perhaps we don't need it and all we need to do is give people explicit permission to put their JPQL in the @Query annotation. (I can see arguments both ways.)

Finally, if I recall correctly, at some stage someone (Otavio I guess) had proposed a built-in JpaRepository. I don't argue that we need it, but if you all still want it, it's completely trivial to define in terms of the annotations listed above.

So why is this now so easy when previously it was "hard". Well, before, we were trying to shoehorn JPA into the existing @Save annotation or whatever. But now that our model of lifecycle annotations is extensible, it's no longer hard.

Dunno. I guess I feel like it's now become a bit hard to justify not doing it.

Reactions?

gavinking commented 6 months ago

Finally, if I recall correctly, at some stage someone (Otavio I guess) had proposed a built-in JpaRepository. I don't argue that we need it, but if you all still want it, it's completely trivial to define in terms of the annotations listed above.

See #143.

gavinking commented 6 months ago

Please see #477.

gavinking commented 6 months ago

Alright, so I guess I need to rewind a bit here and fully explain why I'm now proposing to do something we said we wouldn't do.

CRUD vs JPA

So, right from the start I've been pointing out the deep, fundamental differences in semantics between the semantics of JPA, and the semantics which were contemplated for simple repository lifecycle methods.

Let's start by considering a simple @Insert operation, vs JPA's persist(). Among other things:

  1. after persisting an entity, JPA doesn't just forget about it, but rather maintains an association between the entity and the persistence context,
  2. the actual insert statement on the database is not executed synchronously with the call to persist(), but rather when the entity manager is "flushed", typically at the end of the transaction, and
  3. operations performed via the EntityManager cascade to associated entities.

OK, a bit of a mismatch, but perhaps neither of those differences is a showstopper. Unfortunately it gets worse from there.

First, JPA literally has no update() method. The @Update operation has no natural mapping to JPA at all. Explicit updates are simply not part of the programming model of JPA. In particular, @Update is really not the same thing as merge(). The merge() method has pretty specific semantics which is why it's called "merge" and not "update". Actually, merge() is in some sense closer to @Save, but it's not that either.

Nor does JPA have a delete() operation which accepts a detached entity! To remove an entity in JPA, you first need to obtain a managed entity associated with the persistence context, by calling find(), and only then can you remove() it. This is very inefficient in the absence of a persistence context!

CRUD repositories for JPA entities

OK, so does that mean that we have to give up, and throw away the concept of a repository with @Insert, @Update, and @Delete methods which deal with JPA @Entity objects?

Absolutely not!

Hibernate can easily implement such a repository using its StatelessSession API. Our product already offers repositories like this. But they're not JPA repositories because the semantics of the operations are not the semantics of JPA. Or at least, not of JPA as it exists today. So we can't call them that.

So I understand that @njr-11 was holding out hope that perhaps one can "emulate" the semantics of @Insert, @Update, and @Delete by, I suppose, just flush()ing and clear()ing the entity manager at the end of each repository method. At first glance this doesn't look completely unreasonable, and it's a possibility I already contemplated several months ago.

But it's not the way JPA is intended to be used, and I would say it just doesn't really work out:

So I would never implement a repository this way.

Why is this even a problem?

The problem is that the current spec has explicit language in sections 6.4 and sections 7.1.1 which deal with JPA and say stuff about how a JPA-backed repository should work. In my opinion, these sections of the spec should not be defining how a Hibernate-backed repository might work. And that's exactly what this language was doing:

Repository operations must behave as though backed by a stateless Entity Manager in that persistence context is not preserved across the end of repository methods. All entities that are returned by repository methods must be in a detached state such that modifications to these entities are not persisted to the database unless the application explicitly invokes a Save or Update life cycle method for the entity.

The most natural way to read this is that it appears to be prescribing that a JPA-backed repository be implemented in exactly the way I just say I would "never implement a repository".

Worse, there's what I understand to be an actual TCK test for this functionality in ee.jakarta.tck.data.standalone.persistence.Catalog. So the quoted language is not only prescriptive, it's actually required by the TCK for compliance with Jakarta Data!

What to do about it

From my point of view, there's two possible fixes:

  1. Remove all prescriptive sections relating to Jakarta Persistence. This the easy option: simply delete sections 6.4 and 7.1.1 of the spec, and remove the package ee.jakarta.tck.data.standalone.persistence.Catalog from the TCK. This outcome is essentially consistent with what I've been advocating for months, and what I had understood we were going to do.
  2. But if that's not what people want to do, no problem at all, there's a perfectly good and easy alternative, which is described above, and implemented by #477. That is, we introduce the proposed lifecycle annotations corresponding to the basic operations of JPA, @Persist, @Merge, and freinds, and then fix the TCK test to make use of these annotations.

There's really not much work involved in option 2, and I really don't think anyone needs to be freaking out about it. I've already done the required work on the annotations and on the spec. And I can fix the TCK in a jiffy. But of course if ya'll prefer to do option 1, then I for one am totally fine with that.

Folks, I do understand and appreciate the desire to get something out, but it's much more important to make sure that what we get out is actually correct and not just wrong.

njr-11 commented 6 months ago

There are other ways to emulate stateless delete and update operations that are currently possible in Jakarta Persistence, such as issuing a JPQL DELETE or UPDATE. In some cases, such as if the user chooses an update method signature that returns the entity and it has autogenerated values, approaches like these may be suboptimal in advance of Jakarta Persistence adding the stateless entity manager, but I do think it should be possible to emulate stateless in advance of it. In the case of Hibernate, you already have stateless.

I do think it is important to have stateless as the default behavior of Jakarta Data. One of the primary aims of Jakarta Data is to have a simple and intuitive programming model for data access. While the stateful behavior of Jakarta Persistence is quite useful and can be more efficient, it is a level of complexity that basic users should not need to deal with/learn/think about unless they want it. The default behavior of Jakarta Data should be stateless, which is what we have been aiming for and written the spec/API for in version 1.0. It certainly makes sense for more advanced users/Jakarta Persistence users to be able to opt in to stateful, which has been in the plans for post version 1.0.

gavinking commented 6 months ago

I do think it is important to have stateless as the default behavior of Jakarta Data.

I agree. That's what I've been saying for approximately forever.

So then the path seems clear: I will send a pull request that deletes sections 6.4 and 7.1.1 of the specification.

And we will address JPA integration in a future release of Jakarta Data.