stateless entity manager

jakartaee / persistence

https://jakartaee.github.io/persistence/

stateless entity manager #374

Open gavinking opened 2 years ago

gavinking commented 2 years ago

A feature that Hibernate has had for a very long time, but which has never made it into the JPA spec, is StatelessSession.

https://docs.jboss.org/hibernate/orm/6.1/javadocs/org/hibernate/StatelessSession.html

The idea of a stateless session is to allow the program to directly perform persistence operations without going via a persistence context, much like what you would get with handcoded SQL:

For some users, this is a slightly more intuitive way to work, but
the primary motivation is that it relieves the program of the need to explicitly clear() the persistence context when performing operations which affect a large number of entity instances.

Now, the flipside is that the semantics differ in significant ways to the semantics of a regular EntityManager:

Naturally, there's no first-level cache, but also
the second-level cache is bypassed, since for the most common use case we don't want to go filling the cache up with data we're probably not going to use again soon.
There's no transactional write-behind, nor automatic dirty checking, since these behaviors depend on having a persistence context.
On the downside, there's no way to update a collection (except for an unowned @OneToMany), since it's hard to come up with an efficient way to implement that without a persistence context for the common use case.
Lazy loading is an explicit operation.
Finally, and most significantly: with no dirty checking, updating an entity is also always an explicit operation.

To emphasize these semantic differences, the persistence operations are given distinct names:

insert() (not persist())
delete() (not remove())
update()
get() (not find())

(Though refresh() is still called refresh().)
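
To make the explicit-update semantics concrete, here is a toy in-memory sketch. It is not Hibernate's StatelessSession and not a proposed JPA API: `ToyStatelessStore` and `Book` are hypothetical names invented for this illustration, and a HashMap stands in for a database table. Entities are copied on the way in and out to mimic the absence of a persistence context.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical entity used only for this illustration. */
class Book {
    final long id;
    String title;

    Book(long id, String title) { this.id = id; this.title = title; }

    Book copy() { return new Book(id, title); }
}

/** Toy stand-in for a stateless session: every call hits the "table" immediately. */
class ToyStatelessStore {
    // Plays the role of a database table keyed by primary key.
    private final Map<Long, Book> table = new HashMap<>();

    void insert(Book b) {
        if (table.putIfAbsent(b.id, b.copy()) != null) {
            throw new IllegalStateException("duplicate id " + b.id);
        }
    }

    /** Returns a fresh copy each time: with no first-level cache there is no instance identity. */
    Book get(long id) {
        Book row = table.get(id);
        return row == null ? null : row.copy();
    }

    /** Nothing is written unless the caller asks for it: there is no dirty checking. */
    void update(Book b) {
        if (table.replace(b.id, b.copy()) == null) {
            throw new IllegalStateException("no row with id " + b.id);
        }
    }

    void delete(Book b) { table.remove(b.id); }
}
```

In this model, mutating a Book returned by get() changes nothing in the "table" until update() is called, and two calls to get() with the same id return two distinct instances.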

There are two additional (minor) limitations, though they're not fundamental to the idea, and could in principle be removed: there's no cascading of persistence operations, and there's no persistence callback events. These features don't seem terribly useful for stateless sessions but nor would they be difficult to implement.

I think that it's well worth discussing whether a stateless entity manager API should be introduced in the JPA spec.

rbygrave commented 2 years ago

As I see it, all the JPA implementations are capable (note 4) of providing what I used to call back in the day a "Sessionless API" (or "Sessionless ORM (3)", there are a few ORMs in Java and other languages that work this way). The "Sessionless API" has transactions (begin, commit, rollback etc) + insert, update, delete, flush, refresh, query.

Relative to JPA, users of this API only need to manage the scope of transactions, that is they generally do not need to manage the scope of EntityManager and Transactions.

In JPA terms, if we say that the dominant use of JPA is "transaction scoped persistence context" where both EntityManager and Transaction pretty much have the same scope then you might get a feel for how these "Sessionless ORMs" generally work which is that the Transaction internally has a "persistence context" (2).

Let's say we have:

SessionLessJPA - approximately JPA EntityManagerFactory
Transaction - approximately JPA Transaction, but with an associated L1 cache / "persistence context"

And we use it like:

try (Transaction txn = sessionLessJPA.beginTransaction()) {

  sessionLessJPA.insert(<some new entity bean>);

  List<CustomerEntity> customers = sessionLessJPA.query(CustomerEntity.class)
    .where() ...
    .findList();

  sessionLessJPA.update(<some fetched entity bean>);

  txn.commit();
}

So the API is arguably simpler because we only need to manage the scope of transactions [somewhat because the scope of the L1 cache/persistence context matches the transaction (1)]. To me, it looks like the JPA API was somewhat designed around EntityManager being managed by a "container" (using @PersistenceContext) or said differently, it would be nice to have a more Java SE friendly API.

The other thing that "Sessionless ORMs" do is remove the need for managing the entity bean lifecycle. Effectively they do this by (A) having the old values/dirty values stored on the entity beans themselves (B) having the JDBC batch mechanism also scoped to the transaction. This means that persisting methods like insert(<a new bean>) can be batched etc, persist cascading works etc. This does mean that the transaction has extra methods like flush().

The result is that the persist methods insert, update, delete don't actually need to interact with the "persistence context" because they have their own dirty state with the caveat that delete()s need to remove entries from the "persistence context".
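
As a rough illustration of "dirty state stored on the entity beans themselves", here is a hand-written toy. Real ORMs generate this kind of bookkeeping through bytecode enhancement; `CustomerBean` and `ToyUpdater` are hypothetical names, and the SQL string is only there to show that update() can be driven from the bean alone, without consulting a persistence context.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical entity that records its own old values. */
class CustomerBean {
    private final long id;
    private String name;
    // Property name -> value as originally loaded. Empty means "not dirty".
    private final Map<String, Object> oldValues = new LinkedHashMap<>();

    CustomerBean(long id, String name) { this.id = id; this.name = name; }

    long getId() { return id; }
    String getName() { return name; }

    void setName(String newName) {
        if (!Objects.equals(name, newName)) {
            // Remember only the first (loaded) value, not intermediate ones.
            if (!oldValues.containsKey("name")) {
                oldValues.put("name", name);
            }
            name = newName;
        }
    }

    boolean isDirty() { return !oldValues.isEmpty(); }

    Map<String, Object> dirtyProperties() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(oldValues));
    }

    /** Would be called by the ORM after a successful update. */
    void markClean() { oldValues.clear(); }
}

/** Toy update(): derives the statement from the bean's own dirty state. */
class ToyUpdater {
    static String updateSql(CustomerBean bean) {
        if (!bean.isDirty()) {
            return ""; // nothing to write
        }
        String setClause = bean.dirtyProperties().keySet().stream()
                .map(p -> p + " = ?")
                .collect(Collectors.joining(", "));
        bean.markClean();
        return "update customer set " + setClause + " where id = ?";
    }
}
```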

Disclaimer:

I'm totally biased to "Sessionless ORM" because that is the type of ORM I maintain and use. I think a few of you probably know that.

Notes:

(1) Note that technically we could detach the "persistence context" from the transaction and similarly attach a "persistence context" to a transaction, along the lines of an extended persistence context - but in practice that will be very rarely used.
(2) The "persistence context" in this sessionless approach is largely only used for de-duplication in graph-building select queries - but delete() also calls the "persistence context" to remove entries that have been deleted.
(3) I now think "Sessionless ORM" is misleading because they do have a "persistence context" for graph building. It just happens to be somewhat transparent to the API because it is largely transaction scoped / an internal detail of the transaction. This means the concept of a session/unit of work is more a hidden internal detail that could be exposed, but extremely rarely.
(4) EclipseLink, Hibernate and DataNucleus all provide enhancement options which support holding the dirty state & old values on the entity beans themselves. Doing this removes the need to manage the lifecycle of entity beans, allows cascading persistence etc. Well, imo there are lots of benefits to this approach, including performance.

gavinking commented 2 years ago

@rbygrave So that sounds quite similar to what I described, except for one thing.

In what I'm proposing, there's never a need to call flush(), since the operations are executed synchronously. And therefore a "stateless session" is truly stateless. The downside of this is that batching requires an explicit operation insertAll(), etc. But actually this "direct control" is something I consider desirable here.

In your proposal it seems that there's still transactional state held in the session since operations are queued until flush() is called. This allows you to make batching transparent, but at the cost of some loss of direct control by the user.

Now, given that my "stateless sessions" are truly stateless, the need to maintain an association with a transaction is alleviated. There's actually no strong reason why the session even needs to know about the transaction (and vice-versa). There could even be just one stateless entity manager object in the whole program. (Hell, in principle you could even stick all those operations on EntityManagerFactory, I suppose, though that's not what I'm proposing.)

In what you're describing, there's still a hard link between the lifecycle of sessions and transactions. The sessions are not really truly stateless, even though they don't have a persistence context.
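
The difference between the two models can be sketched with a toy that only counts round trips. Everything here is hypothetical (`RoundTripLog`, `SynchronousWriter`, `QueuedWriter`); neither class reflects the actual implementation of Hibernate or any other ORM. The point is only where the state lives: the queued writer must hold pending operations until flush(), while the synchronous writer holds nothing between calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy "database" that just records each round trip it receives. */
class RoundTripLog {
    final List<String> roundTrips = new ArrayList<>();
}

/** Synchronous style: each call executes at once; batching is an explicit insertAll(). */
class SynchronousWriter {
    private final RoundTripLog db;

    SynchronousWriter(RoundTripLog db) { this.db = db; }

    void insert(String row) { db.roundTrips.add("insert[" + row + "]"); }

    /** One round trip for all rows, requested explicitly by the caller. */
    void insertAll(List<String> rows) { db.roundTrips.add("batch" + rows); }
}

/** Queued style: calls accumulate as state until flush() or a full batch sends them. */
class QueuedWriter {
    private final RoundTripLog db;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>(); // the transactional state

    QueuedWriter(RoundTripLog db, int batchSize) {
        this.db = db;
        this.batchSize = batchSize;
    }

    void insert(String row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        db.roundTrips.add("batch" + pending); // string is built before the queue is cleared
        pending.clear();
    }

    int pendingCount() { return pending.size(); }
}
```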

rbygrave commented 2 years ago

Kind of. Hibernate StatelessSession as I see it has a lot of limitations and maybe that is what you are really going for. What I'm suggesting is more about having almost ALL the functionality of normal JPA usage but with some adjustments like:

We don't see the EntityManager/Session in the API (instead as an internal detail the persistence context is attached and scoped to the transaction)
The app code gets to decide if there is JDBC batch or not on a transaction basis and global default. If there is no JDBC batch there is nothing queued to flush() per se.
With jdbc batch on, flush() doesn't traverse the session for dirty beans but instead has an explicitly defined list based on what was explicitly sent to insert(), update(), delete(). More details below.
When I say "persistence context" here I mean only the part that is used to de-duplicate instances during graph building. This doesn't use the "persistence context" as part of persisting.

there's never a need to call flush()

We can turn on or off the use of JDBC batch per transaction (and a global default) and specify per transaction a JDBC batch size to use, so the app code can decide and has full control. In my API there is actually an insertAll() etc too, and there is also control over batching on cascade (e.g. inserting an order with many lines, cascade inserting lines defaults to use batch).

"direct control" is something I consider desirable here

Absolutely agree. Hence there is also the case for per transaction turning off GetGeneratedKeys, turning off cascading persist, controlling flush on query behaviour - it gives the app code exact control. If for example we turn off batch and turn off cascading persist we'd get the behaviour of the original proposal / StatelessSession.

transactional state held in the session since operations are queued until flush() is called.

Yes if JDBC batch is used. flush() is called or some batch size is hit or flushOnQuery is on and a query is executed etc. Pretty much JPA except we have the ability per transaction to control JDBC batch, cascade persist etc.

This allows you to make batching transparent, but at the cost of some loss of direct control by the user.

App code gets to control global defaults and control these things per transaction. I'd argue there is no loss of control and as you allude to this level of control isn't available to us via JPA today.

in principle you could even stick all those operations on EntityManagerFactory

Well yes exactly except we wouldn't call it EntityManagerFactory ... hence my comment: SessionLessJPA - approximately JPA EntityManagerFactory.

session even needs to know about the transaction (and vice-versa)

Well maybe I'm missing the point here. In this API I'm talking about there are "Transactions" and there is the "SessionLessJPA - approximately JPA EntityManagerFactory" thing (a single global instance like EntityManagerFactory).

When we go sessionLessJPA.insert(<some new entity bean>); ... then that could be executed inside a transaction or not (and if not I'd suggest we would want to create a transaction for that insert and commit it). The persisting methods insert(), update(), delete() care about the transaction, the batch mode, the cascading behaviour. They don't care about the "persistence context". They get their dirty state/old values from the beans themselves.

When we go:

  List<CustomerEntity> customers = sessionLessJPA.query(CustomerEntity.class)
    .where() ...
    .findList();

... then if that is running in a transaction (which has an internal persistence context/L1 cache attached to it) then when we execute that query it will look to use the persistence context that is attached to that surrounding transaction.

The sessions are not really truly stateless, even though they don't have a persistence context

Hmmm, we might be hitting a terminology issue given there is "JPA Persistence context", "L1 cache" and "Hibernate Session" ... maybe what I'm suggesting is that it might be less useful to talk about Sessions per se because in my mind they do too many things (unique instances as part of graph building/L1 cache, dirty state, persisting actions like flush). What I'm suggesting is to break that up so that the only "thing" that is scoped with the transaction is the "persistence context", but only the part used for de-duplication as part of graph building; it's not part of persisting at all - in Hibernate terms I think that is the "L1 Cache".

... but I suspect I could be explaining this rather badly.

gavinking commented 2 years ago

We don't see the EntityManager/Session in the API

Right, you put the operations on your equivalent of EntityManagerFactory, which, as I said above, is always an option.

But it's an orthogonal question to the question of what are the semantics of these operations, and a question I'm not especially interested in initially. This isn't, it seems to me, an important difference.

The app code gets to decide if there is JDBC batch or not on a transaction basis and global default. If there is no JDBC batch there is nothing queued to flush() per se.

Sure, but if you do have batching, then there is state implicitly or explicitly associated with the transaction. That's clear, because you have defined a flush() operation.

In what I'm describing, there's never any such state, and there's no flush() operation.

With jdbc batch on, flush() doesn't traverse the session for dirty beans but instead has an explicitly defined list based on what was explicitly sent to insert(), update(), delete(). More details below.

I understand that. This is the same in Hibernate.

When I say "persistence context" here I mean only the part that is used to de-duplicate instances during graph building. This doesn't use the "persistence context" as part of persisting.

I understand that. This is the same in Hibernate.

In my API there is actually an insertAll() etc too

Right, so what I'm arguing is that for this sort of API that's all you need. There's some value in having insert() and insertAll() perform their work synchronously.

If you want transactional write-behind and transparent batching, you go full EntityManager and you get that and more. But if you want something more "bare-metal", then here's an API for that.

Absolutely agree. Hence there is also the case for per transaction turning off GetGeneratedKeys, turning off cascading persist, controlling flush on query behaviour - it gives the app code exact control. If for example we turn off batch and turn off cascading persist we'd get the behaviour of the original proposal / StatelessSession.

Right, that's what I'm trying to get at here. We already have an API with all the fancy bells and whistles and lots of implicit behavior, and it works great. But sometimes some people like more direct control and that, IMO, calls for a separate API, because the semantics are naturally going to be quite a lot different to EntityManager.

Indeed, the whole programming model is quite different. In JPA there's no explicit update() operation, for example.

then if that is running in a transaction (which has an internal persistence context/L1 cache attached to it) then when we execute that query it will look to use the persistence context that is attached to that surrounding transaction.

Well wait, now I'm confused. Perhaps I'm misreading, but that doesn't sound sessionless at all. That sounds like you do have a persistence context, just one that is transparently associated to the transaction. That's not what this proposal is about at all. A StatelessSession in Hibernate always throws away its persistence context at the end of each query, even when there is a transaction.

The motivation, again, is:

to enable you to write code which processes many entity instances in a single tx without having to explicitly manage (periodically flush() and clear()) the persistence context, and
reproduce the semantics you would have if you handcoded your persistence logic using JDBC.

Hmmm, we might be hitting a terminology issue given there is "JPA Persistence context", "L1 cache" and "Hibernate Session" ... maybe what I'm suggesting is that it might be less useful to talk about Sessions per se because in my mind they do too many things (unique instances as part of graph building/L1 cache, dirty state, persisting actions like flush). What I'm suggesting is to break that up so that the only "thing" that is scoped with the transaction is the "persistence context", but only the part used for de-duplication as part of graph building; it's not part of persisting at all - in Hibernate terms I think that is the "L1 Cache".

I mean, the way these terms are used in JPA (and Hibernate) is:

first-level cache = persistence context (= session with a lowercase "s" in Hibernate)
EntityManager = an API (= Session in Hibernate)

One thing is a hashmap full of entity instances, and the other thing is an API with operations like persist(). Now, in JPA (and Hibernate) today, every instance of EntityManager (Session) has a persistence context (session).

But this 1-to-1 relationship doesn't really apply to StatelessSession, which simply doesn't have a persistence context. Or, perhaps more strictly speaking, it has many mini-persistence contexts that are created and destroyed to service individual queries.
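
The "mini-persistence context per query" idea can be sketched as an identity map that lives only for the duration of one query. This is a toy model under stated assumptions, not how Hibernate implements it: `ToyQueryRunner`, `Order` and `Customer` are hypothetical, and an array of {orderId, customerId} pairs stands in for a join result set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Customer {
    final long id;

    Customer(long id) { this.id = id; }
}

class Order {
    final long id;
    final Customer customer;

    Order(long id, Customer customer) {
        this.id = id;
        this.customer = customer;
    }
}

class ToyQueryRunner {
    /** Each row is {orderId, customerId}, as if read from a join result set. */
    static List<Order> runQuery(long[][] rows) {
        // The "mini persistence context": created for this query only.
        Map<Long, Customer> customersById = new HashMap<>();
        List<Order> result = new ArrayList<>();
        for (long[] row : rows) {
            // De-duplicate: rows naming the same customer share one instance.
            Customer c = customersById.computeIfAbsent(row[1], id -> new Customer(id));
            result.add(new Order(row[0], c));
        }
        // The map goes out of scope here; nothing is retained across queries.
        return result;
    }
}
```

Within one call, two orders for the same customer share a single Customer instance, which avoids data aliasing inside the result graph. Across two calls, the same customer id yields different instances, because no context survives the query.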

gavinking commented 2 years ago

Sure, but if you do have batching, then there is state implicitly or explicitly associated with the transaction. That's clear, because you have defined a flush() operation.

In what I'm describing, there's never any such state, and there's no flush() operation.

To clarify a minor point, since perhaps I left the wrong impression here. In a Hibernate classic StatelessSession today, with JDBC batching enabled, batched operations are queued at the JDBC level until the current transaction commits. So in fact "stateless" sessions are something slightly less than stateless.

But that's not the model in Hibernate Reactive, and it's not the model I'm proposing here.

rbygrave commented 2 years ago

Perhaps I'm misreading, but that doesn't sound sessionless at all. That sounds like you do have a persistence context, just one that is transparently associated to the transaction.

That's it, there is a persistence context that is transparently associated to the transaction. It's an API that does not have an EntityManager (or "Unit of Work" or "Hibernate Session big S"). "Sessionless ORM" is a terrible (and old) term and I'm dropping it. It might be better called "ORMs that transparently manage their persistence context", but the TLDR effect in terms of API is that there is no EntityManager and no entity bean lifecycle.

Or, perhaps more strictly speaking, it has many mini-persistence contexts that are created and destroyed to service individual queries

In my terminology this is "query scoped persistence context" (with the other scopes being "extended scope" and "transaction scope"). This is an option available on the query: for any given query we can choose for it to use "query scoped persistence context", and yes, this has proven to be very useful for people.

We already have an API with all the fancy bells and whistles and lots of implicit behavior, and it works great. But sometimes some people like more direct control

Yes, noting that today JPA does not give the application code control on a per transaction basis over use of JDBC batch, batch size, GetGeneratedKeys, and cascading of persist. JPA also uses "transparent persistence", meaning attached dirty entities are included as an update implicitly, as opposed to explicit use of an update() method ("explicit persistence").

StatelessSession as suggested would address a decent amount or all of this, agreed.

What I'm suggesting (and apologies because maybe it's confusing the issue) is that there is a more radical option that would achieve the same benefits in terms of control but I'd argue it does so without the proposed limitations of StatelessSession.

motivation

I believe what you'd suggest is that we don't need anything more than StatelessSession and that is fair enough. I just wanted to put out there that this more radical option exists which can achieve the same goals. As you put it - "the whole programming model is quite different" - so I felt it was an option worth discussing.

What this more radical approach allows is a programming model where there is literally only transactions to scope (using try with resources). It is arguably orientated to be easier to use stand alone without a JEE container or Spring because there is no EntityManager to manage. I believe this approach is in reach of all the JPA vendors if they desire it.

... but it is a more radical approach and maybe there isn't a lot of familiarity with it.

gavinking commented 2 years ago

TLDR effect in terms of API is that there is no EntityManager

I mean, I think this at least slightly overstates the difference. You still have an interface with explicit persistence operations, it's just effectively a singleton. And, if I understand correctly, you still have some sort of persistence context. (You need it, I believe, in order to avoid data aliasing.) I think what you're saying is that by default this persistence context is scoped to the transaction.

Now, sure, in JPA the EntityManager isn't a singleton. But in most environments where JPA is commonly used (Java EE, CDI, Quarkus, Spring, ...), the persistence context lifecycle is managed by the framework/container and the application logic doesn't have to deal with that. The application logic gets handed a contextually-bound (usually transaction-scoped) EntityManager and uses it. It doesn't actually care whether it's a singleton or not. The association between persistence context and the transaction is managed by the container.

The only real difference in what you're describing, again assuming I understand correctly, is that in your case the association between the persistence context and the transaction is managed "internally" by your ORM implementation, rather than "externally" by CDI or whatever.

If that's right, then in Hibernate you can achieve exactly the same thing using SessionFactory.getCurrentSession(), and perhaps there's an argument to be made that we should add something like EntityManagerFactory.getCurrentEntityManager() to JPA. I've never really thought about that much because I don't have the impression that many people use SessionFactory.getCurrentSession() in Hibernate. (Though I could be wrong about that!!)

and no entity bean lifecycle.

On this point I might still be misunderstanding, since I don't know the details of your implementation. What precisely do you mean by this? Can a single entity instance be used in multiple transactions at the same time? If not, then I would say there's still a lifecycle.

But I'm not sure I understand what you mean by the word "lifecycle".

What this more radical approach allows is a programming model where there is literally only transactions to scope (using try with resources). It is arguably orientated to be easier to use stand alone without a JEE container or Spring because there is no EntityManager to manage.

Well, if all it does is achieve the same effect as an EntityManagerFactory.getCurrentEntityManager() method, I wouldn't say it's that radical. But I'm not confident that I fully understand what you're describing. Perhaps there's something more radical about it that I'm not seeing.

rbygrave commented 2 years ago

no entity bean lifecycle.

WRT jakarta-persistence-spec-3.0.pdf, 3.2. Entity Instance’s Life Cycle there is

An entity instance can be characterized as being new, managed, detached, or removed

So by "no entity bean lifecycle" I mean an entity bean does not need to be attached, managed or removed [from a persistence context or anything]. We can new up an entity bean and insert(), update() or delete() it [much like Hibernate StatelessSession]. When we fetch a bean, mutate it and then update() or delete() it, its dirty state comes from the bean itself [which might be the same with Hibernate when using enhancement?]. The crux of this distills down to having the dirty state on the entity beans themselves for update() and delete().

With Hibernate StatelessSession::update(), where does the dirty state come from? Perhaps there is no dirty state, which is also doable, but if the entity beans hold their own dirty state it's the same [StatelessSession::update() functions without the persistence context, obtaining the dirty state from the beans themselves].

in most environments where JPA is commonly used (Java EE, CDI, Quarkus, Spring, ...), the persistence context lifecycle is managed by the framework/container and the application logic doesn't have to deal with that.

What IF we want to use JPA without any container, just in Java SE? Let's say I am writing something small like a lambda and I don't want to use any container, how well is JPA suited to that today? That thought in my mind isn't too far away from ... What IF someone wants to only use the StatelessSession API? Would we need a container if we only used the StatelessSession API? [I don't think we do].

I don't think what I'm suggesting is "technically radical" in the sense that it really is just moving functionality around under the hood and having a different API (with a caveat that it really desires dirty state to be held on the entity beans themselves). What is perhaps radical about it is that it's really suggesting ... What could the API be if someone only wanted to use the StatelessSession API?

rbygrave commented 2 years ago

Can a single entity instance be used in multiple transactions at the same time?

Well a single instance has state and shouldn't be mutated concurrently ... so it's more like: can a single entity be say fetched in one transaction and then persisted in another transaction - yes. [the dirty state is on the entity beans themselves].

In the case of nested transactions (savepoints), we can have some beans mutated and update() them in a nested transaction. If that fails we shouldn't use those beans that failed to persist per se, but there isn't a need to "clear a persistence context" etc and we can carry on processing.

We can new up an entity bean and update() it - this is an update without any dirty state. Said differently, we can materialize an entity bean graph from, say, JSON and update() it.

I believe this would all be the same as StatelessSession or possible via StatelessSession if desired.

escay commented 1 year ago

While playing around with "Hibernate + StatelessSession" versus "Jakarta Persistence + EntityManager" possibilities to directly perform persistence operations I found this proposal.

I also found some real-life issues that I ran into myself, which might be useful for this issue. The last item especially is a good read about where to add a 'StatelessSession' API / interface: in Quarkus (Panache)? in Hibernate? or, as in this issue, in Jakarta Persistence?

"How to get EntityManager, without having an @Entity"
- https://github.com/quarkusio/quarkus/issues/7148
And a workaround for this is to add a dummy @Entity
- https://github.com/quarkusio/quarkus/issues/7280
"Support for @Inject StatelessSession"
- https://github.com/quarkusio/quarkus/pull/8861
"Provide StatelessSession for Panache"
- https://github.com/quarkusio/quarkus/issues/8348
"Provide read-only transactions."
- https://github.com/quarkusio/quarkus/issues/6414
"feat: read only transactions" follow up of 6414
- https://github.com/quarkusio/quarkus/pull/7455
"feat: read only sessions" follow up of 7455
- https://github.com/quarkusio/quarkus/pull/10077

lukasj commented 1 year ago

Can I read this proposal as to provide an API for direct JDBC with mapping through JPA to a certain extent?

Assuming we add the 4 proposed methods (insert, update, delete, get) - what return types do you propose for them? For example, the referenced StatelessSession uses void for an update, but the JDBC call underneath may return something. More generally, would it make sense to somehow propagate the result from the underlying JDBC operation back to the user - not only by an exception if there's something wrong, but also through the return type of the method or through some additional resultContext()? Could this help address the dirty state of an Entity I saw mentioned in some comments above?

I'm not in favor of adding these new stateless methods directly to EMF, having the instance of this new X created by EMF makes more sense to me.

Tomas-Kraus commented 1 year ago

When talking about the API of this new feature, I have a few more notes:

insert/delete/update will end up in the java.sql.Statement method executeUpdate for DML statements. This method returns a row count (the number of rows affected by the DML statement) and it should be returned by those methods too. EM persist returns nothing, so the user loses some information here.
get and find method names have special meaning associated with them. Get is being used to retrieve a single instance and find is usually associated with a collection of results. I’m fine with using get with primary key argument. EM used bad naming for this. We can still have find method as a shortcut for createQuery.

And a few notes on transactions.

there will always be some state associated with them. :) The only question is how to handle them and how to design related API.
There is another option of API design:
void transaction(Consumer<EMTransaction> task), maybe void transaction(Consumer<Transaction, StatelessEM> task)
<T> T transaction(Function<EMTransaction, T> task), maybe <T> T transaction(Function<Transaction, StatelessEM, T> task)
when the user needs an option to handle it manually, or
void transaction(Consumer<StatelessEM> task)
<T> T transaction(Function<StatelessEM, T> task)
when the user wants the framework to commit/rollback automatically depending on an exception being thrown.

Flow API design for queries. We are adding new API to JPA so we may make it more user friendly. I’d like to see something like

Collection<Pokemon> result = StatelessEM.transaction(
        tx -> tx.createQuery("SELECT p FROM Pokemon p WHERE p.name = :name")
                    .setParameter("name", "Pikachu")
                    .execute()
);

...I know, too simple for a transaction, but it's just an example. Or (in case Transaction and StatelessEM are available as a single interface in the functional interface):

StatelessEM.transaction(
        tx -> {
            try {
                tx.createQuery("SELECT p FROM Pokemon p WHERE p.hp = :hp AND p.type = :type")
                    .setParameter("hp", "120")
                    .setParameter("type", "normal")
                    .execute()
                    .forEach(p -> tx.delete(p));
                tx.commit();
            } catch (Throwable t) {
                tx.rollback();
            }
        }
);
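
For the variant where the framework commits or rolls back automatically, a minimal plain-Java sketch might look like the following. `ToyTx` and `ToyTransactions` are hypothetical stand-ins, not the EMTransaction or StatelessEM types floated above, and the transaction object only records what was done to it. One design choice worth noting: the wrapper rethrows after rolling back, so the caller still sees the failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical transaction handle; it only records what happened to it. */
class ToyTx {
    final List<String> events = new ArrayList<>();

    void commit() { events.add("commit"); }

    void rollback() { events.add("rollback"); }
}

class ToyTransactions {
    /** Runs the task, commits if it returns normally, rolls back and rethrows if it throws. */
    static <T> T transaction(ToyTx tx, Function<ToyTx, T> task) {
        try {
            T result = task.apply(tx);
            tx.commit();
            return result;
        } catch (RuntimeException | Error e) {
            tx.rollback();
            throw e; // propagate instead of swallowing the failure
        }
    }
}
```

A call such as `ToyTransactions.transaction(new ToyTx(), tx -> 42)` returns 42 and records a single "commit" event; a task that throws records a single "rollback" event and the exception reaches the caller.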

fercomunello commented 11 months ago

It would be really cool to include this in the JPA spec, I agree 100%. 👍

IMO: The advantage is that all operations become explicit, minimizing resource usage: no object is temporarily held in memory during a transaction without you knowing (the L1 cache), and lazy fetching also becomes explicit. One "disadvantage" is losing the L1 and L2 caches; however, if the application is well written, it won't need the first-level or the second-level cache, since that role ends up being covered by something like Redis or Infinispan.

Felk commented 2 months ago

For some more "prior art", EF Core has a concept of tracking vs no-tracking queries, and surfaces this functionality via an .AsNoTracking() method that changes a single query to be "no-tracking". The functionality is very similar to a "stateless entity manager", at least for the aspect of reading entities for read-only use cases.