quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.88k stars 2.71k forks source link

In-transaction serialization for Rest endpoint returning Hibernate entities that require lazy-loading #44918

Open yrodiere opened 9 hours ago

yrodiere commented 9 hours ago

Description

Some people have their Rest endpoints return entities, or List<MyEntity>, or types that have nested entities. That is dangerous as serialization currently happens after the transaction ends, even if @Transactional is applied to the endpoint method, so at best lazy-loading will fail, at worst it will rely on a query that runs outside of the original transaction and thus might return inconsistent data.

However, if we could make sure that serialization happens within the transaction, while making sure that the transaction ends before quarkus-rest/RestEasy handles exceptions (in case the serialization/transaction fails), then this practice would become safer as far as consistency is concerned -- though serializing a lazy-loadable entity always comes with the risk of serializing the whole database, but that's what @JsonIgnore and friends are for 🤷

Extracted from https://github.com/quarkusio/quarkus/issues/30096#issuecomment-2516659142 :

I'm wondering, however, if we should propose an easy-to-use, safe-ish alternative: we could make sure that serialization of entities returned from transactional methods actually happens within transaction boundaries, when the session/connection is still available. It's still dodgy, because, well, serializing an entity graph is no easy task and can result in serializing the whole database; but I can see it being useful, and it would avoid the most common pitfalls. And then, maybe jackson-datatype-hibernate could be helpful without also being dangerous.

Hm. If we can't end the transaction before we return the status, then the transaction commit may fail and we won't even tell the caller.

We would need to do serialization in a buffer, then commit the transaction, then determine the response status...

Implementation ideas

@yrodiere that is a very interesting idea. In Quarkus REST we could probably introduce some new annotation (or even repurpose the existing @Transactional one, although this is debatable) to ensure that the transaction is closed after the returned entity has been written to the body of the response. This could potentially create other problems for us, but it's probably worth experimenting with.

I'm not sure how serialization is done exactly, but if returned objects are first serialized into an in-memory buffer, that would be when I would close the transaction

That can't really be done because we don't control how the serialization library does its work. We can only react to serialization having completed (successfully or not).

can we take serialization into our own hands, calling jsonb/jackson/etc. as necessary, and somehow make quarkus-rest "believe" the method returns a byte[]/String? So that serialization happens earlier, and we get a chance to perform a commit?

Under the right circumstances, yeah, that is likely doable.

The circumstances I can think of off the top of my head are:

  • No async types are being used

  • No custom MessageBodyWriter types are in play.

  • We can figure out the proper MessageBodyWriter at build time (I am not 100% on this one, but it's likely necessary)

quarkus-bot[bot] commented 9 hours ago

/cc @gsmet (hibernate-orm)

FroMage commented 6 hours ago

We advise people to use DTOs not just for this good reason (transaction being closed between endpoint and serialisation) but also for other good reasons:

So that's already three good reasons to not waste time helping people send entities over REST outside of pet/demo projects. I'm frankly not sure we should spend time and effort in that direction.

Now, if you absolutely must serialise entities to JSON, assuming you have a pet/demo project that does not care about security and has no relations, perhaps we could talk about entity graphs instead? so that the right data is already loaded efficiently and there's no need to extend the transaction?

https://docs.jboss.org/hibernate/orm/6.6/userguide/html_single/Hibernate_User_Guide.html#fetching-strategies-dynamic-fetching-entity-graph but also fetch queries are options.

yrodiere commented 5 hours ago

That's very good arguments, and I'd certainly welcome a world where nobody serializes entities (even with entity graphs, TBH).

And yet... people do serialize entities. Worse, I suspect people enable lazy loading outside of transactions -- or at least try to.

My goal here was to make sure that people do as little dangerous stuff as possible, and evidently offering DTOs as a solution isn't enough since I rarely see it mentioned. Possibly because it's seen as inconvenient, but perhaps also because people have less issues when they use DTOs... survivor bias?

Maybe I'm not being realistic as we need to prioritize our effort indeed. But if we're not going to work on this, maybe let's be clear about it. Should we just document serializing entities -- especially with lazy loading -- as an unsupported bad practice, agree we'll reject all related issues, and call it a day?