quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.4k stars 2.57k forks source link

EntityManagerFactory synchronizes transactions with UNSYNCHRONIZED #21991

Open kny78 opened 2 years ago

kny78 commented 2 years ago

Describe the bug

In our project, we need / designed the code to use unsynchronized transactions.

But we found out that depending on the order of get connection and "begin transaction* it gets the same or different connection!

Example code and working and failing tests: https://github.com/kny78/non-unique-transaction

GitHub Actions that the ok and failing tests:

GitHub Actions run

Expected behavior

We expect to get different connections when using UNSYNCHRONIZED!

Actual behavior

We get the same connection when we begin the transaction before we get the connection.

How to Reproduce?

Reproducer: @kny78

Steps to reproduce:

Output of uname -a or ver

Linux knypc 5.15.6-200.fc35.x86_64 #1 SMP Wed Dec 1 13:41:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

17

GraalVM version (if different from Java)

No response

Quarkus version or git rev

2.5.1

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.8.2 (ea98e05a04480131370aa0c110b8c54cf726c06f)

Additional information

No response

kny78 commented 2 years ago

It seems like the bug is in Hibernate-orm

quarkus-bot[bot] commented 2 years ago

/cc @Sanne, @gsmet, @yrodiere

Sanne commented 2 years ago

Hello, interesting request; we'll need to talk about this.

Since we use Agroal as datasource, any JDBC connection retrieved from the pool needs to be opened within the context of an active JTA transaction. It's generally a very bad idea to use UNSYNCHRONIZED while the underlying connection is JTA managed; could you elaborate on the reasons for this choice?

cc/ @barreiro

Thanks

kny78 commented 2 years ago

Thanks for your interest :-)

The there are several reasons.

  1. We don't use two-phase commits. It is a bad idea for our workloads.
  2. Typically our work-flow works with:
    1. Tx 1: Get data from database.
    2. Rest API call to external resource. (Some take > 5 seconds!)
    3. Tx2: Store the data in the database.
  3. We love programmatic transactions. They are easy to understand, and easy to work with.
  4. With AutoCloseable, it is possible to have simple syntax like:
    
    val txManager: TxManager // Handles EntityManagerFactory

txManager.autoCommitTx { dbCtx -> dbCtx.person.persist(Person(123, "Donald Duck")) dbCtx.person.persist(Person(234, "Dolly Duck")) }



The `DbCtx` class implements AutoCloseable. In the `close()` method, we close or roll back the Transaction.

I will make a short example in the repo mentioned above
Sanne commented 2 years ago

Many thanks :)

But allow me to debate this a little, for sake of understanding and possibly find a solution.

  1. We don't use two-phase commits. It is a bad idea for our workloads.

Agreed that it's a good idea to avoid them - but are you aware that if there's a single resource involved (which it most likely is a controlled context like you describe), the transaction manager will automatically optimise this like it was a single phase?

  1. We love programmatic transactions. They are easy to understand, and easy to work with.

Agreed, that's my personal preference as well.

But you could make use of these without resorting to use UNSYNCHRONIZED EntityManager(s) ?

sebersole commented 2 years ago

It seems like the bug is in Hibernate-orm

Do you have a reproducer that uses just Hibernate without Quarkus? We have some tests of UNSYNCHRONIZED in the ORM testsuite; not at all exhaustive, but if you're going to make that claim... ;)

I still don't understand why your work-flow requires UNSYNCHRONIZED. There are many ways to achieve "programmatic transactions". Maybe this is specific to Quarkus? All the more reason to get a pure-Hibernate test.

kny78 commented 2 years ago

@sebersole I was maybe a bit quick on blaming Hibernate-orm :-O

But any help on understanding what happens will be great.

kny78 commented 2 years ago

But allow me to debate this a little, for sake of understanding and possibly find a solution.

:sun_with_face:

  1. We don't use two-phase commits. It is a bad idea for our workloads.

Agreed that it's a good idea to avoid them - but are you aware that if there's a single resource involved (which it most likely is a controlled context like you describe), the transaction manager will automatically optimise this like it was a single phase?

In our case we have 2 resources involved: Db and Rest API. Only the DB is part of the transactions. The Rest API is not part of the transaction.

But since the Rest call can take > 5 seconds, a lot can happen in the database in the mean time. Therefore we just commit the transaction, and release the connection. Then we start a new TX after the Rest Call.

  1. We love programmatic transactions. They are easy to understand, and easy to work with.

Agreed, that's my personal preference as well.

But you could make use of these without resorting to use UNSYNCHRONIZED EntityManager(s) ?

I have made an example in the code: https://github.com/kny78/non-unique-transaction

See specifically the endpoint in this file: https://github.com/kny78/non-unique-transaction/blob/master/src/main/kotlin/im/kny/PersonsFailingResource.kt

See the README.md file for how to run it.

The "PersonsFailingResource" is a bit more complex than I would like, but it shows that running Kotlin coroutines or Java's CompletableFuture will fail in many scenarios.

(Also it is very dangerous to share the connection!)

sebersole commented 2 years ago

I can just explain how UNSYNCHRONIZED works, but it sounds like you grok that already. Sanne mentioned that Agroal has some limitations around when you can access Connections based on when you "join" a transaction.

I think part of the problem may be just simple vocabulary. Like "join". That has a very specific meaning here for JPA. Not sure how that aligns with any expectations that Agroal may have.

Nor do I understand where/when you open your EntityManager/Session relative to the JTA transactions. Too much I don't know here.

One thing I did notice in a quick look at your repo... You say you use JTA and UNSYNCHRONIZED, but you then use (what JPA poorly calls) "entity transactions". That could also have a negative impact here. JPA says it is illegal to use entity transactions when using JTA. Hibernate let's you, but I cannot say we have spent a lot of time testing what happens when you start combining all of these somewhat competing options

sebersole commented 2 years ago

From Sanne:

Agroal's design has some limitations such as requiring that all connections be opened only after enlisting in the transaction manager

I think this is going to be your limitation here. UNSYNCHRONIZED is all about being able to do the opposite.

I would think, given what you describe, that really you just don't want the Rest calls to be part of the same transactions as your Tx1 and Tx2. Or am I still missing more?

kny78 commented 2 years ago

I would think, given what you describe, that really you just don't want the Rest calls to be part of the same transactions as your Tx1 and Tx2. Or am I still missing more?

Yes, one more thing. We need to be able to start one or more co-routines / CompletableFutures, which has it's own transactions, runs in parallel with the "starting thread".

If you look at this example: PersonsFailingResource.kt you will see an example of how to provoke the issue.

What happens is that the "main-thread / Rest Controller thread" closes the connection and transaction. But since the suspend function uses the same transaction, the suspended function fails. (It would in many scenarios be worse if it does not!

sebersole commented 2 years ago

Well, we keep coming back to the stated limitation of Agroal which does not sound like it will be able to support UNSYNCHRONIZED.

The only option I see for you, which is what I was trying to imply in the last comment, is to manage the JTA transaction yourself.

inJtaTransaction( (em) -> {
    getDataFromDatabase( em, ... );
} );

callRest();

inJtaTransaction( (em) -> {
    storeDataIntoDatabase( em, ... );
} );
kny78 commented 2 years ago

Is there some API for starting and stopping JTA transactions in Quarkus?

I found this: Managing JTA Transctions Programmatically using the API Approach

I guess this also binds the TX to the ThreadLocal, and does not give me a specific EntityManager.

sebersole commented 2 years ago

Unfortunately I do not know Quarkus that well. Hibernate lets you do that, but I know Quarkus restricts the things Hibernate lets you do. Not sure if that will work. Assuming it does not restrict this, you could try e.g.:

final EntityManagerFactory emf = ...;
final Data yourData = emf.unwrap( SessionFactory.class ).fromTransaction( (session) -> {
    return session.get( Data.class, 1 );
} );

callRest();

final Data yourData = emf.unwrap( SessionFactory.class ).inTransaction( (session) -> {
    session.merge( yourData );
} );

You could also manually manage transactions using EntityManager#getTransaction().

Again, no idea if Quarkus restricts this stuff in any way

kny78 commented 2 years ago

EntityManager#getTransaction()

We use Entitymanager#getTransaction(), but it gets the same connection and transaction.

So I guess Quarkus does not support our usecase.

@Sanne Any proposals on how we can manage this?

Sanne commented 2 years ago

I've not been able to look more in detail today, I'll try tomorrow.

In principle the only restriction we have in Quarkus is the one I mentioned about Agroal: we're not allowed to use a JDBC connection outside of the scope of a transaction.

So there should be no problem to control transactions programmatically, and model your use case like Steve suggested above. Did you try the approach of injecting the UserTransaction reference ?

sebersole commented 2 years ago

If UserTransaction works, in theory you should be able to use the fragments I pasted earlier. Under the covers, Hibernate's Transaction object delegates to the JTA objects.

If it has to be UT rather than TM, I'd say to set hibernate.jta.prefer_user_transaction=true. Also check the JtaPlatform Quarkus gives/configures for Hibernate.

HTH

kny78 commented 2 years ago

Good morning,

About the proposed solution

I don't find the fromTransaction() or inTransaction() methods on the SessionFactory or SessionFactoryImpl.

Another aspect of this, is that I would like to programmatically handle my Transactions, but I cannot find a method to create a "UserTransaction" myself. It seems like a bean that is @Inject, and when I call userTransaction.begin(), it starts a transaction that binds to ThreadLocal or something like this.

That is what I'm trying to avoid :eyes:

Current concept of Transactions and EntityManager

The existing JPA concept was based on the following presumptions:

  1. All work in one Thread. No CompletableFuture / CoRoutines / Mutiny
  2. Before Java 7, there was no AutoCloseable. Therefore closing transactions was tedious and cumbersome.
  3. The Container should just hide away as much as possible.

But fast forward to Java 7, 8, 11 and 17 we have gotten:

Based on the new features in Java 7 and 8, I think it is time to review the programming model for JPA.

Proposal for easier Transaction handling in Quarkus / Hibernate / JPA

I presume you have looked at the code I sent. See specifically:

Core ideas:

I know a lots of projects have invested a lot in the current approach, but I think that this new approach can live side-by-side with the old approach.

sebersole commented 2 years ago

I don't find the fromTransaction() or inTransaction() methods on the SessionFactory or SessionFactoryImpl.

Ah, sorry. I had only added that to 6.0 - https://github.com/hibernate/hibernate-orm/blob/main/hibernate-core/src/main/java/org/hibernate/SessionFactory.java#L120-L208

sebersole commented 2 years ago

Based on the new features in Java 7 and 8, I think it is time to review the programming model for JPA.

Feel free to make your suggestions to the spec - https://github.com/eclipse-ee4j/jpa-api/

sebersole commented 2 years ago

starts a transaction that binds to ThreadLocal or something like this.

Sure, JTA is generally Thread based

Sanne commented 2 years ago

FWIW Quarkus also integrates Microprofile Context Propagation, which will ensure that thread-local stored state such as the current JTA transaction is also available to reactive contexts possibly running on a different thread.

I don't have Kotlin experience, but I suppose this implies the Kotlin co-routine spawned from a JTA-active context will also be running in the same JTA transaction.

kny78 commented 2 years ago

@Sanne Our problem is that we don't want the same context when we start a new "co-routine" or CompletableFuture.

Typically we spawn a new CompletableFuture, which runs for longer or shorter than the thread that starts is. And they work with different data, and so they should not be in the same transaction.

(I believe Co-routines in Kotlin uses CompletableFuture).

ylepikhov commented 1 year ago

Hi. First of all, it is necessary to reflect this in the documentation. That is, write in the JPA limitations that UNSYNCHRONIZED persistent context is not supported (for now).

Is connectionHandlingMode (which is hardcoded to DELAYED_ACQUISITION_AND_RELEASE_BEFORE_TRANSACTION_COMPLETION in #7242) the reason of this behavior? Does the issue get resolved if connectionHandlingMode value is changed to DELAYED_ACQUISITION_AND_RELEASE_AFTER_STATEMENT? Could it be made configurable?

P.S. I'm very upset because of this.

yrodiere commented 1 year ago

Hey,

@ylepikhov

P.S. I'm very upset because of this.

While I can understand that, venting will only antagonize the people who can help you. It's definitely not in your best interest.

@Sanne

I've not been able to look more in detail today, I'll try tomorrow.

Did you get a chance to have a look? Otherwise, any information that could help someone who would want to try and fix this?

ylepikhov commented 1 year ago

@yrodiere

While I can understand that, venting will only antagonize the people who can help you. It's definitely not in your best interest.

Just keep in mind that people who report issues, write comments, make pull requests and simply use your code are actually helping you. I had reasons to be upset. I wasted a lot of time struggling with this issue. I wouldn't if this limitation was documented. Sorry that my upsession made you upset.

ylepikhov commented 1 year ago

Anyway, I have workaround at moment. My situation: I need to write some asynchronous code that use EntityManager. It reads entity from database (1), make async web request (2) and then updates entity (3). Steps (1) and (3) could be executed in different threads. For the most part I need sort of @ConversationScoped EntityManager (which I know is not supported by Quarkus, see docs). I tried to use UNSYNCHRONIZED EntityManager (created manually with EntityManagerFactory) but got error about deferred enlistement isn't supported. I tried to use Context Propagation with no luck as it's quite complecated. I'm not even sure that propagated context (especially CDI RequestScoped) can be active after being diactivated in original thread (which I need). I was able to manually start transaction with TransactionManager and propagate it through kotlin coroutine calls via custom CoroutineContextElement (actually ThreadContextElement). So I have EntityManager which can be used in async code like this:

// execute with thread pool
withContext(Dispatchars.IO) {
  // starts Transaction, adds ContextElement to propagate transaction, creates EntityManager which can be accessed via entityManager property inside following code block
  unitOfWork {
    val myEntity = entityManager.find(MyEntity::class:java, 1) // (1)
    val result = someAsyncCall() // (2), maybe long running
    myEntity.result = result // (3)
  } // end of transaction, maybe other thread
}

But this code starts transaction in point (1) and spans it on someAsyncCall which I want to avoid.

May be use of DELAYED_ACQUISITION_AND_RELEASE_AFTER_STATEMENT can help to workaround problem with deferred enlistment not been supported by Agroal. But it would be better if deferred enlistment was supported.

UPDATE: actually this is not a problem as even in scope of JTA trasaction EntityManager can execute readonly quaries without database transaction.

ylepikhov commented 1 year ago

As for the original issue - maybe it's possible to avoid simultaneous use of transactions? Kotlin coroutines helps to write such code.

First of all, you need to control transaction lifetime. In context of quarkus I would use TransactionManager and JTA transactions instead of EntityTransactions. As we already know Agroal has a strong relationship with JTA transactions. And in some cases I'd let Hibernate decide when to actually to start transaction (especially in case of UNSYNCHRONIZED).

So, just inject TansactionManager and use it's api to control transaction lifetime.

Second, in case of Kotlin coroutines it's necessary to somehow propagate transaction through coroutines suspensions and resumptions. (Note: kotlin coroutines does not use CompletableFutures). It can be done with custom coroutine element (see https://codingwithmohit.com/coroutines/custom-coroutine-context-uses-cases/ for more information and use cases).

Here is example of custom ContextElement for transaction propagation:

class TransactionalContextElement(
    private val transactionManager: TransactionManager,
    private val transaction: Transaction
) : ThreadContextElement<Transaction?>, AbstractCoroutineContextElement(Key) {

    companion object Key : CoroutineContext.Key<TransactionalContextElement>

    // called during coroutine suspension and at the end of context block (see withContext usage later), suspends JTA transaction in current thread
    override fun restoreThreadContext(context: CoroutineContext, oldState: Transaction?) {

        if (transactionManager.status != Status.STATUS_NO_TRANSACTION && transactionManager.transaction == transaction)
            transactionManager.suspend()
    }

    // called at the begining of context block and during coroutine resumption, resumes JTA if needed
    override fun updateThreadContext(context: CoroutineContext): Transaction? {

        if (transactionManager.status != Status.STATUS_NO_TRANSACTION) {
            if (transactionManager.transaction != transaction) throw TransactionalException("transactionManager .status != Status.STATUS_NO_TRANSACTION + other transaction = InvalidTransactionException", InvalidTransactionException())
        }
        else transactionManager.resume(transaction)

        return null
    }
}

Also it's useful to have class that constructs ContextElement then necessary, like this:

@ApplicationScoped
class TransactionalScopeBuilder @Inject constructor(
    private val transactionManager: TransactionManager
) {

    suspend operator fun <T> invoke(block: suspend () -> T): T {

        transactionManager.begin()
        val transaction = transactionManager.transaction

        return withContext(TransactionalContextElement(transactionManager, transaction)) {
            val result = try {
                block.invoke()
            }
            catch (e: Exception) {
                transactionManager.rollback()
                throw e
            }
            transactionManager.commit()
            result
        }
    }
}

This is simplified version of TransactionalScopeBuilder, my current version also supports javax.transaction.Transactional.TxType modes (similar to @Transactional) and readonly transaction mode (for EntityManager optimization).

To use transactions just inject TransactionalScopeBuilder like this:

@ApplicationScoped
class TransactionalScopeBuilderExample @inject constructor(transactional: TransactionalScopeBuilder) {

    // suspended! 
    suspend fun example() {

        // execute in thread pool
        withContext(Dispatchers.IO) {

            // no transaction
            transactional {

                // JTA transaction started

                delay(1000)
                // possibly another thread, same JTA transaction

                launch(newSingleThreadContext("MyOwnThread")) {

                    // another thread, same JTA transaction
                    // actually this is bad (concurrent) propagation, see below for solution 
                }.join()

                // still the same JTA transaction
            }
            // JTA transaction committed or rolled back in case or exception  
        }
    }
}

To temporarily stop transaction propagation there is another ContextElement:

val nonTransactionalContextElement: ThreadContextElement<Transaction?> = object : ThreadContextElement<Transaction?>, AbstractCoroutineContextElement(
    // reuse TransactionalContextElement.Key!
    TransactionalContextElement.Key
) {
    override fun restoreThreadContext(context: CoroutineContext, oldState: Transaction?) {
        if (transactionManager.status != Status.STATUS_NO_TRANSACTION) 
            throw TransactionalException("transactionManager.hasActiveTransaction + shouldn't = InvalidTransactionException", InvalidTransactionException())
        if (oldState != null) transactionManager.resume(oldState)
    }

    override fun updateThreadContext(context: CoroutineContext): Transaction? {
        if (transactionManager.status != Status.STATUS_NO_TRANSACTION) 
            return transactionManager.suspend()
        return null
    }
}

suspend fun <T> nonTransactional(block: suspend CoroutineScope.() -> T) = withContext(nonTransactionalContextElement, block)

And it's usage:

transactional {
  // JTA transaction started
  nonTransactional {
    // transaction suspended
  }
  // transaction resumed
}

So, to start two transactions you can use TransactionalScopeBuilder and kotlin's functions like launch or async.

Moreover, I found that quarkus supports @TransactionScoped CDI scope. It's active during JTA transaction. So you even can write your own CDI producer for your own qualified EntityManager bean and just inject it and use inside transcational blocks.

I wrote producer to construct UNSYNCRONIZED EntityManager, it actually returns wrappers which automate joinTransaction calls.

And why @TransactionScoped is not in list of supported CDI scopes?

Hope this was useful.

yrodiere commented 1 year ago

And why @TransactionScoped is not in list of supported CDI scopes?

It is, at the end of https://quarkus.io/guides/cdi#what-scopes-can-i-actually-use-in-my-quarkus-application. But you're right, it's not very visible. I created #28870 to address that.

I'll let someone else comment on the rest, though, as my experience with Kotlin is close to zero.

apatrida commented 1 year ago

As for the original issue - maybe it's possible to avoid simultaneous use of transactions? Kotlin coroutines helps to write such code.

First of all, you need to control transaction lifetime. In context of quarkus I would use TransactionManager and JTA transactions instead of EntityTransactions. As we already know Agroal has a strong relationship with JTA transactions. And in some cases I'd let Hibernate decide when to actually to start transaction (especially in case of UNSYNCHRONIZED).

So, just inject TansactionManager and use it's api to control transaction lifetime.

Second, in case of Kotlin coroutines it's necessary to somehow propagate transaction through coroutines suspensions and resumptions. (Note: kotlin coroutines does not use CompletableFutures). It can be done with custom coroutine element (see codingwithmohit.com/coroutines/custom-coroutine-context-uses-cases for more information and use cases).

Here is example of custom ContextElement for transaction propagation:

class TransactionalContextElement(
  private val transactionManager: TransactionManager,
  private val transaction: Transaction
) : ThreadContextElement<Transaction?>, AbstractCoroutineContextElement(Key) {

  companion object Key : CoroutineContext.Key<TransactionalContextElement>

  // called during coroutine suspension and at the end of context block (see withContext usage later), suspends JTA transaction in current thread
  override fun restoreThreadContext(context: CoroutineContext, oldState: Transaction?) {

      if (transactionManager.status != Status.STATUS_NO_TRANSACTION && transactionManager.transaction == transaction)
          transactionManager.suspend()
  }

  // called at the begining of context block and during coroutine resumption, resumes JTA if needed
  override fun updateThreadContext(context: CoroutineContext): Transaction? {

      if (transactionManager.status != Status.STATUS_NO_TRANSACTION) {
          if (transactionManager.transaction != transaction) throw TransactionalException("transactionManager .status != Status.STATUS_NO_TRANSACTION + other transaction = InvalidTransactionException", InvalidTransactionException())
      }
      else transactionManager.resume(transaction)

      return null
  }
}

Also it's useful to have class that constructs ContextElement then necessary, like this:

@ApplicationScoped
class TransactionalScopeBuilder @Inject constructor(
    private val transactionManager: TransactionManager
) {

    suspend operator fun <T> invoke(block: suspend () -> T): T {

        transactionManager.begin()
        val transaction = transactionManager.transaction

        return withContext(TransactionalContextElement(transactionManager, transaction)) {
            val result = try {
                block.invoke()
            }
            catch (e: Exception) {
                transactionManager.rollback()
                throw e
            }
            transactionManager.commit()
            result
        }
    }
}

This is simplified version of TransactionalScopeBuilder, my current version also supports javax.transaction.Transactional.TxType modes (similar to @Transactional) and readonly transaction mode (for EntityManager optimization).

To use transactions just inject TransactionalScopeBuilder like this:

@ApplicationScoped
class TransactionalScopeBuilderExample @inject constructor(transactional: TransactionalScopeBuilder) {

  // suspended! 
  suspend fun example() {

      // execute in thread pool
      withContext(Dispatchers.IO) {

          // no transaction
          transactional {

              // JTA transaction started

              delay(1000)
              // possibly another thread, same JTA transaction

              launch(newSingleThreadContext("MyOwnThread")) {

                  // another thread, same JTA transaction
                  // actually this is bad (concurrent) propagation, see below for solution 
              }.join()

              // still the same JTA transaction
          }
          // JTA transaction committed or rolled back in case or exception  
      }
  }
}

To temporarily stop transaction propagation there is another ContextElement:

val nonTransactionalContextElement: ThreadContextElement<Transaction?> = object : ThreadContextElement<Transaction?>, AbstractCoroutineContextElement(
    // reuse TransactionalContextElement.Key!
    TransactionalContextElement.Key
) {
    override fun restoreThreadContext(context: CoroutineContext, oldState: Transaction?) {
        if (transactionManager.status != Status.STATUS_NO_TRANSACTION) 
            throw TransactionalException("transactionManager.hasActiveTransaction + shouldn't = InvalidTransactionException", InvalidTransactionException())
        if (oldState != null) transactionManager.resume(oldState)
    }

    override fun updateThreadContext(context: CoroutineContext): Transaction? {
        if (transactionManager.status != Status.STATUS_NO_TRANSACTION) 
            return transactionManager.suspend()
        return null
    }
}

suspend fun <T> nonTransactional(block: suspend CoroutineScope.() -> T) = withContext(nonTransactionalContextElement, block)

And it's usage:

transactional {
  // JTA transaction started
  nonTransactional {
    // transaction suspended
  }
  // transaction resumed
}

So, to start two transactions you can use TransactionalScopeBuilder and kotlin's functions like launch or async.

Moreover, I found that quarkus supports @TransactionScoped CDI scope. It's active during JTA transaction. So you even can write your own CDI producer for your own qualified EntityManager bean and just inject it and use inside transcational blocks.

I wrote producer to construct UNSYNCRONIZED EntityManager, it actually returns wrappers which automate joinTransaction calls.

And why @TransactionScoped is not in list of supported CDI scopes?

Hope this was useful.

this seems like something that can be a PR to Quarkus Kotlin support no?