spring-projects / spring-framework

Spring Framework
https://spring.io/projects/spring-framework
Apache License 2.0
56.63k stars 38.14k forks source link

DefaultMessageListenerContainer: CACHE_CONSUMER with dedicated connections for Bitronix transactions [SPR-16642] #21183

Closed spring-projects-issues closed 10 months ago

spring-projects-issues commented 6 years ago

Andrew Tulloch opened SPR-16642 and commented

If the error handling path in the DefaultMessageListenerContainer the JMS connection is closed and re-opened.

Combining this with CACHE_CONSUMER with bitronix results in an error from DualSessionWrapper due to the Connection being involved in multiple XA transactions during the close attempt, which bitronix rejects and DMLC logs, swallows and continues.

This results in the Connection not being closed and a new Connection obtained from the bitronix pool resulting in a gradual leak from the pool.

Utilising dedicated connections alleviates this and enables CACHECONSUMER to function without issue. So making this a configuration option to not use the shared connection whilst CACHE* is enabled means we can avoid this path and only have a single thread using a dedicated connection avoiding the multiple active transaction close issue.


Affects: 5.0.4

Referenced from: pull request https://github.com/spring-projects/spring-framework/pull/1778

spring-projects-issues commented 6 years ago

Juergen Hoeller commented

We generally expect XA transactions to be used with CACHE_NONE there, calling a dedicated XA-aware JMS connection pool instead, not locally caching those resources within DefaultMessageListenerContainer at all. Why does the local caching make much difference in addition to the Bitronix pool that you have in use already (maybe because of a lack of consumer caching within that pool)?

I wonder whether we could specifically address this: maybe identifying the rejected connection close attempt and reusing the original connection afterwards? (Isn't it odd for the Bitronix pool to reject such a close attempt to begin with?) In any case, giving up the shared connection at DMLC level means giving up a lot of the caching benefit, so I'd rather go for some alternative solution.

spring-projects-issues commented 6 years ago

Andrew Tulloch commented

What we're seeing specifically with CACHE_NONE is significant blocking as Bitronix is synchronized in many places and it's configured to test the connections, with a few instances and of a reasonable number of threads this means most threads are just blocked due to time taken to execute the delete/create temporary queue on the broker that is the test on ActiveMQ/BTM.

So whilst we don't need caching in the DMLC for Session or Consumer, as you point out it caches those, it avoids the trip into the synchronized Bitronix pool and means we only end up going to the pool if we experience an error rather than every message as occurs with CACHE_NONE.

I'm not clear on what benefit we lose in terms of caching, this caches the connection, session and consumer as before, just it's a unique connection for that instance of the AsyncMessageListenerInvoker. There is the disadvantage that we've now have many more connections open because we're unable to use a shared connection to the broker.

I do find it odd it rejects the connection, the connection is in use by multiple transactions on separate threads (shared connection), so it can't be closed, but you'd hope it would be deferred and returned to the bitronix pool when those other transactions were completed. It however appears it is not and the DMLC has lots it's reference, so it's left open and not in the pool. Your suggestion of handling that error on close and retaining the connection could work.

We end up here with stack traces through these two lines in BTM: https://github.com/bitronix/btm/blob/btm-2.1.3/btm/src/main/java/bitronix/tm/resource/common/TransactionContextHelper.java#L140 https://github.com/bitronix/btm/blob/btm-2.1.3/btm/src/main/java/bitronix/tm/resource/jms/DualSessionWrapper.java#L188

spring-projects-issues commented 6 years ago

Andrew Tulloch commented

I think there's a couple of subtle details in the exception handling that may not be obvious. Initially looking at: https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractPollingMessageListenerContainer.java#L249

It appears that the exception is re-thrown after we rollback the transaction, which would theoretically provide the same behaviour if the transaction was thrown in the commit call here: https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractPollingMessageListenerContainer.java#L251

But when you dig down to AbstractPollingMessageListenerContainer you discover that only JMS Exceptions are re-thrown from there: https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractPollingMessageListenerContainer.java#L330

So if for example you had an OptmisticLockingException from persisting to a database there it's not re-thrown meaning the rollback call never happens, but status was already set to rollbackonly here: https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractPollingMessageListenerContainer.java#L325

So we actually reach the commit call to the transaction manager, but btm does seem to take this as a rollback as the status is rollback only.

We've noticed that in our consumer we were calling EntityManager#save(Object), which doesn't issue a flush, causing the flush to occur on the TransactionManager commit(status) call, this could result in the OptimisticLockingException in an unfortunate place without the check for JMSException and re-throw, so the container thinks we've had an infrastructure failure, when in fact we've just had some DB Update failure.

Changing this to EntityManager#saveAndFlush(Object) has reduced this massively as an persistence related exceptions now happen doReceiveAndExecute() and result in only JMS exceptions being rethrown and btm seeing a rollback-only status and performing a transaction rollback.

Not sure if this is quite intended to work this way, but it results in very different exception handling and outcomes.