ibm-messaging / mq-jms-spring

Components to assist MQ JMS integration with Spring frameworks
Apache License 2.0

Cacheless setup details #67

Open pvmibm opened 3 years ago

pvmibm commented 3 years ago

Hello, there is a simple app that has been running for a long time; every 10 seconds it fires a task which sends a message via JmsTemplate and, if that succeeds, reads it back, for health-checking purposes. The app is configured to use neither the MQ pool nor the cache:

ibm.mq.pool.enabled = false
spring.jms.cache.enabled = false
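To make the setup concrete, here is a minimal plain-JDK sketch of that health-check loop. A `BlockingQueue` stands in for the MQ queue and a `ScheduledExecutorService` for the framework scheduling; in the real app the probe body would be the `JmsTemplate` send and receive. All class and method names below are illustrative, not from the actual app:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a periodic send-and-read-back health check.
// The BlockingQueue is a stand-in for the MQ queue; in the real app the
// probe would call jmsTemplate.convertAndSend(...) / jmsTemplate.receive(...).
public class HealthCheck {
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    static final AtomicBoolean healthy = new AtomicBoolean(false);

    static void probe() {
        try {
            queue.put("ping");                              // send the probe message
            String reply = queue.poll(1, TimeUnit.SECONDS); // read it back, bounded wait
            healthy.set("ping".equals(reply));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static boolean runOnce() throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // the real app fires this every 10 seconds
        ScheduledFuture<?> f = scheduler.scheduleAtFixedRate(HealthCheck::probe, 0, 10, TimeUnit.SECONDS);
        Thread.sleep(200);                                  // let the first probe complete
        f.cancel(false);
        scheduler.shutdown();
        return healthy.get();
    }
}
```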

but at some point in the FDCs we see tons of MQ objects like

Overview of JMS System
Num. Contexts    : 70
Num. Connections : 534
Num. Sessions    : 462
Num. Consumers   : 126
Num. Producers   : 109

and on the server side everything seems clear. Is this a configuration problem with spring/mq-jms, or some kind of bug? Normally we would expect this setup to create only 1 connection, 1 session, 1 consumer, and 1 producer, as we always use a single JmsTemplate.

Just wondering where to look/go from here.

2.1.1

java.vendor     :- IBM Corporation
java.vendor.url :- http://www.ibm.com
java.version    :- 1.8.0_161
java.vm.info    :- JRE 1.8.0 Linux amd64-64 Compressed References 20180208_378436 (JIT enabled, AOT enabled)

ibmmqmet commented 3 years ago

This is not likely to be anything to do with this repo directly, as the actual management of connections is handled either by the core generic Spring JMS layer, or it might be something in the underlying MQ JMS implementation. From your code fragment I can't see where the jmsTemplate variable is created or destroyed - presumably you're running this in some broader framework environment.

You said there's an FDC but don't give any details of what the actual error is. And you say the server side "looks clear": do you mean that you only see a single (pair of) MQ connections there, or just that the qmgr keeps running fine with that level of connections?

Problems in the MQ JMS package itself would have to be handled by opening a Case in the IBM support system. Though if you're really running 2.1.1 of this package, that will probably be incorporating MQ 9.1.2, which itself is out of support.

pvmibm commented 3 years ago

Thanks for the update. Sure, a case for MQ is also open; I'm just trying to get a better understanding of all this. The JmsTemplate is created only once and used for all the tasks, which should be fine. What I was also trying to say is that even with cache and pool both disabled, this mq-jms-spring lib still uses CachingConnectionFactory, and thus caches at least Consumers and Producers, which might be a bit misleading. But the problem with the app is most likely not in this caching itself but probably in the way it closes its tasks: the app uses a 1-second timeout on awaitTermination and then calls shutdownNow, which probably shows up as the leak of sessions/producers/consumers. On the MQ server side DISCINT is set to a small value, so the server probably just drops such zombie sessions/connections, but how things work on the CachingConnectionFactory side I'm not sure.
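The shutdownNow suspicion can be demonstrated in isolation. `ExecutorService.shutdownNow()` interrupts the worker threads, so a task blocked inside a receive-style call sees the interrupt mid-wait, which is exactly the kind of event the FDC below reports. A plain-JDK sketch (a `BlockingQueue` again standing in for the MQ queue; names are illustrative):

```java
import java.util.concurrent.*;

// Demonstrates why shutdownNow() can surface as
// "Thread interrupted while waiting for lock": shutdownNow() interrupts
// worker threads, and a task blocked in an untimed receive-style call
// gets an InterruptedException in the middle of its wait.
public class InterruptedReceive {
    public static boolean runAndInterrupt() throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // stand-in for the MQ queue
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Boolean> task = pool.submit(() -> {
            try {
                queue.take();    // blocks indefinitely, like receive() with no timeout
                return false;
            } catch (InterruptedException e) {
                return true;     // the interrupt from shutdownNow() lands here
            }
        });
        Thread.sleep(100);       // let the task reach the blocking call
        pool.shutdownNow();      // interrupts the blocked worker
        return task.get(1, TimeUnit.SECONDS);
    }
}
```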

You are absolutely right about the MQ client version; I know this also needs to be updated to something more recent.

And the FDCs contain

Source Method    :- proxyMQGET(RemoteTls,MQMD,MQGMO,int,byte [ ],Pint,SpiGetOptions,Pint,Pint)
ProbeID          :- 01
Thread           :- name=pool-2616095-thread-1 priority=5 group=main ccl=org.springframework.boot.loader.LaunchedURLClassLoader@d6c8722a

                                      Data
                                      ----

|   Description  :-  Thread interrupted while waiting for lock

which I guess is due to the executor's shutdownNow being called

                               : com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.proxyMQGET(RemoteProxyQueue.java:2601)
                               : com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiGetInternalWithRecon(RemoteFAP.java:7103)
                               : com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiGetInternal(RemoteFAP.java:6988)
                               : com.ibm.mq.jmqi.internal.JmqiTools.getMessage(JmqiTools.java:1316)
                               : com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiGet(RemoteFAP.java:6935)
                               : com.ibm.mq.ese.jmqi.InterceptedJmqiImpl.jmqiGet(InterceptedJmqiImpl.java:1341)
                               : com.ibm.mq.ese.jmqi.ESEJMQI.jmqiGet(ESEJMQI.java:602)
                               : com.ibm.msg.client.wmq.internal.WMQConsumerShadow.getMsg(WMQConsumerShadow.java:1801)
                               : com.ibm.msg.client.wmq.internal.WMQSyncConsumerShadow.receiveInternal(WMQSyncConsumerShadow.java:230)
                               : com.ibm.msg.client.wmq.internal.WMQConsumerShadow.receive(WMQConsumerShadow.java:1466)
                               : com.ibm.msg.client.wmq.internal.WMQMessageConsumer.receive(WMQMessageConsumer.java:674)
                               : com.ibm.msg.client.jms.internal.JmsMessageConsumerImpl.receiveInboundMessage(JmsMessageConsumerImpl.java:1073)
                               : com.ibm.msg.client.jms.internal.JmsMessageConsumerImpl.receive(JmsMessageConsumerImpl.java:489)
                               : com.ibm.mq.jms.MQMessageConsumer.receive(MQMessageConsumer.java:179)
                               : org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:138)
                               : org.springframework.jms.core.JmsTemplate.doReceive(JmsTemplate.java:788)
                               : org.springframework.jms.core.JmsTemplate.doReceive(JmsTemplate.java:765)

Sorry, I didn't include the receive method; they call it without a timeout. That's probably also subject to change: put a receive timeout there, and only terminate the threads after a longer timeout, just in case, to let them clean up gracefully.
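That suggested fix can be sketched the same way: give the receive a timeout so the worker returns on its own, then do an orderly `shutdown()` with a generous `awaitTermination()` instead of reaching for `shutdownNow()`. The `BlockingQueue` is again a stand-in for the MQ queue; in the real app the timed poll would correspond to something like `jmsTemplate.setReceiveTimeout(...)` (names and numbers below are illustrative):

```java
import java.util.concurrent.*;

// Sketch of the graceful alternative: a timed receive plus an orderly
// shutdown, so no worker is ever interrupted mid-wait.
public class GracefulShutdown {
    public static boolean drainCleanly() throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // stand-in for the MQ queue
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(() -> {
            try {
                // poll with a timeout instead of blocking forever,
                // analogous to setting a receive timeout on the template
                queue.poll(500, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pool.shutdown();                                   // no new tasks, no interrupts sent
        return pool.awaitTermination(5, TimeUnit.SECONDS); // worker exits when its poll times out
    }
}
```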

ibmmqmet commented 3 years ago

The JMS code does have a habit of emitting FDCs when interrupted - I'd prefer it didn't do that as it's not really a severe error worthy of an FDC, but I've been told it's not something that's likely to change.

Since I've been adding trace points to the code (which will be available from the next update), it was easy to check that the different pool/cache/single CFs are being called in the way I'd expect when both those properties are set to false. I don't remember any changes that might have fixed the behaviour if it's broken in your old version.