jpos / jPOS

jPOS Project
http://jpos.org
GNU Affero General Public License v3.0

QMUX Sequencing Issue #562

Closed ar closed 10 months ago

ar commented 10 months ago

From jpos-users

We recently had an issue in two of our jPOS applications, both related to QMUX getting out of sequence. One of our clients was impacted for a few days before it was noticed. The cause was that the Channels received an 0800 message and put it into the QMUX inbound queue before QMUX was started, because the QMUX was further down in the list of deployed files.

QMUX.notify() only reads a single item from its input queue in the Space, and notify() is only invoked once the listener is attached to the Space. There is no "catch-up" for entries that were queued earlier.

The problem is that, from that point on, QMUX always processes the previous message in the Space each time notify() is invoked. Depending on how active the interface is, it could be a long time until it sees the response to a given transaction. The only way to rectify the problem was to recycle the whole JVM, since recycling any individual QBean would not clear the extra items from the Space. The same thing could also happen if the QMUX XML needed to be modified and a message was received while the QMUX was being restarted.

In our own SpaceListener implementations, we have resorted to repeating the read until the given Space key is empty before returning (which suggests we must have seen this problem before, although this was the first time with QMUX). If there are no messages, notify() simply returns; QMUX does the latter too.
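
As a rough illustration of the "read until empty" approach, here is a minimal sketch using plain Java collections. The Deque only stands in for the entries under one Space key, and the class and method names (DrainSketch, notifyOnce, notifyDrain) are hypothetical; none of this is the jPOS API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model: a Deque stands in for the entries under one Space key.
// These names are hypothetical helpers, not jPOS API.
class DrainSketch {

    /** Reads at most one entry per notification, so an existing backlog never shrinks. */
    static List<String> notifyOnce(Deque<String> queue) {
        List<String> handled = new ArrayList<>();
        String msg = queue.pollFirst();      // non-blocking read, null when empty
        if (msg != null) {
            handled.add(msg);
        }
        return handled;
    }

    /** Repeats the read until the key is empty, then returns. */
    static List<String> notifyDrain(Deque<String> queue) {
        List<String> handled = new ArrayList<>();
        String msg;
        while ((msg = queue.pollFirst()) != null) {
            handled.add(msg);
        }
        return handled;                      // empty list if nothing was queued
    }

    public static void main(String[] args) {
        Deque<String> backlog = new ArrayDeque<>();
        backlog.add("msg-1");
        backlog.add("msg-2");
        System.out.println(notifyOnce(backlog) + " left=" + backlog.size());
        System.out.println(notifyDrain(backlog) + " left=" + backlog.size());
    }
}
```

With the draining variant, a single notification is enough to clear whatever accumulated before the listener was attached.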

Looks like QServer is the only other implementer of SpaceListener that could have the same issue.

We will be addressing the immediate problem by putting the QMUX (and IsoRequestListeners) before the Channel Adapters, but I thought I should bring this to your attention.

We have seen bad implementations of TCP/IP processing and message matching where it is assumed that the next read on the socket will be the response to the current transaction. If single-threaded, such an implementation may always leave more data in the TCP input buffer. The QMUX/ChannelAdapter logic was obviously created to handle multi-threading over TCP/IP better, but internally it reintroduces the same problem of getting out of sequence.

For outbound messages, responses are matched on fields, so you may not be given the "wrong" message back; instead, QMUX doesn't notice the new response waiting in the Space, so the current transaction will likely time out.
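
A small sketch of why field-based matching turns this into a timeout rather than a mismatched response. It assumes, purely for illustration, that the key is built from fields 41 and 11; the actual key is whatever the mux is configured with. KeyMatchSketch and its methods are hypothetical names, and messages are modelled as plain maps rather than jPOS objects.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: messages are field-number -> value maps, not jPOS ISOMsg objects.
class KeyMatchSketch {

    /** Builds a matching key from two fields; the field choice is an assumption. */
    static String key(Map<Integer, String> msg) {
        return msg.get(41) + "|" + msg.get(11);
    }

    /** A response is handed to a waiting request only when both keys agree. */
    static boolean matches(Map<Integer, String> request, Map<Integer, String> response) {
        return key(request).equals(key(response));
    }

    /** Convenience builder for a message carrying only the two key fields. */
    static Map<Integer, String> msg(String terminalId, String stan) {
        Map<Integer, String> m = new HashMap<>();
        m.put(41, terminalId);
        m.put(11, stan);
        return m;
    }

    public static void main(String[] args) {
        Map<Integer, String> request = msg("TERM0001", "000002");
        Map<Integer, String> stale = msg("TERM0001", "000001");  // previous response, read one step late
        Map<Integer, String> fresh = msg("TERM0001", "000002");  // real response, still waiting in the queue
        System.out.println("stale matches: " + matches(request, stale));
        System.out.println("fresh matches: " + matches(request, fresh));
    }
}
```

The stale entry is rejected by the key comparison, and since the matching entry has not been read yet, the caller simply waits until its timeout.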

To reproduce:

  1. Have a ChannelAdapter that can receive inbound requests (e.g. 0800).
  2. Start jPOS with the QMUX file renamed so that it is not deployed.
  3. Send a message and check that it has been received by the Channel.
  4. Start the QMUX.
  5. QMUX doesn't see any messages immediately.
  6. Send another message into the ChannelAdapter.
  7. The first message is received by the QMUX and, if an IsoRequestListener was included, it may respond (e.g. 0810).
  8. The second message remains in the queue until a third message appears on the socket, or there is something to send out.
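
The steps above can be modelled with a small self-contained sketch. It is a toy model of the reported behaviour only: LagSketch, receive, startMux and onNotify are hypothetical names, and an in-memory Deque stands in for the QMUX inbound queue in the Space.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the reported behaviour; not the jPOS Space or QMUX API.
class LagSketch {
    private final Deque<String> queue = new ArrayDeque<>();    // the mux inbound queue
    private final List<String> processed = new ArrayList<>();  // what the mux has handled
    private boolean listening = false;

    /** Channel side: queue a message; the listener fires only if already attached. */
    void receive(String msg) {
        queue.addLast(msg);
        if (listening) {
            onNotify();
        }
    }

    /** Mux start with no catch-up: attach the listener and ignore the backlog. */
    void startMux() {
        listening = true;
    }

    /** One read per notification, as described in the report. */
    private void onNotify() {
        String msg = queue.pollFirst();
        if (msg != null) {
            processed.add(msg);
        }
    }

    List<String> processed() { return new ArrayList<>(processed); }
    List<String> pending()   { return new ArrayList<>(queue); }

    public static void main(String[] args) {
        LagSketch sketch = new LagSketch();
        sketch.receive("first 0800");   // steps 2-3: arrives before the mux exists
        sketch.startMux();              // steps 4-5: nothing is processed yet
        sketch.receive("second 0800");  // steps 6-8: the mux handles the first one
        System.out.println("processed=" + sketch.processed());
        System.out.println("pending=" + sketch.pending());
    }
}
```

In this model the mux stays exactly one message behind for as long as it runs, which matches what we observed.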

ar commented 10 months ago

The out-of-order start is something QMUX takes care of; look at this code:

https://github.com/jpos/jPOS/blob/master/jpos/src/main/java/org/jpos/q2/iso/QMUX.java#L98-L103

At start time, if there are entries in the Space, we re-process them.
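
To make the idea concrete, here is a toy model of start-time catch-up. It is not the QMUX source linked above, just a sketch of the behaviour described, with hypothetical names (CatchUpSketch, receive, startMux) and an in-memory Deque standing in for the Space queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of start-time catch-up; not the jPOS Space or QMUX API.
class CatchUpSketch {
    private final Deque<String> queue = new ArrayDeque<>();    // the mux inbound queue
    private final List<String> processed = new ArrayList<>();  // what the mux has handled
    private boolean listening = false;

    /** Channel side: queue a message; the listener fires only if already attached. */
    void receive(String msg) {
        queue.addLast(msg);
        if (listening) {
            onNotify();
        }
    }

    /** Start with catch-up: take the backlog, attach, then re-queue each entry so it is notified. */
    void startMux() {
        List<String> backlog = new ArrayList<>(queue);
        queue.clear();
        listening = true;
        for (String msg : backlog) {
            receive(msg);
        }
    }

    /** Still one read per notification; the backlog no longer causes a lag. */
    private void onNotify() {
        String msg = queue.pollFirst();
        if (msg != null) {
            processed.add(msg);
        }
    }

    List<String> processed() { return new ArrayList<>(processed); }
    List<String> pending()   { return new ArrayList<>(queue); }

    public static void main(String[] args) {
        CatchUpSketch sketch = new CatchUpSketch();
        sketch.receive("first 0800");   // queued before the mux is started
        sketch.startMux();              // backlog is re-processed here
        sketch.receive("second 0800");  // handled as soon as it arrives
        System.out.println("processed=" + sketch.processed());
        System.out.println("pending=" + sketch.pending());
    }
}
```

Because every pre-existing entry is re-queued after the listener is attached, each one triggers its own notification and nothing is left behind.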

What version of jPOS are you using? What's the "key" for your messages? Can you share the Q2 log showing this behavior?

max-m-s commented 10 months ago

I'm sorry. This application was using a very old version of the package, where that section you mentioned doesn't exist. I did compare the class to a later version, but only the notify() method itself.

We will update the jar in the same change as the config re-order is done.

ar commented 10 months ago

Excellent. That change was certainly implemented to address your particular issue.
