jentrata / jentrata-msh

Jentrata - Message Handler Service
jentrara.org
Other
19 stars 57 forks source link

Failed asynch acknowledgements handled incorrectly #8

Closed EdwinVDS closed 5 years ago

EdwinVDS commented 8 years ago

Hi,

I've got the a setup where 'my' Jentrata MSH is communicating with a 3rd part MSH. When an acknowledgement is sent, and fails, a subsequence retry from the original sender is not handled according to the ebMS spec.

The sequence of messages is as follows:

// 3rd party sends a ebXML order message over HTTPS, which is receive and stored by Jentrata 1) 3P --(Order ebXML)--> Me // My MSH creates an Asynch Acknowledgement message and tries to send it but this fails 2) 3P X<-(Asynch Ack ebXML)-- Me // After the CPA's retry period, the sender resends the order message 3) 3P --(Order ebXML)--> Me // My Jentrata MSH detects this as a previously handled message and does not send an acknowledgement.

According to the ebMS spec Jentrata should send the same Ack message again but Jentrata does not comply with this spec and doesn't send anything in response to the retry by the 3P.

from Para 6.5.4 Resending Lost Application Messages [ https://www.oasis-open.org/committees/download.php/272/ebMS_v2_0.pdf]

The rules applying to the non-receipt of an anticipated Acknowledgment due to the loss of either the application message or the Acknowledgment Message are as follows: • The Sending MSH MUST resend the original message if an Acknowledgment Message has been requestedbut has not been received and the following are true: • At least the time specified in the RetryInterval parameter has passed since the message was last sent, • The message has been resent less than the number of times specified in the Retries parameter. • If the Sending MSH does not receive an Acknowledgment Message after the maximum number of retries, the Sending MSH SHALL notify the application and/or system administrator function of the failure to receive an Acknowledgment Message (see also section 4.2.3.2.4 concerning treatment of errors). • If the Sending MSH detects a communications protocol error, the Sending MSH MUST resend the messageusing the same algorithm as if it has not received an Acknowledgment Message. "

EdwinVDS commented 8 years ago

FYI, I am now testing a fix for this. I can't seem to attach a file so here is the patch

diff --git "a/C:\\Users\\VANDER~1\\AppData\\Local\\Temp\\TortoiseGit\\OutFEF6.tmp\\OutboxTask-173f0a7-left.java" "b/C:\\development\\Jentrata-BR\\jentrata-msh\\Plugins\\CorvusEbMS\\src\\main\\java\\hk\\hku\\cecid\\ebms\\spa\\task\\OutboxTask.java"
index 87930c2..e743077 100644
--- "a/C:\\Users\\VANDER~1\\AppData\\Local\\Temp\\TortoiseGit\\OutFEF6.tmp\\OutboxTask-173f0a7-left.java"
+++ "b/C:\\development\\Jentrata-BR\\jentrata-msh\\Plugins\\CorvusEbMS\\src\\main\\java\\hk\\hku\\cecid\\ebms\\spa\\task\\OutboxTask.java"
@@ -478,12 +478,13 @@ public class OutboxTask implements ActiveTask {
                toStatus = MessageClassifier.INTERNAL_STATUS_DELIVERED;
                toStatusDesc = "Message was sent.";
            }
-           // Update the status and clear the message.
-           try{
-               this.messageDVO.setStatus(toStatus);
-                this.messageDVO.setStatusDescription(toStatusDesc);
-                messageServerDAO.clearMessage(this.messageDVO);
-           } catch (DAOException daoe){
+           // Update the status and clear the message.
+           try{
+               this.messageDVO.setStatus(toStatus);
+               this.messageDVO.setStatusDescription(toStatusDesc);
+               if (!MessageClassifier.INTERNAL_STATUS_DELIVERY_FAILURE.equals(toStatus)) {
+                   messageServerDAO.clearMessage(this.messageDVO);
+               }           } catch (DAOException daoe){
                String detail = "Error in clear the non-reliable message: " + mID;  
                 EbmsProcessor.core.log.error(detail, daoe);
                 throw new DeliveryException (detail, daoe);
EdwinVDS commented 8 years ago

The test has returned a positive result!

dannykool commented 8 years ago

Hi,

I also ran into this compliancy issue and tested the fix suggested above. The fix seems to work, but not in all cases.

Please note that in addition to the quote from Para 6.5.4 "Resending Lost Application Messages" [ https://www.oasis-open.org/committees/download.php/272/ebMS_v2_0.pdf] the next paragraph 6.5.5 is also relevant "Resending Acknowledgements": If the Receiving MSH receives a message it discovers to be a duplicate, it should resend the original cknowledgment Message if the message is stored in persistent store. In this case, do the following: Look in persistent storage for the first response to the received message (i.e. it contains a RefToMessageId that matches the MessageId of the received message). If a response message was found in persistent storage then resend the persisted message back to the MSH that sent the received message. If no response message was found in persistent storage, then: (1) If syncReplyMode is not set to none and if the CPA indicates an application response is included, then it must be the case that the application has not finished processing the earlier copy of the same message. Therefore, wait for the response from the application and then return that response synchronously over the same connection that was used for the retransmission. (2) Otherwise, generate an Acknowledgment Message.

In other words, if you receive a message that you received before, you must always acknowledge the message.

I performed the following two test cases with the fix suggested by EdwinVDS included.

CASE-1 A sends message-1 to B requesting acknowledgement. B sends acknowledgement-1 but the acknowledgement fails to be delivered. Later on A retries message-1. B is expected to send acknowledgement-1. RESULT: B indeed sends acknowledgement-1.

CASE-2 A sends message-2 to B requesting acknowledgement. B sends acknowledgement-2 which is delivered successfully. Later on, A sends message-2 again. B is expected to send acknowledgement-2. RESULT: B does not send acknowledgement-2.

The behavior as observed in CASE-1 now complies to the ebMS v2.0 spec, thanks to the proposed fix. But with respect to the scenario described in CASE-2 Jentrata-MSH still doesn't comply to ebMS v2.0.

cavycorp commented 5 years ago

@aaronwalker we had an occurrence of this problem in our production environment. First ACK wasn't received by te sender of the message due to a configuration error. As a result the sender re-send the original message, Jentrata identified it as a duplicate and as a result looked up the original ACK and changed its status to PENDING. After which nothing happened. Also the option of resending a message via the Jentrata Administration pages, which does almost the same, changing the status of the existing message to PENDING results in no action.

So I dived in the code and find out that this behaviour exist for quite some time, actually it started after the introduction of the "select untried message fix" which extended the search on pending messages with a join between the message and outbox table. In both cases as described above only the message table is updated, setting the status to PENDING but no entry is made to the outbox table. As a result of the resend messages and ACKs won't be part of the result of the query performed by the OutboxCollector process. A simple fix would be to reverse the "select untried message fix", however the Cluster code within Jentrata is also depending on the outbox logic.

As a result I went through to code to find all methods in which the message status of an existing message is changed in PENDING. And then checked if they did the correct updates in the message as well as the outbox table. In the end I found three classes in which the updates we're incomplete:

With some small fixes in these files the re-send of messages and ACKs worked again. I will get these changes to you asap.

aaronwalker commented 5 years ago

@cavycorp @EdwinVDS @dannykool very happy to review any PR request and merge this fix in ASAP

aaronwalker commented 5 years ago

fix implemented by #55