quickfix-j / quickfixj

QuickFIX/J is a full featured messaging engine for the FIX protocol. - This is the official project repository.
http://www.quickfixj.org
Other
955 stars 611 forks source link

Stop sending resend-messages when the responder has gone away #646

Open philipwhiuk opened 1 year ago

philipwhiuk commented 1 year ago

We should probably stop resending messages when the responder has disconnected.

The current behavior is to keep sending messages regardless, ignoring the return value of send().

Here's an anonymised log example.

event INFO 20230502-18:15:40.420 - FIXT.1.1:A->B: Resending message: 3
outgoing INFO 20230502-18:15:40.420 - FIXT.1.1:A->B: <MESSAGE>
event INFO 20230502-18:15:40.420 - FIXT.1.1:A->B: Resending message: 4
outgoing INFO 20230502-18:15:40.420 - FIXT.1.1:A->B: <MESSAGE>
event INFO 20230502-18:15:40.420 - FIXT.1.1:A->B: Resending message: 5
outgoing INFO 20230502-18:15:40.420 - FIXT.1.1:A->B: <MESSAGE>
errorEvent ERROR 20230502-18:15:40.522 - FIXT.1.1:A->B: Disconnecting: Socket exception (/<IP>:<PORT>): java.io.IOException: Connection reset by peer
event INFO 20230502-18:15:40.522 - FIXT.1.1:A->B: Resending message: 6
event INFO 20230502-18:15:40.522 - FIXT.1.1:A->B: No responder, not sending message: <MESSAGE>
event INFO 20230502-18:15:40.741 - FIXT.1.1:A->B: Resending message: 7
event INFO 20230502-18:15:40.741 - FIXT.1.1:A->B: No responder, not sending message: <MESSAGE>
...
event INFO 20230502-18:15:44.741 - FIXT.1.1:A->B: Resending message: 1000
event INFO 20230502-18:15:44.741 - FIXT.1.1:A->B: No responder, not sending message: <MESSAGE>
philipwhiuk commented 1 year ago

I think best fix for this is to return if send returns false, and not send the rest of the resend here: https://github.com/quickfix-j/quickfixj/blob/dce91e62c7525b84448b974b796a1fac436015ae/quickfixj-core/src/main/java/quickfix/Session.java#L2378

But I'd welcome a second opinion.

chrjohn commented 1 year ago

Hi @philipwhiuk , thanks for opening the issue. I think it would be a very sensible enhancement.

AFAIK we do not act upon the return value of the send() method anywhere since it is a leftover from the JNI compatibility and messages are sent async anyway (except if SocketSynchronousWrites is false).

From the line of code that you quoted, we ultimately end up here when the responder is not null:

https://github.com/quickfix-j/quickfixj/blob/dce91e62c7525b84448b974b796a1fac436015ae/quickfixj-core/src/main/java/quickfix/Session.java#L2774

and from there to here:

https://github.com/quickfix-j/quickfixj/blob/dce91e62c7525b84448b974b796a1fac436015ae/quickfixj-core/src/main/java/quickfix/mina/IoSessionResponder.java#L51-L76

This method might also return false in two cases but I think it would be OK to abort the resend process in both cases since we can assume that the connection is either broken (and disconnected anyway) or the sync write timed out for one message and it would not make sense to resend the next message since at least one message was skipped over which subsequently would lead to another gap on the other side.

So in general it looks good to me. 👍

P.S.: Maybe we should open a follow-up issue to check if it is sensible to act upon the return value of send() in other cases.