quickfix-j / quickfixj

QuickFIX/J is a full featured messaging engine for the FIX protocol. - This is the official project repository.
http://www.quickfixj.org

Failover mechanism is not working when connection is reset by peer on Initiator #402

Open esanchezros opened 3 years ago

esanchezros commented 3 years ago

Describe the bug

We have an initiator configured with two acceptors. It connects to them through an stunnel service running locally:

SocketConnectHost=localhost
SocketConnectPort=44445
SocketConnectHost1=localhost
SocketConnectPort1=44446

If the first acceptor is offline, the initiator keeps retrying the first acceptor and never moves on to the failover one.

To Reproduce

.l.a.c.f.a.Application                   : Adding logon details. Username TargetCompID
quickfixj.msg.outgoing                   : FIXT.1.1:SenderCompID/TargetCompID->MAIN: 8=FIXT.1.1|9=112|35=A|34=459|49=SenderCompID|50=TargetCompID|52=20210624-09:29:29.788|56=MAIN|98=0|108=30|553=TargetCompID|554=xxxxxxxx|1137=9|10=158|
quickfixj.event                          : FIXT.1.1:SenderCompID/TargetCompID->MAIN: Initiated logon request
quickfixj.errorEvent                     : FIXT.1.1:SenderCompID/TargetCompID->MAIN: Disconnecting: Socket exception (localhost/127.0.0.1:44445): java.io.IOException: Connection reset by peer
l.a.c.f.m.MonitoringSessionStateListener : onDisconnect: session FIXT.1.1:SenderCompID/TargetCompID->MAIN[in:461,out:460] disconnected
l.a.c.f.m.MonitoringSessionStateListener : onLogout: session FIXT.1.1:SenderCompID/TargetCompID->MAIN[in:461,out:460] logged out
l.a.c.f.m.MonitoringSessionStateListener : onConnect: session FIXT.1.1:SenderCompID/TargetCompID->MAIN[in:461,out:460] connected
quickfixj.event                          : FIXT.1.1:SenderCompID/TargetCompID->MAIN: MINA session created: local=/127.0.0.1:49158, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=localhost/127.0.0.1:44445
.l.a.c.f.a.Application                   : Adding logon details. Username TargetCompID
quickfixj.msg.outgoing                   : FIXT.1.1:SenderCompID/TargetCompID->MAIN: 8=FIXT.1.1|9=112|35=A|34=460|49=SenderCompID|50=TargetCompID|52=20210624-09:29:59.785|56=MAIN|98=0|108=30|553=TargetCompID|554=xxxxxxxx|1137=9|10=150|
quickfixj.event                          : FIXT.1.1:SenderCompID/TargetCompID->MAIN: Initiated logon request
quickfixj.errorEvent                     : FIXT.1.1:SenderCompID/TargetCompID->MAIN: Disconnecting: Socket exception (localhost/127.0.0.1:44445): java.io.IOException: Connection reset by peer
l.a.c.f.m.MonitoringSessionStateListener : onDisconnect: session FIXT.1.1:SenderCompID/TargetCompID->MAIN[in:461,out:461] disconnected
l.a.c.f.m.MonitoringSessionStateListener : onLogout: session FIXT.1.1:SenderCompID/TargetCompID->MAIN[in:461,out:461] logged out

Expected behavior

I would expect the initiator to switch over to the failover acceptors when a socket connection failure happens. Failover does work as expected when the IP address/hostname is not resolvable.
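To make the expected rotation concrete, here is a minimal sketch in plain Java. It is not QuickFIX/J code and does not reflect how the library implements failover internally: `FailoverAddresses`, `fromSettings`, and `failover` are hypothetical names, and the sketch assumes the numbered `SocketConnectHost<n>`/`SocketConnectPort<n>` pairs are read in order until one is missing.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of QuickFIX/J): rotates through the configured endpoints. */
final class FailoverAddresses {
    private final List<InetSocketAddress> addresses;
    private int index = 0;

    FailoverAddresses(List<InetSocketAddress> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("no addresses configured");
        }
        this.addresses = List.copyOf(addresses);
    }

    /**
     * Assumption: keys follow the pattern from the configuration above, i.e.
     * SocketConnectHost/SocketConnectPort, then SocketConnectHost1/SocketConnectPort1, and so on.
     * Reading stops at the first missing pair.
     */
    static FailoverAddresses fromSettings(Map<String, String> settings) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            String suffix = (i == 0) ? "" : Integer.toString(i);
            String host = settings.get("SocketConnectHost" + suffix);
            String port = settings.get("SocketConnectPort" + suffix);
            if (host == null || port == null) {
                break;
            }
            // Unresolved on purpose: DNS lookup is left to the actual connect attempt.
            result.add(InetSocketAddress.createUnresolved(host, Integer.parseInt(port)));
        }
        return new FailoverAddresses(result);
    }

    /** The endpoint the next connect attempt should use. */
    InetSocketAddress current() {
        return addresses.get(index);
    }

    /**
     * Advance to the next endpoint, wrapping around at the end of the list.
     * In this sketch it would be called after any failed attempt, including a
     * "Connection reset by peer" that arrives after the TCP connect succeeded.
     */
    InetSocketAddress failover() {
        index = (index + 1) % addresses.size();
        return current();
    }
}
```

With the two endpoints from the configuration above, `current()` starts at port 44445, the first `failover()` moves to 44446, and the next one wraps back to 44445.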

system information:

Additional context

Is there a way to trigger the failover to the next acceptor programmatically?

chrjohn commented 3 years ago

Hi @esanchezros, thanks for the report and sorry for the delay. Unfortunately, there is no way to trigger the failover programmatically. This would probably be a sensible enhancement, since the current failover mechanism is not customizable. There were some plans to make it more customizable (e.g. by implementing a custom strategy), but so far there has been no time to do so.

Cheers, Chris.

P.S.: #250 is also related. Some months ago there was a discussion of this topic on the mailing list: https://sourceforge.net/p/quickfixj/mailman/quickfixj-users/thread/CAFn6DsGzB%3DeFSPt_B0LAZtG_jq%3Dr9HBTSW17aBv90Jrdo4801Q%40mail.gmail.com/#msg37249310