quickfix-j / quickfixj

QuickFIX/J is a full featured messaging engine for the FIX protocol. - This is the official project repository.
http://www.quickfixj.org
Other
955 stars 611 forks source link

`JdbcStore.storeSequenceNumbers` throws exception that is never recovered from #613

Open techdiver opened 1 year ago

techdiver commented 1 year ago

Describe the bug A configured quickfix connection throws an exception whilst processing a heartbeat response.

To Reproduce Cannot reproduce locally. Issue reported on client system.

Expected behavior Fix session establish correctly

system information:

Additional context Initially we get the following exception thrown:

00:33:08.349,ERROR,errorEvent,QFJ Message Processor,[],Closed Connection
java.sql.SQLRecoverableException: Closed Connection
    at oracle.jdbc.driver.PhysicalConnection.needLine(PhysicalConnection.java:3247)
    at oracle.jdbc.driver.OracleStatement.closeOrCache(OracleStatement.java:1392)
    at oracle.jdbc.driver.OracleStatement.close(OracleStatement.java:1375)
    at oracle.jdbc.driver.OracleStatementWrapper.close(OracleStatementWrapper.java:119)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.close(OraclePreparedStatementWrapper.java:96)
    at org.logicalcobwebs.proxool.AbstractProxyStatement.close(AbstractProxyStatement.java:115)
    at org.logicalcobwebs.proxool.ProxyStatement.invoke(ProxyStatement.java:93)
    at org.logicalcobwebs.proxool.ProxyStatement.intercept(ProxyStatement.java:57)
    at $java.sql.Statement$$EnhancerByProxool$$10c0cac9.close(<generated>)
    at quickfix.JdbcUtil.close(JdbcUtil.java:132)
    at quickfix.JdbcStore.storeSequenceNumbers(JdbcStore.java:286)
    at quickfix.JdbcStore.setNextTargetMsgSeqNum(JdbcStore.java:270)
    at quickfix.JdbcStore.incrNextTargetMsgSeqNum(JdbcStore.java:173)
    at quickfix.SessionState.incrNextTargetMsgSeqNum(SessionState.java:372)
    at quickfix.Session.nextHeartBeat(Session.java:1739)
    at quickfix.Session.next(Session.java:1053)
    at quickfix.Session.next(Session.java:1228)
    at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:163)
    at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:113)
    at quickfix.mina.SingleThreadedEventHandlingStrategy.lambda$blockInThread$1(SingleThreadedEventHandlingStrategy.java:145)
    at quickfix.mina.SingleThreadedEventHandlingStrategy$ThreadAdapter$RunnableWrapper.run(SingleThreadedEventHandlingStrategy.java:267)
    at java.base/java.lang.Thread.run(Thread.java:832)

Followed by:

0:33:08.363,ERROR,errorEvent,QFJ Message Processor,[],Error processing message: 8=FIXT.1.19=7235=034=753649=<sender>56=<target>52=20230222-00:33:07.98610=138
java.io.IOException: IO Error: Socket read interrupted
    at quickfix.JdbcStore.storeSequenceNumbers(JdbcStore.java:284)
    at quickfix.JdbcStore.setNextTargetMsgSeqNum(JdbcStore.java:270)
    at quickfix.JdbcStore.incrNextTargetMsgSeqNum(JdbcStore.java:173)
    at quickfix.SessionState.incrNextTargetMsgSeqNum(SessionState.java:372)
    at quickfix.Session.nextHeartBeat(Session.java:1739)
    at quickfix.Session.next(Session.java:1053)
    at quickfix.Session.next(Session.java:1228)
    at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:163)
    at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:113)
    at quickfix.mina.SingleThreadedEventHandlingStrategy.lambda$blockInThread$1(SingleThreadedEventHandlingStrategy.java:145)
    at quickfix.mina.SingleThreadedEventHandlingStrategy$ThreadAdapter$RunnableWrapper.run(SingleThreadedEventHandlingStrategy.java:267)
    at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.sql.SQLRecoverableException: IO Error: Socket read interrupted
    at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:932)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1119)
    at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3780)
    at oracle.jdbc.driver.T4CPreparedStatement.executeInternal(T4CPreparedStatement.java:1343)
    at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3887)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1079)
    at jdk.internal.reflect.GeneratedMethodAccessor1050.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.logicalcobwebs.proxool.ProxyStatement.invoke(ProxyStatement.java:100)
    at org.logicalcobwebs.proxool.ProxyStatement.intercept(ProxyStatement.java:57)
    at $java.sql.Statement$$EnhancerByProxool$$10c0cac9.execute(<generated>)
    at quickfix.JdbcStore.storeSequenceNumbers(JdbcStore.java:282)
    ... 11 more
Caused by: java.net.SocketTimeoutException: Socket read interrupted
    at oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:152)
    at oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82)
    at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
    at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101)
    at oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
    at oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98)
    at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:612)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:226)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:59)
    at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:910)
    ... 23 more
Cause: IO Error: Socket read interrupted
java.sql.SQLRecoverableException: IO Error: Socket read interrupted
    at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:932)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1119)
    at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3780)
    at oracle.jdbc.driver.T4CPreparedStatement.executeInternal(T4CPreparedStatement.java:1343)
    at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3887)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1079)
    at jdk.internal.reflect.GeneratedMethodAccessor1050.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.logicalcobwebs.proxool.ProxyStatement.invoke(ProxyStatement.java:100)
    at org.logicalcobwebs.proxool.ProxyStatement.intercept(ProxyStatement.java:57)
    at $java.sql.Statement$$EnhancerByProxool$$10c0cac9.execute(<generated>)
    at quickfix.JdbcStore.storeSequenceNumbers(JdbcStore.java:282)
    at quickfix.JdbcStore.setNextTargetMsgSeqNum(JdbcStore.java:270)
    at quickfix.JdbcStore.incrNextTargetMsgSeqNum(JdbcStore.java:173)
    at quickfix.SessionState.incrNextTargetMsgSeqNum(SessionState.java:372)
    at quickfix.Session.nextHeartBeat(Session.java:1739)
    at quickfix.Session.next(Session.java:1053)
    at quickfix.Session.next(Session.java:1228)
    at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:163)
    at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:113)
    at quickfix.mina.SingleThreadedEventHandlingStrategy.lambda$blockInThread$1(SingleThreadedEventHandlingStrategy.java:145)
    at quickfix.mina.SingleThreadedEventHandlingStrategy$ThreadAdapter$RunnableWrapper.run(SingleThreadedEventHandlingStrategy.java:267)
    at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.net.SocketTimeoutException: Socket read interrupted
    at oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:152)
    at oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82)
    at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
    at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101)
    at oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
    at oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98)
    at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:612)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:226)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:59)
    at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:910)
    ... 23 more

00:34:20.038,ERROR,errorEvent,QFJ Timer,[],Disconnecting: Timed out waiting for heartbeat

The client reports no connectivity issues at the time and this is backed up by the fact that other resources within our application can communicate with the database fine. Thee might have been a disconnect but the assumption is quickfixj would reconnect once available. Aside from that it seems that reconnect attempts don't work as we see many of these in the logs:

00:34:31.037,ERROR,errorEvent,QFJ Timer,[],Disconnecting: Timed out waiting for logon response

This is despite the fact that the fact that we can see from the logs that those logon attempts are being responded to:

00:34:21.073,INFO ,message,QFJ Timer,[],8=FIXT.1.19=9135=A34=804949=application_name52=20230222-00:34:21.03156=application_name298=0108=301137=710=224
00:34:21.110,INFO ,message,NioProcessor-23,[],8=FIXT.1.19=9135=A34=753949=application_name256=application_name52=20230222-00:34:21.07598=01137=7108=3010=235
00:39:22.054,INFO ,message,QFJ Timer,[],8=FIXT.1.19=9135=A34=805049=application_name52=20230222-00:39:22.04156=application_name298=0108=301137=710=223
00:39:22.070,INFO ,message,NioProcessor-27,[],8=FIXT.1.19=9135=A34=754049=application_name256=application_name52=20230222-00:39:22.06398=01137=7108=3010=230
00:44:22.139,INFO ,message,QFJ Timer,[],8=FIXT.1.19=9135=A34=805149=application_name52=20230222-00:44:22.13156=application_name298=0108=301137=710=220
00:44:22.190,INFO ,message,NioProcessor-28,[],8=FIXT.1.19=9135=A34=754149=application_name256=application_name52=20230222-00:44:22.14098=01137=7108=3010=223
chrjohn commented 1 year ago

Hard to tell what causes this but the logon and heartbeat timeouts look like a thread is blocked that would normally process these messages. A thread dump would be helpful.

chrjohn commented 1 year ago

BTW you could also use another connection pool instead of Proxool. Also see here #607