Closed amarchanka closed 2 years ago
Did you try with a newer version than 2.1.1? Don't have anything special in mind right now regarding performance problems in that version but maybe it helps. However, I could imagine that you could crash almost any application given a high enough amount of messages and a low amount of memory. How large is your heap sized?
Maybe the counterparty could fix their heartbeat logic? ;) I think if you set the watermarks low enough then it should help because messages will only be read from the socket when the lower watermark is reached. This of course could lead to a disconnection when the socket buffer is filled up.
You could implement some kind of safeguard against this by counting messages in your fromAdmin
or fromApp
callback and disconnect if a certain amount in a certain time is exceeded.
Did you try with a newer version than 2.1.1? Don't have anything special in mind right now regarding performance problems in that version but maybe it helps. However, I could imagine that you could crash almost any application given a high enough amount of messages and a low amount of memory. How large is your heap sized?
2Gb
Maybe the counterparty could fix their heartbeat logic? ;) I think if you set the watermarks low enough then it should help because messages will only be read from the socket when the lower watermark is reached. This of course could lead to a disconnection when the socket buffer is filled up.
It is public service, so everybody can do DOS attack.
You could implement some kind of safeguard against this by counting messages in your
fromAdmin
orfromApp
callback and disconnect if a certain amount in a certain time is exceeded.
I checked Session.java and it doesn't contains any calls for Heartbeats on Application interface.
Our admin solve problem by limiting income package per second on iptables.
We have a similar problem on 2.2.0, but the reality is that there is a bug causing QuickFixJ to go into infinite loop.
To reproduce this you need to do the following:
8=FIX.4.4\x019=058=\x01
If you send this message once, only one log entry appears and everything works fine. If you send this message second time within same session, you'll have a loop with this message: Length format error in message (last character: =) until you run out of memory.
@akrus OK, I was not aware of that. Actually I thought that after some kilobyte of garbage data the message parsing should start from the beginning. But maybe there is a gap in the processing logic. Are you able to supply a reproducer unit test? It seems that class https://github.com/quickfix-j/quickfixj/blob/master/quickfixj-core/src/main/java/quickfix/mina/message/FIXMessageDecoder.java is affected. Otherwise I will try to put one together later. Thanks, Chris.
@chrjohn , unfortunately I don't have anything to share, it's just an app we have running there which is based on QuickfixJ.
Unit test for reproduction FIXMessageDecoderTest It is in a loop forever.
@Test public void testInvalidDecoding() throws Exception { decoder = new FIXMessageDecoder("UTF-16"); setUpBuffer("8=FIX.4.2\0019=128=FIX.4.2\0019=8=FIX.4.2\0019=128=FIX.4.2\0019="); MessageDecoderResult decoderResult = decoder.decode(null, buffer, decoderOutput); assertEquals("wrong decoder result", MessageDecoderResult.OK, decoderResult); assertEquals("Wrong encoding", 14397, (int) decoderOutput.getMessage().charAt(0)); }
Thanks. I got the issue reproduced and fixed in the meantime. Will still run some tests. Will also check that it works with your unit test.
FIXMessageDecoder.patch.txt You can check my fix also. Unit test updated. ` @Test public void testInvalidDecoding() throws Exception { decoder = new FIXMessageDecoder("UTF-16"); setUpBuffer("8=FIX.4.2\0019=128=FIX.4.2\0019=8=FIX.4.2\0019=128="
My fix only was
handleError(in, position, "Length format error in message (last character: " + (char)ch + "): " + messageString,
false);
So I only put position
instead of in.position() + 1
. Do you think the other changes are needed? Why should we skip one more character?
My unit test works with your message as well but your unit test does not seem to work with your proposed fix?!
I got: java.lang.AssertionError: wrong decoder result expected:<OK> but was:<NEED_DATA>
Or were you just trying to show that it does not get into an infinite loop?
My fix only was
handleError(in, position, "Length format error in message (last character: " + (char)ch + "): " + messageString, false);
So I only put
position
instead ofin.position() + 1
. Do you think the other changes are needed? Why should we skip one more character?My unit test works with your message as well but your unit test does not seem to work with your proposed fix?!
I got:
java.lang.AssertionError: wrong decoder result expected:<OK> but was:<NEED_DATA>
Or were you just trying to show that it does not get into an infinite loop? Please update unit test.@Test public void testInvalidDecoding() throws Exception { decoder = new FIXMessageDecoder("UTF-16"); setUpBuffer("8=FIX.4.2\0019=128=FIX.4.2\0019=8=FIX.4.2\0019=128=" + "FIX.4.2\0019=8=FIX.4.2\0019=12\00135=X\001108=30\00110=049\001"); MessageDecoderResult decoderResult = decoder.decode(null, buffer, decoderOutput); assertEquals("wrong decoder result", MessageDecoderResult.OK, decoderResult); assertEquals("Wrong encoding", 14397, (int) decoderOutput.getMessage().charAt(0)); }
That test passes with my changes, i.e. only change to:
handleError(in, position, "Length format error in message (last character: " + (char)ch + "): " + messageString, false);
P.S.: It also passes with your changes, but I don't think we need the extra changes that you did. Do they have a specific purpose? What do you think?
That test passes with my changes, i.e. only change to:
handleError(in, position, "Length format error in message (last character: " + (char)ch + "): " + messageString, false);
When fix will be available?
I need to build a release which incorporates the fix. I think it is sufficient to fix it on 2.2.x and 2.3.x. You are still using 2.1.1. Any chance to upgrade?
I need to build a release which incorporates the fix. I think it is sufficient to fix it on 2.2.x and 2.3.x. You are still using 2.1.1. Any chance to upgrade?
It is good reason to update. Thanks a lot!
Integrated into master branch now, will merge to 2.2.x and 2.3.x branches and create releases in due course.
Release 2.2.1 has been published and should appear on Maven Central in the next hours. 2.3.1 is currently in progress.
Just checked that 2.2.1 and 2.3.1 are available via Maven Central now.
@amarchanka @akrus Thanks for bringing this to my attention. When reading the original bug report it was not clear to me that the faulty messages were actually the culprit and not the Heartbeats. I hope your application is now performing better.
@chrjohn, thanks, we've updated to 2.2.1 and works fine so far.
@amarchanka @akrus Thanks for bringing this to my attention. When reading the original bug report it was not clear to me that the faulty messages were actually the culprit and not the Heartbeats. I hope your application is now performing better.
Fix work fine for me. BUT here another usage of handleError(in, in.position() + 1,
So in case when length start from non digit we will have the same situation.
Hi @amarchanka , I just put together a simple unit test but couldn't provoke a similar error.
@Test(timeout = 1000)
public void testLengthFormatError3() throws Exception {
String badMessages = "8=FIX.4.4\0019=02A\0018=FIX.4.4\0019=02A\0018=FIX.4.4\0019=02A\001";
String goodMessage = "8=FIX.4.4\0019=12\00135=Y\001108=30\00110=037\001";
setUpBuffer(badMessages + goodMessage + badMessages + goodMessage);
assertMessageFound(goodMessage, 2);
}
The difference to the bug is that in case of handleError
for a non-digit the loop is not continued:
https://github.com/quickfix-j/quickfixj/blob/2f4fda1f0dbcfd1185cf01533b0a614e00808478/quickfixj-core/src/main/java/quickfix/mina/message/FIXMessageDecoder.java#L150-L158
and eventually gets to the position where the fix has been applied.
https://github.com/quickfix-j/quickfixj/blob/2f4fda1f0dbcfd1185cf01533b0a614e00808478/quickfixj-core/src/main/java/quickfix/mina/message/FIXMessageDecoder.java#L171-L173
Or did I misunderstand something? Are you able to provide a failing test?
Thanks, Chris.
Please try
@Test(timeout = 1000)
public void testLengthFormatError3() throws Exception {
String badMessages = "8=FIX.4.4\0019=0K2\0018=FIX.4.4\0019=02A\0018=FIX.4.4\0019=02A\001";
String goodMessage = "8=FIX.4.4\0019=12\00135=Y\001108=30\00110=037\001";
setUpBuffer(badMessages + goodMessage + badMessages + goodMessage);
assertMessageFound(goodMessage, 2);
}
I didn't check it.
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running quickfix.mina.message.FIXMessageDecoderTest
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Encountered invalid body length: K
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Length format error in message (last character: A): 8=FIX.4.49=02A8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=0378=FIX.4.49=0K28=FIX.4.49=02A8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=037
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Length format error in message (last character: A): 8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=0378=FIX.4.49=0K28=FIX.4.49=02A8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=037
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Encountered invalid body length: K
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Length format error in message (last character: A): 8=FIX.4.49=02A8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=037
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Length format error in message (last character: A): 8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=037
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.102 s - in quickfix.mina.message.FIXMessageDecoderTest
Results:
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
Works :+1:
Out of memory Client send us enormous amount of hearbeats. org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolDecoderOutputImpl messageQueue in the heapdump contains 9854469 unprocessed heartbeats Also SimpleBufferAllocator$SimpleBuffer contains unprocessed heartbeats
8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.318.34=38046.10=060..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.318.34=38047.10=061..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.318.34=38048.10=062..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38049.10=064..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38050.10=056..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38051.10=057..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38052.10=058..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38053.10=059..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38054.10=060..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=202
Verification of sequence number is disabled, because we had OOM in case when client send us wrong sequence number. The messageQueue contains repeated heartbeats. It looks like client send 20 generated heartbeats many times and kill our application in 1-2 minutes.
Some packages had mistake like
Length format error in message (last character: =): 8=FIX.4.4.9=8=FIX.4.4.9=08
Step to reproduce
` SessionID currentSessionId = (SessionID) ReflectionUtils.getField(field, connectionInitiator);
}`
Expected behavior Is it possible to add some rate limiter for income packages? Add limit on broken messages on transport layer? I tried
SocketAcceptor socketAcceptor = SocketAcceptor.newBuilder() .withApplication(application) .withLogFactory(logFactory) .withMessageFactory(getMessageFactory()) .withMessageStoreFactory(createMessageStoreFactory(settings)) .withQueueWatermarks(10, 10000) .withSettings(settings) .build();
but it wouldn't help. system information: