quickfix-j / quickfixj

QuickFIX/J is a full featured messaging engine for the FIX protocol. - This is the official project repository.
http://www.quickfixj.org
Other
966 stars 618 forks source link

Parsing an invalid FIX message could cause infinite loop and OutOfMemory #432

Closed amarchanka closed 2 years ago

amarchanka commented 3 years ago

Out of memory Client send us enormous amount of hearbeats. org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolDecoderOutputImpl messageQueue in the heapdump contains 9854469 unprocessed heartbeats Also SimpleBufferAllocator$SimpleBuffer contains unprocessed heartbeats 8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.318.34=38046.10=060..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.318.34=38047.10=061..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.318.34=38048.10=062..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38049.10=064..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38050.10=056..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38051.10=057..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38052.10=058..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38053.10=059..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=20211021-21:44:48.319.34=38054.10=060..8=FIX.4.4.9=082.35=0.49=xxxxxxxxxxxx.xxxxxxxx.56=cServer.57=QUOTE.52=202

Verification of sequence number is disabled, because we had OOM in case when client send us wrong sequence number. The messageQueue contains repeated heartbeats. It looks like client send 20 generated heartbeats many times and kill our application in 1-2 minutes.

Some packages had mistake like Length format error in message (last character: =): 8=FIX.4.4.9=8=FIX.4.4.9=08

Step to reproduce

` SessionID currentSessionId = (SessionID) ReflectionUtils.getField(field, connectionInitiator);

    final Session session = Session.lookupSession(currentSessionId);

    MemoryPoolMXBean mxBean = ManagementFactory.getMemoryPoolMXBeans().stream()
            .filter((MemoryPoolMXBean memoryPoolMXBean) -> memoryPoolMXBean.getName().equals("G1 Old Gen"))
            .findFirst().orElseThrow();
    long counter = 0;
    Heartbeat heartbeat = new Heartbeat();
    List<String> out = new ArrayList<>();
    for(int i = 0; i < 20; i++) {
        heartbeat.getHeader().removeField(PossDupFlag.FIELD);
        heartbeat.getHeader().removeField(OrigSendingTime.FIELD);
        initializeHeader(heartbeat.getHeader(), currentSessionId, i + 2);
        out.add(heartbeat.toString());
    }

    StringBuilder builder = new StringBuilder();
    String s = heartbeat.toString().substring(0, 13);
    for(int i = 0; i < 100; i++) {
        builder.append(s);
    }
    out.add(s.toString());

    Responder responder = session.getResponder();

    while (true) {
        if (mxBean.getUsage().getUsed() < 128L * 1024 * 1024) {
            if (session.isLoggedOn()) {

                for (String v : out) {
                    responder.send(v);
                }
            } else {
                break;
            }
        } else {
            System.gc();
        }
    }
    LOGGER.info("Counter {}", counter);
}

private static void initializeHeader(Message.Header header, SessionID sessionID, int seq) {
    header.setString(BeginString.FIELD, sessionID.getBeginString());
    header.setString(SenderCompID.FIELD, sessionID.getSenderCompID());
    optionallySetID(header, SenderSubID.FIELD, sessionID.getSenderSubID());
    optionallySetID(header, SenderLocationID.FIELD, sessionID.getSenderLocationID());
    header.setString(TargetCompID.FIELD, sessionID.getTargetCompID());
    optionallySetID(header, TargetSubID.FIELD, sessionID.getTargetSubID());
    optionallySetID(header, TargetLocationID.FIELD, sessionID.getTargetLocationID());
    header.setInt(MsgSeqNum.FIELD, seq);
    insertSendingTime(header);
}

private static void optionallySetID(Header header, int field, String value) {
    if (!value.equals(SessionID.NOT_SET)) {
        header.setString(field, value);
    }
}

private static void insertSendingTime(Message.Header header) {
    header.setUtcTimeStamp(SendingTime.FIELD, SystemTime.getLocalDateTime(), UtcTimestampPrecision.SECONDS);
}

}`

Expected behavior Is it possible to add some rate limiter for income packages? Add limit on broken messages on transport layer? I tried SocketAcceptor socketAcceptor = SocketAcceptor.newBuilder() .withApplication(application) .withLogFactory(logFactory) .withMessageFactory(getMessageFactory()) .withMessageStoreFactory(createMessageStoreFactory(settings)) .withQueueWatermarks(10, 10000) .withSettings(settings) .build(); but it wouldn't help. system information:

chrjohn commented 3 years ago

Did you try with a newer version than 2.1.1? Don't have anything special in mind right now regarding performance problems in that version but maybe it helps. However, I could imagine that you could crash almost any application given a high enough amount of messages and a low amount of memory. How large is your heap sized?

Maybe the counterparty could fix their heartbeat logic? ;) I think if you set the watermarks low enough then it should help because messages will only be read from the socket when the lower watermark is reached. This of course could lead to a disconnection when the socket buffer is filled up.

You could implement some kind of safeguard against this by counting messages in your fromAdmin or fromApp callback and disconnect if a certain amount in a certain time is exceeded.

amarchanka commented 3 years ago

Did you try with a newer version than 2.1.1? Don't have anything special in mind right now regarding performance problems in that version but maybe it helps. However, I could imagine that you could crash almost any application given a high enough amount of messages and a low amount of memory. How large is your heap sized?

2Gb

Maybe the counterparty could fix their heartbeat logic? ;) I think if you set the watermarks low enough then it should help because messages will only be read from the socket when the lower watermark is reached. This of course could lead to a disconnection when the socket buffer is filled up.

It is public service, so everybody can do DOS attack.

You could implement some kind of safeguard against this by counting messages in your fromAdmin or fromApp callback and disconnect if a certain amount in a certain time is exceeded.

I checked Session.java and it doesn't contains any calls for Heartbeats on Application interface.

Our admin solve problem by limiting income package per second on iptables.

akrus commented 3 years ago

We have a similar problem on 2.2.0, but the reality is that there is a bug causing QuickFixJ to go into infinite loop.

To reproduce this you need to do the following:

  1. Send correct logon message, establish the connection.
  2. Send broken FIX message twice: e.g.
    8=FIX.4.4\x019=058=\x01

    If you send this message once, only one log entry appears and everything works fine. If you send this message second time within same session, you'll have a loop with this message: Length format error in message (last character: =) until you run out of memory.

chrjohn commented 3 years ago

@akrus OK, I was not aware of that. Actually I thought that after some kilobyte of garbage data the message parsing should start from the beginning. But maybe there is a gap in the processing logic. Are you able to supply a reproducer unit test? It seems that class https://github.com/quickfix-j/quickfixj/blob/master/quickfixj-core/src/main/java/quickfix/mina/message/FIXMessageDecoder.java is affected. Otherwise I will try to put one together later. Thanks, Chris.

akrus commented 3 years ago

@chrjohn , unfortunately I don't have anything to share, it's just an app we have running there which is based on QuickfixJ.

amarchanka commented 3 years ago

Unit test for reproduction FIXMessageDecoderTest It is in a loop forever. @Test public void testInvalidDecoding() throws Exception { decoder = new FIXMessageDecoder("UTF-16"); setUpBuffer("8=FIX.4.2\0019=128=FIX.4.2\0019=8=FIX.4.2\0019=128=FIX.4.2\0019="); MessageDecoderResult decoderResult = decoder.decode(null, buffer, decoderOutput); assertEquals("wrong decoder result", MessageDecoderResult.OK, decoderResult); assertEquals("Wrong encoding", 14397, (int) decoderOutput.getMessage().charAt(0)); }

chrjohn commented 3 years ago

Thanks. I got the issue reproduced and fixed in the meantime. Will still run some tests. Will also check that it works with your unit test.

amarchanka commented 3 years ago

FIXMessageDecoder.patch.txt You can check my fix also. Unit test updated. ` @Test public void testInvalidDecoding() throws Exception { decoder = new FIXMessageDecoder("UTF-16"); setUpBuffer("8=FIX.4.2\0019=128=FIX.4.2\0019=8=FIX.4.2\0019=128="

chrjohn commented 3 years ago

My fix only was

handleError(in, position, "Length format error in message (last character: " + (char)ch + "): " + messageString,
                                 false);

So I only put position instead of in.position() + 1. Do you think the other changes are needed? Why should we skip one more character?

My unit test works with your message as well but your unit test does not seem to work with your proposed fix?!

I got: java.lang.AssertionError: wrong decoder result expected:<OK> but was:<NEED_DATA> Or were you just trying to show that it does not get into an infinite loop?

amarchanka commented 3 years ago

My fix only was

handleError(in, position, "Length format error in message (last character: " + (char)ch + "): " + messageString,
                                 false);

So I only put position instead of in.position() + 1. Do you think the other changes are needed? Why should we skip one more character?

My unit test works with your message as well but your unit test does not seem to work with your proposed fix?!

I got: java.lang.AssertionError: wrong decoder result expected:<OK> but was:<NEED_DATA> Or were you just trying to show that it does not get into an infinite loop? Please update unit test.

@Test
public void testInvalidDecoding() throws Exception {
decoder = new FIXMessageDecoder("UTF-16");
setUpBuffer("8=FIX.4.2\0019=128=FIX.4.2\0019=8=FIX.4.2\0019=128="
+ "FIX.4.2\0019=8=FIX.4.2\0019=12\00135=X\001108=30\00110=049\001");
MessageDecoderResult decoderResult = decoder.decode(null, buffer, decoderOutput);
assertEquals("wrong decoder result", MessageDecoderResult.OK, decoderResult);
assertEquals("Wrong encoding", 14397, (int) decoderOutput.getMessage().charAt(0));
}
chrjohn commented 3 years ago

That test passes with my changes, i.e. only change to:

handleError(in, position, "Length format error in message (last character: " + (char)ch + "): " + messageString, false);

P.S.: It also passes with your changes, but I don't think we need the extra changes that you did. Do they have a specific purpose? What do you think?

amarchanka commented 3 years ago

That test passes with my changes, i.e. only change to:

handleError(in, position, "Length format error in message (last character: " + (char)ch + "): " + messageString, false);

When fix will be available?

chrjohn commented 3 years ago

I need to build a release which incorporates the fix. I think it is sufficient to fix it on 2.2.x and 2.3.x. You are still using 2.1.1. Any chance to upgrade?

amarchanka commented 3 years ago

I need to build a release which incorporates the fix. I think it is sufficient to fix it on 2.2.x and 2.3.x. You are still using 2.1.1. Any chance to upgrade?

It is good reason to update. Thanks a lot!

chrjohn commented 2 years ago

Integrated into master branch now, will merge to 2.2.x and 2.3.x branches and create releases in due course.

chrjohn commented 2 years ago

Release 2.2.1 has been published and should appear on Maven Central in the next hours. 2.3.1 is currently in progress.

chrjohn commented 2 years ago

Just checked that 2.2.1 and 2.3.1 are available via Maven Central now.

chrjohn commented 2 years ago

@amarchanka @akrus Thanks for bringing this to my attention. When reading the original bug report it was not clear to me that the faulty messages were actually the culprit and not the Heartbeats. I hope your application is now performing better.

akrus commented 2 years ago

@chrjohn, thanks, we've updated to 2.2.1 and works fine so far.

amarchanka commented 2 years ago

@amarchanka @akrus Thanks for bringing this to my attention. When reading the original bug report it was not clear to me that the faulty messages were actually the culprit and not the Heartbeats. I hope your application is now performing better.

Fix work fine for me. BUT here another usage of handleError(in, in.position() + 1, So in case when length start from non digit we will have the same situation.

chrjohn commented 2 years ago

Hi @amarchanka , I just put together a simple unit test but couldn't provoke a similar error.

    @Test(timeout = 1000)
    public void testLengthFormatError3() throws Exception {
        String badMessages = "8=FIX.4.4\0019=02A\0018=FIX.4.4\0019=02A\0018=FIX.4.4\0019=02A\001";
        String goodMessage = "8=FIX.4.4\0019=12\00135=Y\001108=30\00110=037\001";
        setUpBuffer(badMessages + goodMessage + badMessages + goodMessage);
        assertMessageFound(goodMessage, 2);
    }

The difference to the bug is that in case of handleError for a non-digit the loop is not continued: https://github.com/quickfix-j/quickfixj/blob/2f4fda1f0dbcfd1185cf01533b0a614e00808478/quickfixj-core/src/main/java/quickfix/mina/message/FIXMessageDecoder.java#L150-L158 and eventually gets to the position where the fix has been applied. https://github.com/quickfix-j/quickfixj/blob/2f4fda1f0dbcfd1185cf01533b0a614e00808478/quickfixj-core/src/main/java/quickfix/mina/message/FIXMessageDecoder.java#L171-L173

Or did I misunderstand something? Are you able to provide a failing test?

Thanks, Chris.

amarchanka commented 2 years ago

Please try

    @Test(timeout = 1000)
    public void testLengthFormatError3() throws Exception {
        String badMessages = "8=FIX.4.4\0019=0K2\0018=FIX.4.4\0019=02A\0018=FIX.4.4\0019=02A\001";
        String goodMessage = "8=FIX.4.4\0019=12\00135=Y\001108=30\00110=037\001";
        setUpBuffer(badMessages + goodMessage + badMessages + goodMessage);
        assertMessageFound(goodMessage, 2);
    }

I didn't check it.

chrjohn commented 2 years ago
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running quickfix.mina.message.FIXMessageDecoderTest
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Encountered invalid body length: K
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Length format error in message (last character: A): 8=FIX.4.49=02A8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=0378=FIX.4.49=0K28=FIX.4.49=02A8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=037
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Length format error in message (last character: A): 8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=0378=FIX.4.49=0K28=FIX.4.49=02A8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=037
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Encountered invalid body length: K
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Length format error in message (last character: A): 8=FIX.4.49=02A8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=037
Nov 23, 2021 1:32:09 PM quickfix.mina.message.FIXMessageDecoder handleError
SEVERE: Length format error in message (last character: A): 8=FIX.4.49=02A8=FIX.4.49=1235=Y108=3010=037
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.102 s - in quickfix.mina.message.FIXMessageDecoderTest

Results:

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

Works :+1: