real-logic / simple-binary-encoding

Simple Binary Encoding (SBE) - High Performance Message Codec
Apache License 2.0
3.13k stars 523 forks source link

OtfMessageDecoder doesn't report the BEGIN_COMPOSITE token in TokenListener #471

Closed mikeb01 closed 7 years ago

mikeb01 commented 7 years ago

When using the OftMessageDecoder, which the TokenListener.onBeginComposite() callback the fieldToken supplied is the BEGIN_FIELD token for the field of the message that the decoder is currently at. However it would be useful if the BEGIN_COMPOSITE token was also available. Otherwise it is difficult to figure out where within a nested composite the value that you have arrived at is contained. See below for an example:

Given the following definition, with a nested composite.

<?xml version="1.0" encoding="UTF-8"?>
<sbe:messageSchema
        xmlns:sbe="http://fixprotocol.io/2016/sbe"
        package="sbeir.bug"
        id="10001"
        version="209"
        description="Order Book Events persistent structure"
        byteOrder="littleEndian">

    <types>
        <composite name="messageHeader" description="Message identifiers and length of message root">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId" primitiveType="uint16"/>
            <type name="schemaId" primitiveType="uint16"/>
            <type name="version" primitiveType="uint16"/>
        </composite>
        <composite name="AccountKey">
            <type name="id" primitiveType="int64"/>
            <type name="tradingMemberId" primitiveType="int32"/>
            <enum name="tradingMemberType" encodingType="char">
                <validValue name="GENERAL">G</validValue>
                <validValue name="BROKER">B</validValue>
                <validValue name="BANK">K</validValue>
                <validValue name="UNKNOWN">U</validValue>
            </enum>
        </composite>
        <composite name="EvOrderState">
            <ref name="account" type="AccountKey"/>
        </composite>
    </types>

    <sbe:message name="ExecutionReport" id="3">
        <field name="orderState" id="8" type="EvOrderState"/>
    </sbe:message>

</sbe:messageSchema>

If I print out the information about the location of the various encodings, using the code below the name of the field token supplied will always been the name of the field of the message, making it difficult to determine if the current encoding is from an AccountKey composite type.

@Test
public void test() throws IOException
{
    final long accountId = 98273447;
    final int tradingMemberId = 9823745;

    UnsafeBuffer buffer = new UnsafeBuffer(new byte[4096]);
    MessageHeaderEncoder headerEncoder = new MessageHeaderEncoder();
    ExecutionReportEncoder executionReportEncoder = new ExecutionReportEncoder();

    executionReportEncoder.wrapAndApplyHeader(buffer, 0, headerEncoder);
    executionReportEncoder.orderState().account().id(accountId);
    executionReportEncoder.orderState().account().tradingMemberId(tradingMemberId);
    executionReportEncoder.orderState().account().tradingMemberType(TradingMemberType.GENERAL);

    final ByteBuffer byteBuffer = readOrderBookEventsIr("/bug.sbeir");

    Ir orderBookEventsIr = new IrDecoder(byteBuffer).decode();
    OtfHeaderDecoder headerDecoder = new OtfHeaderDecoder(orderBookEventsIr.headerStructure());

    final int blockLength = headerDecoder.getBlockLength(buffer, 0);
    final int templateId = headerDecoder.getTemplateId(buffer, 0);

    OtfMessageDecoder.decode(
            buffer,
            headerDecoder.encodedLength(),
            209,
            blockLength,
            orderBookEventsIr.getMessage(templateId),
            new AbstractTokenListener()
            {
                private final Deque<String> path = new ArrayDeque<>();

                @Override
                public void onBeginComposite(final Token fieldToken, final List<Token> tokens, final int fromIndex, final int toIndex)
                {
                    System.out.println("fieldToken: " + fieldToken);
                    path.addLast(fieldToken.name());
                }

                @Override
                public void onEndComposite(final Token fieldToken, final List<Token> tokens, final int fromIndex, final int toIndex)
                {
                    path.removeLast();
                }

                @Override
                public void onEncoding(final Token fieldToken, final DirectBuffer buffer, final int bufferIndex, final Token typeToken, final int actingVersion)
                {
                    System.out.printf("%s::%s%n", path, fieldToken.name());
                }
            });

}

Output

fieldToken: Token{signal=BEGIN_FIELD, name='orderState', referencedName='null', description='null', id=8, version=0, deprecated=0, encodedLength=0, offset=0, componentTokenCount=14, encoding=Encoding{presence=REQUIRED, primitiveType=null, byteOrder=LITTLE_ENDIAN, minValue=null, maxValue=null, nullValue=null, constValue=null, characterEncoding='null', epoch='unix', timeUnit=nanosecond, semanticType='null'}}
fieldToken: Token{signal=BEGIN_FIELD, name='orderState', referencedName='null', description='null', id=8, version=0, deprecated=0, encodedLength=0, offset=0, componentTokenCount=14, encoding=Encoding{presence=REQUIRED, primitiveType=null, byteOrder=LITTLE_ENDIAN, minValue=null, maxValue=null, nullValue=null, constValue=null, characterEncoding='null', epoch='unix', timeUnit=nanosecond, semanticType='null'}}
[orderState, orderState]::id
[orderState, orderState]::tradingMemberId
mjpt777 commented 7 years ago

@mikeb01 Have you considered using the fromIndex to grab the BEGIN_COMPOSITE token from the list of tokens passed in the callback?

mikeb01 commented 7 years ago

Okay, I see now. Will test on Monday. I'll look to push a change for the wiki to make ir clearer.

mjpt777 commented 7 years ago

The field token is important for when you have multiple fields using the same composite type in a message.

mjpt777 commented 7 years ago

BTW this behaviour is similar for the other types. You get the field token then the valid range of tokens to describe the type.

mikeb01 commented 7 years ago

Yep, I get it now. All of the information is there. The small oddity is having to deal with difference between when you a composite used in a field of a message and when it is used inside of another composite type.

E.g. on the ExecutionReport.orderState field

fieldToken.name = orderState
tokens.get(fromIndex).name = EvOrderState
tokens.get(fromIndex).referencedName = null

Then when at EvOrderState.account

fieldToken.name = orderState
tokens.get(fromIndex).name = account
tokens.get(fromIndex).referencedName = AccountKey

I was hoping to use fieldToken.name all the way down, but it continues to hold the field from the message (should of read the ExampleTokenListener in more detail).

So if you wanted to build a string representation of a path to a particular value, you need to track the whether an encoding is nested inside of a message or another composite to look up either the field or ref as appropriate. And the name of the type moves from the tokens.get(fromIndex).name to tokens.get(fromIndex).referencedName.

I'm guessing this stems from the committee's decision to distinguish between messages and composite types rather than just having types that can be composed.

I'll look at updating the Wiki to describe this scenario.

mjpt777 commented 7 years ago

Yes the main issue here is composites are not really types and kind of a hybrid macro/type. In my view everything should be a type including messages and thus would compose much better. A repeating group should just be another message for example. Setting that aside and we cannot change the past.

It is worth considering Token.applicableName() that takes account for if the name is a reference or the name of the type itself.

mikeb01 commented 7 years ago

That would be really useful. Would this be called on the BEGIN_COMPOSITE (tokens.get(fromIndex)) token or the fieldToken?

mjpt777 commented 7 years ago

@mikeb01 I don't understand the question. Can you try in a different form?

mjpt777 commented 7 years ago

The referencedName is to cover for when the ref type is used in a composite. This way two or more members with the same type can be included but with different names as given in the ref type. The applicableName is the ref name if available otherwise it is the name of the type.

mikeb01 commented 7 years ago

Ignore my question. I was misunderstanding what applicableName would be useful for.