real-logic / simple-binary-encoding

Simple Binary Encoding (SBE) - High Performance Message Codec
Apache License 2.0
3.08k stars 519 forks source link

Value of NULL_VAL generated for enums is not the nullValue specified for encodingType but the null value for the primitive type #889

Closed EEWatanabe closed 2 years ago

EEWatanabe commented 2 years ago

The value of NULL_VAL generated for enums is not the nullValue specified for encodingType but the null value for the primitive type. (Testing with version 1.25.1) For instance:

// SBE Template definition
        <type name="uInt8NULL" primitiveType="uint8" presence="optional" nullValue="0" description="1-byte unsigned integer, from 1 to 255, NULL (optional) value = 0" />
...
        <enum name="LotType" encodingType="uInt8NULL" semanticType="Int" description="Describes the lot type for the instruments. Used for the Equities segment.">
            <validValue name="ODD_LOT" description="Odd lot">1</validValue>
            <validValue name="ROUND_LOT" description="Round lot">2</validValue>
            <validValue name="BLOCK_LOT" description="Block lot">3</validValue>
        </enum>
// Generated code (Java)
public enum LotType
...
    /**
     * To be used to represent not present or null.
     */
    NULL_VAL((short)255);

I expect NULL_VAL to be ((short) 0).

I can't just specify 'NULL_VAL' as a <validValue> because it creates an invalid Java file (with 2 constants NULL_VAL with the same name).

By printing the intermediate representation for the enum, we can see that the enum behaves like the encoding is always required, and uses the default value for nullValue, that's 255 in this case.

  Token{signal=BEGIN_ENUM, name='LotType', referencedName='null', description='Describes the lot type for the instruments. Used for the Equities segment.', id=-1, version=0, deprecated=0, encodedLength=1, offset=198, componentTokenCount=5, encoding=Encoding{presence=REQUIRED, primitiveType=UINT8, byteOrder=LITTLE_ENDIAN, minValue=null, maxValue=null, nullValue=255, constValue=null, characterEncoding='null', epoch='null', timeUnit=null, semanticType='Int'}}
  Token{signal=VALID_VALUE, name='ODD_LOT', referencedName='null', description='Odd lot', id=-1, version=0, deprecated=0, encodedLength=0, offset=0, componentTokenCount=1, encoding=Encoding{presence=REQUIRED, primitiveType=UINT8, byteOrder=LITTLE_ENDIAN, minValue=null, maxValue=null, nullValue=null, constValue=1, characterEncoding='null', epoch='null', timeUnit=null, semanticType='null'}}
  Token{signal=VALID_VALUE, name='ROUND_LOT', referencedName='null', description='Round lot', id=-1, version=0, deprecated=0, encodedLength=0, offset=0, componentTokenCount=1, encoding=Encoding{presence=REQUIRED, primitiveType=UINT8, byteOrder=LITTLE_ENDIAN, minValue=null, maxValue=null, nullValue=null, constValue=2, characterEncoding='null', epoch='null', timeUnit=null, semanticType='null'}}
  Token{signal=VALID_VALUE, name='BLOCK_LOT', referencedName='null', description='Block lot', id=-1, version=0, deprecated=0, encodedLength=0, offset=0, componentTokenCount=1, encoding=Encoding{presence=REQUIRED, primitiveType=UINT8, byteOrder=LITTLE_ENDIAN, minValue=null, maxValue=null, nullValue=null, constValue=3, characterEncoding='null', epoch='null', timeUnit=null, semanticType='null'}}
  Token{signal=END_ENUM, name='LotType', referencedName='null', description='Describes the lot type for the instruments. Used for the Equities segment.', id=-1, version=0, deprecated=0, encodedLength=1, offset=198, componentTokenCount=5, encoding=Encoding{presence=REQUIRED, primitiveType=UINT8, byteOrder=LITTLE_ENDIAN, minValue=null, maxValue=null, nullValue=255, constValue=null, characterEncoding='null', epoch='null', timeUnit=null, semanticType='Int'}}
EEWatanabe commented 2 years ago

I even tried to add 'optional' and 'nullValue' attributes to the enum definition and the field definition, but the first one is forbidden by the XSD schema, and the second one will work only in SBE 2.0 (unsupported by this tool).

EEWatanabe commented 2 years ago

Thank you a lot. It will be very useful because it's easier and safer to clean up the buffer with zeros before encoding messages with lots of enums whose null values can be 0 now.
(I tried to write a PR myself, but I quickly discovered that it would be more difficult than I thought - by reading the code, I discovered that every little part was working was designed, but something has gone wrong in the integration of the modules, so I would have to make several modifications to make it work.)

michaelszymczak commented 2 years ago

I can see that this change is already merged and it's now part of the new release (1.25.2). However, I wonder how safe this change really is.

We have run some tests against this change and it seems that some of the venues rely on the old behaviour (before the fix). For instance, iLink3 schema that supports the Simple Binary Encoding Specification (SBE version 1.0 release candidate 4) relies on the fact that the null encoding is different depending on whether the encoding type is part of enum or not.

See https://www.cmegroup.com/confluence/display/EPICSANDBOX/iLink+3+-+Simple+Binary+Encoding#iLink3SimpleBinaryEncoding-Encodingfornullvalues

I used the schema from https://www.cmegroup.com/confluence/display/EPICSANDBOX/iLink+3+-+Simple+Binary+Encoding#iLink3SimpleBinaryEncoding-SchemaDistribution to generate Java encoders/decoders. This is the minimal schema snippet that reproduces the issue.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:messageSchema xmlns:ns2="http://www.fixprotocol.org/ns/simple/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" package="iLinkBinary" id="8" version="7" semanticVersion="FIX5.0"
                   description="20210106" byteOrder="littleEndian" xsi:schemaLocation="">
    <types>
        <!--- ... -->
        <composite name="messageHeader" description="Template ID and length of message root">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId" primitiveType="uint16"/>
            <type name="schemaId" primitiveType="uint16"/>
            <type name="version" primitiveType="uint16"/>
        </composite>
        <!--- ... -->
        <type name="charNULL" description="Char with null value" presence="optional" nullValue="0" primitiveType="char" semanticType="char"/>
        <enum name="CmtaGiveUpCD" encodingType="charNULL">
            <validValue name="GiveUp" description="Give Up">G</validValue>
            <validValue name="SGXoffset" description="SGX offset">S</validValue>
        </enum>
        <!--- ... -->
    </types>
    <!--- ... -->
    <ns2:message name="PartyDetailsDefinitionRequest518" id="518" description="PartyDetailsDefinitionRequest" blockLength="147" semanticType="CX">
        <!--- ... -->
        <field name="CmtaGiveupCD" id="9708" type="CmtaGiveUpCD" description="Indicates if the order is a give-up or SGX offset. Reject if greater than max length or not containing valid value. "
               offset="124" semanticType="char"/>
        <!--- ... -->
    </ns2:message>
    <ns2:message name="QuoteCancelAck563" id="563" description="QuoteCancelAck" blockLength="351" semanticType="b" sinceVersion="5">
        <!--- ... -->
        <field name="UnsolicitedCancelType" id="9775" type="charNULL" description="Type of quote cancel generated by CME -- returned only for unsolicited quote cancels" offset="338"
               semanticType="char"/>
        <!--- ... -->
    </ns2:message>
    <!--- ... -->
</ns2:messageSchema>

When you use the snippet above to generate the files, there is a difference in how the CmtaGiveUpCD NULL_VAL enum is encoded. Before it was 0x00, now it 0x48.

Before (before version uk.co.real-logic:sbe-tool:1.25.2):

public enum CmtaGiveUpCD
{
    // ...
    NULL_VAL((byte)0); // <==== DIFFERENCE
}

public final class QuoteCancelAck563Encoder implements MessageEncoderFlyweight
{
    // ...
    public static byte unsolicitedCancelTypeNullValue()
    {
        return (byte)48;
    }
}

After (after version uk.co.real-logic:sbe-tool:1.25.2):

public enum CmtaGiveUpCD
{
    // ...
    NULL_VAL((byte)48); // <==== DIFFERENCE }
}

public final class QuoteCancelAck563Encoder implements MessageEncoderFlyweight
{ 
    // ...
    public static byte unsolicitedCancelTypeNullValue()
    {
        return (byte)48;
    }
}

Do you think that this change conforms to the SBE version 1.0 release candidate 4 Specification? In particular, my concern is in regard to the interpretation of the field

<type name="charNULL" description="Char with null value" presence="optional" nullValue="0" primitiveType="char" semanticType="char"/>

that is used both as enums and as standalone fields, that it now always produces 0x48 as a null value in both cases.

Thank you.

mjpt777 commented 2 years ago

The Real Logic implementation of SBE is targeted at the SBE 1.0 final draft.

michaelszymczak commented 2 years ago

Fair enough. If I read it correctly then, this change is compatible with SBE 1.0 final draft (I think it is not impossible to be fine with RC4, but this point is out of this project's scope). Thanks you.

mjpt777 commented 11 months ago

See this PR https://github.com/real-logic/simple-binary-encoding/issues/958 for further discussion.