TLV compatibility with SVS Specification

phylib commented 3 years ago

I just tried to add the TLV test from the C++ implementation and found that the TLV format of the Python implementation is not yet perfectly correct.

The TLV \xCA\x03\x6F\x6E\x65\xCB\x01\x01\xCA\x03\x74\x77\x6F\xCB\x01\x02 contains the state vector entries ['one': 1, 'two': 2]. When wrapping it into the TLV Type 201, it should be possible to parse. However, parsing is not possible.

The hex values C9, CA and CB represent the TLV types 201..VersionVector, 202..VersionVectorKey, and 203...VersionVectorValue

When creating the corresponding state vector in Python, I get the following TLV: \xc9\x08\xca\x03\x6F\x6E\x65\xcb\x01\x01 \xc9\x08\xca\x03\x74\x77\x6F\xcb\x01\x02

As far as I know, there should be only one 201 component for a state vector. The Python implementation, however, has one 201 plus one additional 202 component for every state vector entry.

Additionally, TLV parsing appears to have an additional minor issue. As indicated here, the StateVector class assumes to receive a TLV generated by StateVector-->to_component. The to_component method without parameter, however, appears to add the generic TLV type TYPE_GENERIC = 0x08. Instead, the TLV Type 201 StateVector should be prepended.

justincpresley commented 3 years ago

Yes, that is a good concern.

Previous Information

Python-ndn has different mechanics compared to C++ for TLV. Therefore, I had to use this structure which can be viewed in code here:

StateVector = *StateVectorComponent
StateVectorComponent = VERSION-VECTOR-COMPONENT-TYPE TLV-LENGTH NodeID SeqNo

NodeID = VERSION-VECTOR-KEY-TYPE TLV-LENGTH *OCTET
SeqNo = VERSION-VECTOR-VALUE-TYPE TLV-LENGTH NonNegativeInteger
VERSION-VECTOR-COMPONENT-TYPE = 201
VERSION-VECTOR-KEY-TYPE = 202
VERSION-VECTOR-VALUE-TYPE = 203

Adding to this, I had to also wrap a StateVector TLV into a component and used the generic type for that.

Why is it this way?

I did ask the creator of python-ndn about the TLV as I had the same concerns.

This is what I know since then... python-ndn Model needs you to encapsulate StateVectorComponent with a TLV number, instead of StateVector

When defining a RepeatingField TLV (for the StateVector), it requires a TLV number, and this is added to every component. On top of this, to be a component, I had to wrap the StateVector TLV within some type (I used generic for this).

Moving Forward

I am not sure how to actually fix this as the C++ version combines these two steps into one or this is how I think about it.

zjkmxy commented 3 years ago

Hello. I suggest this pattern because

The original design is a little counterintuitive: StateVectorComponent itself is a unit of data so it deserves a TLV.
The original format does not follow the principle that TLV fields should follow a fixed order

Phillip told me that the format should be consistent for now. If this must be the case, we can decode StateVector manually:

import enum
import typing
import struct
import asyncio as aio
import ndn.encoding as enc

class StateVectorModelTypes(enum.Enum):
    VECTOR = 201
    KEY = 202
    VALUE = 203

class StateVectorComponentModel(enc.TlvModel):
    node_id = enc.BytesField(StateVectorModelTypes.KEY.value)
    seq_num = enc.UintField(StateVectorModelTypes.VALUE.value)

class StateVectorModel:
    value: typing.List[StateVectorComponentModel]

    def encode(self) -> bytearray:
        component_wires = [v.encode() for v in self.value]
        length = sum(len(w) for w in component_wires)
        buf_len = length + enc.get_tl_num_size(length) + enc.get_tl_num_size(StateVectorModelTypes.VECTOR.value)
        ret = bytearray(buf_len)
        pos = enc.write_tl_num(StateVectorModelTypes.VECTOR.value, ret)
        pos += enc.write_tl_num(length, ret, pos)
        for w in component_wires:
            ret[pos:pos + len(w)] = w
            pos += len(w)
        return ret

    @staticmethod
    def parse(buf):
        # Verify the Type
        typ, pos = enc.parse_tl_num(buf)
        if typ != StateVectorModelTypes.VECTOR.value:
            return None
        # Check the length
        length, l = enc.parse_tl_num(buf, pos)
        pos += l
        if pos + length != len(buf):
            return None
        # Decode components
        ret = StateVectorModel()
        ret.value = []
        while pos < len(buf):
            # Node ID
            typ, l = enc.parse_tl_num(buf, pos)
            pos += l
            if typ != StateVectorModelTypes.KEY.value:
                return None
            length, l = enc.parse_tl_num(buf, pos)
            pos += l
            node_id = buf[pos:pos + length]
            pos += length
            # Value
            typ, l = enc.parse_tl_num(buf, pos)
            pos += l
            if typ != StateVectorModelTypes.VALUE.value:
                return None
            length, l = enc.parse_tl_num(buf, pos)
            pos += l
            if length == 1:
                value = struct.unpack_from('!B', buf, pos)[0]
            elif length == 2:
                value = struct.unpack_from('!H', buf, pos)[0]
            elif length == 4:
                value = struct.unpack_from('!I', buf, pos)[0]
            elif length == 8:
                value = struct.unpack_from('!Q', buf, pos)[0]
            else:
                return None
            pos += length
            # Append the component
            comp = StateVectorComponentModel()
            comp.node_id = node_id
            comp.seq_num = value
            ret.value.append(comp)
        return ret

async def main():
    vec = StateVectorModel()
    vec.value = [StateVectorComponentModel(), StateVectorComponentModel()]
    vec.value[0].node_id = b'one'
    vec.value[0].seq_num = 1
    vec.value[1].node_id = b'two'
    vec.value[1].seq_num = 2
    buf = bytes(vec.encode())
    print(buf)

    vec = StateVectorModel.parse(buf)
    print(vec.value[0].node_id)
    print(vec.value[0].seq_num)
    print(vec.value[1].node_id)
    print(vec.value[1].seq_num)

if __name__ == '__main__':
    aio.run(main())

justincpresley commented 3 years ago

Thank you, zjkmxy for your addition. Yes, avoiding the RepeatingModel and making our own StateVectorModel allowed us to follow the TLV that the specifications use.

Since this is the case, our model used is the one specified.

justincpresley / ndn-python-svs

TLV compatibility with SVS Specification #1