Alternative way of variable length bytes field.

kuba2k2 / datastruct

Combination of struct and dataclasses for easy parsing of binary formats

MIT License


def varbytes(
    length: Value[int],
    *,
    default: bytes = ...
):
    class Bytes(Adapter):
        def encode(self, value: bytes, ctx: Context) -> bytes:
            return value

        def decode(self, value: bytes, ctx: Context) -> bytes:
            return value

    return adapter(Bytes())(
        field(
            lambda ctx: (
                len(ctx.P.self) if ctx.G.packing else evaluate(ctx, length)
            ),
            default=default,
        )
    )

@dataclass
@datastruct(endianness=Endianness.LITTLE, padding_pattern=b"\x00")
class Packet1(DataStruct):
    list: int = field('B')
    num: int = field('H')
    count: int = field('H')
    data: bytes = varbytes(lambda ctx: ctx.G.root.length-5)


@dataclass
@datastruct(endianness=Endianness.LITTLE, padding_pattern=b"\x00")
class Packet(DataStruct):
    length: int = built("B", lambda ctx: ctx.body.sizeof())
    body: Any = switch(lambda ctx: ctx.typ)(
        PACKET1 = (Packet1, subfield()),
    )

~~Are you sure that you couldn't just use the usual field() with a lambda here?~~

After studying the code a bit, you're right that some tinkering is needed here. Using field() wouldn't allow packing values of different lengths in data, because it would expect exactly length bytes at all times.

That being said, the Bytes adapter inside varbytes() doesn't actually do anything: it simply passes along whatever it gets in encode() and decode().

So if the adapter is removed, varbytes() simply becomes:

def varbytes(
    length: Value[int],
    *,
    default: bytes = ...
):
    return field(
        lambda ctx: (
            len(ctx.P.self) if ctx.G.packing else evaluate(ctx, length)
        ),
        default=default,
    )

...which would be a nice addition to the library's helpers.py 🙂, right next to varlist(), which works in a similar way. If you want, you could submit a PR with this feature.
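To make the length rule concrete, here is a minimal sketch that uses only Python's standard struct module, not datastruct itself. It mirrors the Packet/Packet1 layout from the first post: when packing, the data length comes from the value being written; when unpacking, it is derived from the length byte minus the 5-byte header. The names pack_packet, unpack_packet and HEADER are made up for this illustration and are not part of the library.

```python
import struct

# Little-endian header of Packet1: list (B), num (H), count (H) = 5 bytes.
# HEADER, pack_packet and unpack_packet are hypothetical names for this sketch.
HEADER = struct.Struct("<BHH")


def pack_packet(list_id: int, num: int, count: int, data: bytes) -> bytes:
    """Packing: the data length is simply len(data), whatever it is."""
    body = HEADER.pack(list_id, num, count) + data
    if len(body) > 0xFF:
        raise ValueError("body does not fit in a one-byte length field")
    # The leading length byte stores the size of the whole body.
    return bytes([len(body)]) + body


def unpack_packet(raw: bytes) -> tuple:
    """Unpacking: the data length is computed as length - 5."""
    length = raw[0]
    data_len = length - HEADER.size
    if data_len < 0:
        raise ValueError("length byte is smaller than the fixed header")
    list_id, num, count = HEADER.unpack_from(raw, 1)
    start = 1 + HEADER.size
    return list_id, num, count, raw[start:start + data_len]


raw = pack_packet(1, 2, 3, b"abc")
print(raw.hex())
print(unpack_packet(raw))
```

The two code paths correspond to the two branches of the lambda in varbytes(): len(ctx.P.self) while packing, and the evaluated length expression while unpacking.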

The bytes/str fields are quite a mess now, I admit. A short summary:

bytestr() - used for strings, as raw bytes, with fixed storage length and variable text length (stripped and padded with padding_pattern)
text() - used for strings as UTF-8 text, with fixed storage length and variable text length (stripped and padded with padding_pattern)
vartext() - same as text(), but with variable storage length (something like your varbytes(), but as str)
and finally, varbytes() - used for raw byte strings, with variable storage length and variable text length
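The difference between fixed and variable storage length in the summary above can be sketched with plain Python bytes, independent of datastruct. The helper names encode_fixed, decode_fixed and encode_var are hypothetical, and a single-byte padding pattern of b"\x00" is assumed, as in the example structures.

```python
def encode_fixed(value: bytes, size: int, padding: bytes = b"\x00") -> bytes:
    """Fixed storage length: pad the value up to `size` bytes."""
    if len(value) > size:
        raise ValueError("value is longer than the fixed storage size")
    # bytes.ljust() requires a single fill byte.
    return value.ljust(size, padding)


def decode_fixed(raw: bytes, padding: bytes = b"\x00") -> bytes:
    """Fixed storage length: strip trailing padding to recover the value."""
    return raw.rstrip(padding)


def encode_var(value: bytes) -> bytes:
    """Variable storage length: store exactly the bytes given, no padding."""
    return value


stored = encode_fixed(b"hi", 4)
print(stored)                # padded to 4 bytes
print(decode_fixed(stored))  # padding stripped again
print(encode_var(b"hi"))     # stored as-is
```

One consequence of the strip-and-pad approach in this sketch is that a value which itself ends in the padding byte loses those trailing bytes on decode, which is one reason a variable-length field is preferable for arbitrary binary data.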

The text() field will require refactoring at some point.

Alternative way of variable length bytes field. #3