kuba2k2 / datastruct

Combination of struct and dataclasses for easy parsing of binary formats
MIT License

Alternative way of variable length bytes field. #3

Open robots opened 1 month ago

robots commented 1 month ago

Hi,

This is not really an issue; it is more a way of documenting a feature for future users.

I needed something that is like "vartext" but with bytes. The example shown in README.md didn't work well.

So I implemented a varbytes field.

def varbytes(
    length: Value[int],
    *,
    default: bytes = ...
):
    class Bytes(Adapter):
        def encode(self, value: bytes, ctx: Context) -> bytes:
            return value

        def decode(self, value: bytes, ctx: Context) -> bytes:
            return value

    return adapter(Bytes())(
        field(
            lambda ctx: (
                len(ctx.P.self) if ctx.G.packing else evaluate(ctx, length)
            ),
            default=default,
        )
    )

This is strongly inspired by vartext, but simplified, as it only supports variable length. (Yes, encode and decode could be lambdas, meh.)

@dataclass
@datastruct(endianness=Endianness.LITTLE, padding_pattern=b"\x00")
class Packet1(DataStruct):
    list: int = field('B')
    num: int = field('H')
    count: int = field('H')
    data: bytes = varbytes(lambda ctx: ctx.G.root.length-5)

@dataclass
@datastruct(endianness=Endianness.LITTLE, padding_pattern=b"\x00")
class Packet(DataStruct):
    length: int = built("B", lambda ctx: ctx.body.sizeof())
    body: Any = switch(lambda ctx: ctx.typ)(
        PACKET1=(Packet1, subfield()),
    )

When packing, Packet.length is automatically calculated from the size of the body, which includes the length of data in Packet1 (or any other body packet). When unpacking, the length of Packet1.data is calculated as Packet.length - 5, where 5 is the combined size of the B+H+H fields.
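The arithmetic above can be checked with nothing but the standard-library struct module. This is only a minimal sketch and does not use datastruct: the helper names pack_packet and unpack_packet are hypothetical, the type/switch field is left out, and it assumes the length byte counts the whole body (the 5-byte B+H+H header plus data), as in the example.

```python
import struct

# list (B) + num (H) + count (H), little-endian, no alignment padding
HEADER = struct.Struct("<BHH")


def pack_packet(list_id: int, num: int, count: int, data: bytes) -> bytes:
    """Prefix the body with a one-byte length derived from the body itself."""
    body = HEADER.pack(list_id, num, count) + data
    return struct.pack("<B", len(body)) + body


def unpack_packet(raw: bytes):
    """Recover the data length as (length byte) - (fixed header size)."""
    (length,) = struct.unpack_from("<B", raw, 0)
    list_id, num, count = HEADER.unpack_from(raw, 1)
    data_len = length - HEADER.size
    start = 1 + HEADER.size
    return list_id, num, count, raw[start:start + data_len]


raw = pack_packet(1, 2, 3, b"abc")
print(HEADER.size, raw[0], unpack_packet(raw))
```

The length is never passed in by the caller when packing, and the data size is never stored separately; both are derived from each other through the fixed header size.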

kuba2k2 commented 1 month ago

Are you sure that you couldn't just use the usual field() with a lambda here?

After studying the code a bit, you're right that some tinkering is needed here. Using field() wouldn't allow for packing different-length values in data, because it would expect length bytes at all times.
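As an analogy only, the problem with a fixed-length bytes field can be seen in the standard-library struct module. This sketch does not use datastruct, and datastruct's exact behavior on a length mismatch may differ; it just shows why a field with a predetermined size cannot round-trip values of other lengths.

```python
import struct

# A bytes field whose size is fixed at 4, regardless of the value given.
FIXED = struct.Struct("4s")

short = FIXED.pack(b"ab")        # too short: padded with null bytes
long_ = FIXED.pack(b"abcdef")    # too long: silently truncated

print(short, long_)
```

In both cases the original value is not what comes back when unpacking, which is why the length has to follow the value during packing.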

That being said, the varbytes() adapter is actually not doing anything - it simply passes along whatever it gets in encode() and decode().

So if the adapter can be removed, the varbytes() just becomes:

def varbytes(
    length: Value[int],
    *,
    default: bytes = ...
):
    return field(
        lambda ctx: (
            len(ctx.P.self) if ctx.G.packing else evaluate(ctx, length)
        ),
        default=default,
    )

...which would be a nice addition to the library's helpers.py, right next to varlist(), which works in a similar way. :) If you want, you could submit a PR with this feature.
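The packing/unpacking switch inside that lambda can be modeled in plain Python. This is a toy sketch, not datastruct's Context or evaluate(): the names resolve_length and LengthSpec and the dict-based ctx are hypothetical stand-ins used purely to show the decision being made.

```python
from typing import Callable, Union

# A length is either a constant or a function of the (toy) context.
LengthSpec = Union[int, Callable[[dict], int]]


def resolve_length(packing: bool, value: bytes, length: LengthSpec, ctx: dict) -> int:
    """Pick the field length the way the varbytes lambda does."""
    if packing:
        # When packing, the value itself dictates how many bytes are written.
        return len(value)
    # When unpacking, the length comes from the user-supplied expression.
    return length(ctx) if callable(length) else length


print(resolve_length(True, b"hello", 99, {}))
print(resolve_length(False, b"", lambda c: c["length"] - 5, {"length": 8}))
```

When packing, the declared length is ignored entirely, so values of any size can be written; the expression only matters when reading data back.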

The bytes/str fields are quite a mess now, I admit. A short summary:

The text() field will require refactoring at some point.