hit9 / bitproto

The bit level data interchange format for serializing data structures (long term maintenance).
https://bitproto.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Feature: add a "packed" or `-p` flag #38

Closed: g-berthiaume closed this issue 3 years ago

g-berthiaume commented 3 years ago

Problem

Currently, bitproto's serialized buffer is not packed.

Here's an example. My simple schema:

proto mytest

message Data {
    uint20 preamble = 1
    uint15 start = 2
    uint64 data = 3
    uint15 crc = 4
}

We compile using:

$ bitproto c .\myschema.bitproto -O

Using the newly generated files, we can run the following test:

#include <stdint.h>
#include <stdio.h>

#include "myschema_bp.h" // bitproto-generated header (name assumed; check the generated files)

int main(void)
{
    uint8_t buffer[BYTES_LENGTH_DATA] = {0};

    struct Data data = {
        .preamble = 0x123,
        .start = 32767, // 2^15 - 1
        .data = 0,
        .crc = 0,
    };

    EncodeData(&data, buffer);

    // Print the first three encoded bytes
    for (int i = 0; i < 3; i++)
        printf("0x%02X\n", buffer[i]);

    return 0;
}

Running this test code prints:

$ make && ./test
0x23
0x01
0xF0

Therefore, this structure is padded. If the structure were packed (no padding), I would expect:

0x48
0xFF
0xFF

Here's a visualization of the packed structure: [image]

Solution

Add a `-p` (packed) flag.

Why is this valuable

This would enable bitproto's users to have better control over how their data is serialized.

g-berthiaume commented 3 years ago

As an addendum: Thanks for bitproto. It's an awesome tool. :)

hit9 commented 3 years ago

Hi, @g-berthiaume

I made a similar picture to explain it. One minor difference from your picture is the bit layout: low bits are on the left and high bits on the right, since bitproto uses little-endian.

The number 0x123 in binary is 100100011.

[image]

The type of preamble is uint20, which occupies 20 bits after encoding.

But the value 0x123 only needs 9 bits, so the remaining 11 bits may look like padding.
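To make the 'padding' concrete, here is a small C sketch that writes a value out low bit first, the orientation used in the picture. The helper `bits_lsb_first` is hypothetical and not part of bitproto's generated code; it only assumes the LSB-first layout described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: render the low `nbits` bits of `value` as '0'/'1'
 * characters, lowest bit first (leftmost). `out` must hold nbits + 1 chars. */
static void bits_lsb_first(uint32_t value, unsigned nbits, char *out)
{
    assert(nbits <= 32);
    for (unsigned i = 0; i < nbits; i++)
        out[i] = ((value >> i) & 1u) ? '1' : '0';
    out[nbits] = '\0';
}
```

Called as `bits_lsb_first(0x123, 20, s)` with `char s[21]`, it produces `11000100100000000000`: the 9 significant bits of 0x123, then 11 zero bits that still belong to the 20-bit preamble field.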

If we set preamble to a larger value, say 2^20 - 1, the 'padding' goes away.

If we set preamble to an even larger value, say 2^32, only the lower 20 bits will be kept.
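The truncation rule can be sketched as a mask over the low N bits. `keep_low_bits` is a hypothetical helper illustrating the behaviour described here, not bitproto's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the low `width` bits of `value`, discarding
 * everything above the field width (as described for a uintN field). */
static uint64_t keep_low_bits(uint64_t value, unsigned width)
{
    assert(width >= 1 && width <= 64);
    if (width == 64)
        return value; /* shifting a 64-bit value by 64 would be undefined */
    return value & ((UINT64_C(1) << width) - 1);
}
```

`keep_low_bits(0x123, 20)` is 0x123, `keep_low_bits((1 << 20) - 1, 20)` is 0xFFFFF with every bit of the field in use, and `keep_low_bits(UINT64_C(1) << 32, 20)` is 0, because 2^32 has no bits set below bit 32.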

The encoder must respect the data types defined in the bitproto file, so that the decoder is able to parse the encoded data.
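A minimal sketch of a fixed-width, LSB-first bit writer, assuming the layout described above (each field takes exactly its declared width, and bit 0 of each byte is its lowest bit). `put_bits` and `encode_header` are hypothetical names covering only the first two fields of `Data`; this is not bitproto's generated `EncodeData`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the low `width` bits of `value` into `buf` starting at bit offset
 * `*pos`, lowest bit first, and advance `*pos` by `width`.
 * `buf` must be zeroed beforehand (bits are only ever set, never cleared). */
static void put_bits(uint8_t *buf, size_t *pos, uint64_t value, unsigned width)
{
    assert(width <= 64);
    for (unsigned i = 0; i < width; i++, (*pos)++) {
        if ((value >> i) & 1u)
            buf[*pos / 8] |= (uint8_t)(1u << (*pos % 8));
    }
}

/* Hypothetical encoder for preamble (uint20) and start (uint15):
 * 35 bits in total, so 5 bytes. The widths come from the schema,
 * never from the values being encoded. */
static void encode_header(uint32_t preamble, uint16_t start, uint8_t out[5])
{
    size_t pos = 0;
    memset(out, 0, 5);
    put_bits(out, &pos, preamble, 20); /* bits 0..19 */
    put_bits(out, &pos, start, 15);    /* bits 20..34 */
}
```

Encoding preamble = 0x123 and start = 0x7FFF this way gives the bytes 0x23 0x01 0xF0 0xFF 0x07, whose first three match the output reported in the issue.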

If we encode the data without this 'padding', we get the 9 bits of preamble followed directly by the 1111... bits of start, arranged tightly without any unused bits. The following picture describes what you expected.
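For comparison, here is a sketch of that hypothetical 'tight' layout: only the significant bits of preamble, immediately followed by the 15 bits of start, low bit first. `pack_tight` and `bit_length` are illustrative names, not part of bitproto, and bitproto does not encode this way.

```c
#include <assert.h>
#include <stdint.h>

/* Number of significant bits in v (0 for v == 0). */
static unsigned bit_length(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Hypothetical tight packing: preamble's significant bits, then start's
 * 15 bits, lowest bit first. Stores the total bit count in *nbits. */
static uint64_t pack_tight(uint32_t preamble, uint16_t start, unsigned *nbits)
{
    assert(start <= 0x7FFF); /* start is a uint15 field */
    unsigned plen = bit_length(preamble);
    *nbits = plen + 15;
    return (uint64_t)preamble | ((uint64_t)start << plen);
}
```

For preamble = 0x123 and start = 0x7FFF it returns 24 bits, 0xFFFF23, i.e. the bytes 0x23 0xFF 0xFF. Note that `*nbits` depends on the value of preamble, not only on its type.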

[image]

But how would this buffer be decoded?

Only the type definitions in the bitproto file tell the decoder where the field boundaries are and how the fields are arranged, so the decoder first reads 20 bits for preamble's value and then 15 bits for start's value.

If start's bits are moved forward into preamble's space, the decoder reads the wrong bits for each field and has no way of knowing it. bitproto doesn't store any reflection data during encoding either. All the decoder sees is the raw data buffer, without any type information and without any layout information; it is designed to parse data according to the bitproto file.
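The decoding side can be sketched the same way: a reader that takes 20 bits and then 15 bits purely because the schema says so. `get_bits` and `decode_header` are hypothetical helpers covering only the first two fields of `Data`, assuming the same LSB-first layout; they are not bitproto's generated `DecodeData`. The 'tightly packed' bytes used in the note below (the 9 bits of 0x123 followed by fifteen 1 bits, low bit first) were worked out by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `width` bits starting at bit offset `*pos`, lowest bit first,
 * and advance `*pos` by `width`. */
static uint64_t get_bits(const uint8_t *buf, size_t *pos, unsigned width)
{
    uint64_t value = 0;
    assert(width <= 64);
    for (unsigned i = 0; i < width; i++, (*pos)++) {
        if ((buf[*pos / 8] >> (*pos % 8)) & 1u)
            value |= UINT64_C(1) << i;
    }
    return value;
}

/* Hypothetical decoder for preamble (uint20) and start (uint15).
 * The widths 20 and 15 come only from the schema: nothing in the
 * buffer marks where one field ends and the next begins.
 * `buf` must hold at least 5 bytes (35 bits). */
static void decode_header(const uint8_t *buf, uint32_t *preamble, uint16_t *start)
{
    size_t pos = 0;
    *preamble = (uint32_t)get_bits(buf, &pos, 20);
    *start = (uint16_t)get_bits(buf, &pos, 15);
}
```

Decoding 0x23 0x01 0xF0 0xFF 0x07 yields preamble = 0x123 and start = 0x7FFF. Decoding the tightly packed bytes 0x23 0xFF 0xFF (zero-filled to 5 bytes) yields preamble = 0xFFF23 and start = 0xF: both fields come out wrong, and nothing in the buffer reveals it.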

Hope this helps.

g-berthiaume commented 3 years ago

Thanks for your answer. I revisited my question and realized that I was mistaken.

Thanks for your time and your project.