scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License

Matching C order for bit packed structs. #341

Closed taylorh140 closed 3 months ago

taylorh140 commented 4 months ago
import bitstring
from bitstring import pack

bitstring.options.lsb0 = True

types = "=B,bool,pad7"

data = pack(types, 0xAA, True)  # pack the bytes

for i in data.bytes:  # print out each byte
    print(hex(i))

Python output:

0x1 0xaa

Equivalent C code:
#include <stdio.h>

typedef struct {
    unsigned char Q:8;
    _Bool A:1;
    int    :7;
} stuff_t;

int main() {
    stuff_t B = {.Q = 0xAA, .A = 1}; // pack the bytes
    unsigned char *D = (unsigned char *)&B; // view the struct as raw bytes

    for (int i = 0; i < 2; i++) { // print out each byte
        printf("0x%02x ", D[i]);
    }
    printf("\n");
    return 0;
}

C output:

0xaa 0x01

The problem here is that the bytes are in the reverse order from what I would expect.

taylorh140 commented 4 months ago

It seems like fixing this should be as easy as reversing the bits, but it isn't: that messes with the endianness of the stored data.

taylorh140 commented 4 months ago

An uneasy hack that seems to make these match is:

 - set bitstring.options.lsb0 = True
 - reverse the endianness of all Dtypes

Unpacking:
 - reverse the incoming data: data[::-1]
 - unpack

Packing:
 - pack the input data
 - reverse the byte order before sending: data[::-1]

I'm hoping there is a better way.

This doesn't really work either, because it also reverses strings.

scott-griffiths commented 4 months ago

Hi.

I think that pack is working as intended here. The LSB0 mode means that reads, slices etc. are happening from right to left, so for example the reverse operation

>>> data.unpack(types)
[170, True]

is what you'd expect. When the data is unpacked here it does the right-most byte first, then the next bit. This is correct and I think it's reasonable that pack should behave in the same way.

I think there is some confusion because for this particular case reversing the byte order exactly reverses the effect of LSB0 (as both things are contained in one byte). Reversing byte order won't work in general.

Your C code is implicitly MSB0, so possibly you don't need LSB0 mode at all?

taylorh140 commented 3 months ago

> Your C code is implicitly MSB0

Hmm, I'm wondering if that's correct because when I do the following:

#include <stdio.h>
#include <stdint.h>

typedef struct {
    _Bool A:1;
    _Bool B:1;
    _Bool C:1;
    _Bool D:1;
    int :4;
} myTest_t;

int main()
{
    myTest_t DATA = {.A=1,.B=1,.C=1,.D=1};
    uint8_t Printer = *(uint8_t*)(void*)&DATA;
    printf("0x%02x\n",Printer);

    return 0;
}

You can check it for yourself here: https://onlinegdb.com/G4g3AkgBc

With MSb0 I would expect 0xF0 (A B C D pad).

With LSb0 I would expect 0x0F (pad D C B A).

It might be that I'm misinterpreting something, but this is how I'm familiar with C compilers behaving.
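
To make the two expectations concrete, here is a minimal pure-Python sketch of both conventions. It is not part of the original test program; `pack_flags` is a hypothetical helper that places one-bit flags into a single 8-bit storage unit, numbering bits either from the most significant end (MSb0) or from the least significant end (LSb0).

```python
def pack_flags(flags, width=8, lsb0=True):
    """Place one-bit flags into a `width`-bit unit, first flag at bit index 0.

    With lsb0=True, index 0 is the least significant bit.
    With lsb0=False, index 0 is the most significant bit.
    """
    value = 0
    for index, flag in enumerate(flags):
        shift = index if lsb0 else width - 1 - index
        value |= (1 if flag else 0) << shift
    return value


flags = [True, True, True, True]  # A, B, C, D; the other 4 bits are padding

print(f"MSb0: 0x{pack_flags(flags, lsb0=False):02X}")  # MSb0: 0xF0
print(f"LSb0: 0x{pack_flags(flags, lsb0=True):02X}")   # LSb0: 0x0F
```

Whichever of the two values the C program prints tells you which convention the compiler used for this struct.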

scott-griffiths commented 3 months ago

OK, I think you're right and hopefully I get it now. The C compiler is packing the bitfields in an LSB0 manner (this is, I think, compiler dependent), but only the bitfields. The larger byte structures are still being packed left-to-right.

So in the original struct

typedef struct {
    unsigned char Q:8;
    _Bool A:1;
    int    :7;
} stuff_t;

the char comes first, then the bitfields are packed LSB0 so it's the int followed by the _Bool. So reading left-to-right it's the first member of the struct, then the third, then the second. That's how you get the final 0xaa 0x01 output.

In bitstring with LSB0 the equivalent does the 3rd then 2nd then 1st field (reading left-to-right). For MSB0 it does it the opposite way round. So neither will match the C code.

I don't think there's anything that bitstring can do here - the packing rules of a particular C compiler are a bit out of scope. I think I would keep it in MSB0, and then try to reverse the C-packed bitfields for whichever rules the compiler has. In general it's a mix of bit and byte endianness!
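
For reference, the mixed ordering described above can be reproduced without bitstring. The sketch below is only an illustration under stated assumptions: a little-endian target, bitfields allocated LSB-first in declaration order, and no alignment padding or storage-unit boundaries between fields. Real compilers may differ. `pack_c_style` and `unpack_c_style` are hypothetical helper names, not part of any library.

```python
def pack_c_style(fields):
    """Pack (value, bit_width) pairs in declaration order, LSB first.

    Assumes a little-endian, gcc-like layout with no padding between fields.
    """
    total_bits = sum(width for _, width in fields)
    if total_bits % 8:
        raise ValueError("total width must be a whole number of bytes")
    word = 0
    offset = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << offset  # earlier fields occupy lower bits
        offset += width
    return word.to_bytes(total_bits // 8, "little")


def unpack_c_style(data, widths):
    """Inverse of pack_c_style: split bytes back into field values."""
    word = int.from_bytes(data, "little")
    values = []
    offset = 0
    for width in widths:
        values.append((word >> offset) & ((1 << width) - 1))
        offset += width
    return values


# stuff_t from the original post: Q:8, A:1, 7 bits of padding
packed = pack_c_style([(0xAA, 8), (1, 1), (0, 7)])
print(" ".join(f"0x{b:02x}" for b in packed))  # 0xaa 0x01
print(unpack_c_style(packed, [8, 1, 7]))       # [170, 1, 0]
```

Under these assumptions, the C layout is the same as packing every field LSB-first into one integer and writing that integer out little-endian. This is why LSB0 plus a byte reversal matches for numeric fields but reverses strings.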

taylorh140 commented 3 months ago

It's somewhat interesting that this hasn't shown up more often, given that Python's own ctypes assumes the same order for little-endian chips.

I actually put together something to pack and unpack these using ctypes:

ExtStruct.txt

It's actually a Python file, but it gives an example of how the packing is assumed in CPython. It matches gcc, armcc, and ctypes for x86.
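
As a rough illustration of the ctypes behaviour mentioned here (this is not the attached ExtStruct file), both structs from this thread can be declared with ctypes bitfields. The class names and the `_pad` field are hypothetical, and every field uses `c_uint8` as a simplifying assumption, where the C code mixes `unsigned char`, `_Bool` and `int`.

```python
import ctypes


class Stuff(ctypes.LittleEndianStructure):
    # Mirrors stuff_t: Q:8, A:1, then 7 bits of padding.
    _fields_ = [
        ("Q", ctypes.c_uint8, 8),
        ("A", ctypes.c_uint8, 1),
        ("_pad", ctypes.c_uint8, 7),
    ]


class MyTest(ctypes.LittleEndianStructure):
    # Mirrors myTest_t: four 1-bit flags, then 4 bits of padding.
    _fields_ = [
        ("A", ctypes.c_uint8, 1),
        ("B", ctypes.c_uint8, 1),
        ("C", ctypes.c_uint8, 1),
        ("D", ctypes.c_uint8, 1),
        ("_pad", ctypes.c_uint8, 4),
    ]


stuff_bytes = bytes(Stuff(Q=0xAA, A=1))
flag_bytes = bytes(MyTest(A=1, B=1, C=1, D=1))

print(" ".join(f"0x{b:02x}" for b in stuff_bytes))  # 0xaa 0x01
print(" ".join(f"0x{b:02x}" for b in flag_bytes))   # 0x0f
```

Here the bytes stay in declaration order while the bits within each byte are allocated from the least significant end. That is the same mix of byte and bit ordering seen in the C output above.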