scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License

Support specifying MSB0 or LSB0 when reading ints from a BitStream #226

Closed: medley56 closed this 1 year ago

medley56 commented 2 years ago

I'm decoding packet data which contains (in a particular specified order) arbitrary bit-length integers, some of which are LSB0 and some of which are MSB0. It would be helpful to be able to specify the bit ordering when parsing an individual integer out of a stream, such that the bits are read in the order they come in but interpreted as LSB0 or MSB0 as specified in the read. This could mean adding a new format string, such as uintlsb0:n. For example:

import bitstring

s = bitstring.ConstBitStream('0b00110001')
# 3, 1 as 4-bit ints with LSB last
print(s.readlist(['uint:4', 'uint:4']))
# >>> [3, 1]

s.pos = 0
# 12, 8 as 4-bit ints with LSB first (proposed uintlsb0 format)
print(s.readlist(['uintlsb0:4', 'uintlsb0:4']))
# >>> [12, 8]
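Until such a format token exists, the requested behaviour can be sketched in plain Python: read each field in stream order, then interpret its bits either MSB-first (the normal case) or LSB-first. The helper `read_uint` below is hypothetical, not part of bitstring, and works on a string of '0'/'1' characters for clarity. Note that four bits can hold at most 15, so the LSB-first readings of `0011` and `0001` are 12 and 8.

```python
from typing import Tuple


def read_uint(bits: str, pos: int, length: int, lsb_first: bool = False) -> Tuple[int, int]:
    """Read `length` bits from `bits` starting at `pos` (hypothetical helper).

    Returns (value, new_pos). If lsb_first is True, the first bit read
    is treated as the least significant bit of the field.
    """
    field = bits[pos:pos + length]
    if len(field) != length:
        raise ValueError("not enough bits left in the stream")
    if lsb_first:
        field = field[::-1]  # reverse only this field, not the whole stream
    return int(field, 2), pos + length


stream = "00110001"

# Normal MSB-first interpretation of two 4-bit fields.
pos = 0
msb = []
for _ in range(2):
    value, pos = read_uint(stream, pos, 4)
    msb.append(value)

# Same fields, same reading order, but each interpreted LSB-first.
pos = 0
lsb = []
for _ in range(2):
    value, pos = read_uint(stream, pos, 4, lsb_first=True)
    lsb.append(value)

print(msb)  # [3, 1]
print(lsb)  # [12, 8]
```

The position always advances in stream order; only the interpretation of each field changes.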
scott-griffiths commented 1 year ago

Hi. Sorry it's taken so long to reply - I must have missed it somehow.

Firstly, LSB0/MSB0 are just about which bit is labelled as bit zero - they don't affect the value in the way you describe. If you use version 4 of bitstring, you can set bitstring.lsb0 = True, which will make your first readlist return [1, 3] instead of [3, 1].
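To make the labelling point concrete, the plain-Python sketch below (no bitstring calls) numbers the bits of 0b00110001 both ways. The integer value is the same under either scheme; only the index that counts as bit 0 changes. Reading 4-bit fields starting from bit 0 in each scheme reproduces the [3, 1] versus [1, 3] difference described above, assuming that LSB0 reads start from the right-hand end.

```python
bits = "00110001"
n = len(bits)


def bit_msb0(i: int) -> str:
    """Bit i counted from the left (MSB0 numbering)."""
    return bits[i]


def bit_lsb0(i: int) -> str:
    """Bit i counted from the right (LSB0 numbering)."""
    return bits[n - 1 - i]


# Same string, same integer value: numbering does not change the value.
value = int(bits, 2)

print(bit_msb0(0), bit_lsb0(0))  # 0 1

# Fields are still interpreted with the most significant bit on the left.
# MSB0: the first field is bits 0-3 counted from the left.
msb0_fields = [int(bits[0:4], 2), int(bits[4:8], 2)]
# LSB0: the first field is bits 0-3 counted from the right.
lsb0_fields = [int(bits[n - 4:n], 2), int(bits[n - 8:n - 4], 2)]

print(value, msb0_fields, lsb0_fields)  # 49 [3, 1] [1, 3]
```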

It's pretty unusual to have integers interpreted in the other direction (different bit-wise endianness), and it's probably easiest to reverse the bitstring and then interpret it. (Incidentally, if there's a standard or other application that uses them like this, I'd be interested to see it.)

So you could do

>>> s = s[::-1]
>>> s.readlist('2*u4')  # each field is bit-reversed, and the field order is flipped too
[8, 12]

which gets you the effect. If there are combinations of integers with different bit-wise endianness then that's quite strange!
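One caveat with reversing the whole bitstring: it reverses the order of the fields as well as the bits within each field, so for '0b00110001' a `2*u4` read gives [8, 12] rather than [12, 8]. The plain-Python sketch below (no bitstring calls; `per_field_reversed` is a made-up name for illustration) shows that reading the widths back to front from the reversed stream, then reversing the resulting list, recovers the per-field result even when the field widths differ.

```python
from typing import List


def per_field_reversed(bits: str, widths: List[int]) -> List[int]:
    """Interpret each field LSB-first by reversing the whole stream once.

    Reversing the stream also reverses the field order, so the widths
    are read back to front and the original order is restored at the end.
    """
    rev = bits[::-1]
    values = []
    pos = 0
    for w in reversed(widths):
        values.append(int(rev[pos:pos + w], 2))
        pos += w
    return values[::-1]


stream = "00110001"

# Naive approach: reverse everything, then read 2 x 4 bits in order.
rev = stream[::-1]  # "10001100"
naive = [int(rev[0:4], 2), int(rev[4:8], 2)]
print(naive)  # [8, 12]: field order is flipped

print(per_field_reversed(stream, [4, 4]))  # [12, 8]
print(per_field_reversed(stream, [3, 5]))  # [4, 17] from fields "001" and "10001"
```

This trick only works when every field is meant to be read LSB-first and all widths are known before parsing starts.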

kehrazy commented 1 year ago

hey there, scott. great work on the library, loving it so far. i've encountered the same problem as @medley56, so i thought i'd share some thoughts on this.

> which gets you the effect. If there are combinations of integers with different bit-wise endianness then that's quite strange!

the core idea is to drive the parsing of lots of inputs from a list of fmt strings. reversing does get us the effect, but it would make the code really clunky:

if should_reverse:
    s = s[::-1]

why should every read operation have to be wrapped like that, as opposed to:

s.read('2*uintlsb0:4')?

all that needs to be done is to add some fmt qualifier that tells the parser to reverse the string. would be really awesome <3

looking forward to your reply.

scott-griffiths commented 1 year ago

Hi @kehrazy ,

I think I really need a concrete example to look at here. The LSB0/MSB0 setting doesn't do anything to reverse the bitstring before interpreting it - it's just the direction in which the bits are numbered (the rightmost bit is always the least significant).

Now it would be possible to have an ability to specify LSB0/MSB0 in a format string, which would override whatever the default has been set to be. That would make more sense for unpack as it might get confusing for reads if they are combined. I'm just thinking aloud here:

s = BitStream(100)  # 100 zero bits; the msb0::/lsb0:: prefixes below are only a proposal
s.read('msb0::2*uint:10')  # Reads left-most 20 bits. s.pos now == 20
s.read('lsb0::uint:10')

So does that second read re-read the second uint read above? If so, afterwards s.pos == 90 (from an LSB0 perspective). Or does it read from bit 20 counted from the LSB0 end, which doesn't make any sense.
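The position question comes down to a simple index mapping. In an n-bit string, MSB0 bit i is LSB0 bit n - 1 - i, so a half-open MSB0 range [start, end) becomes the LSB0 range [n - end, n - start). The plain-Python sketch below (`msb0_range_to_lsb0` is a hypothetical helper) applies this to the 100-bit example: the second 10-bit uint, at MSB0 bits [10, 20), sits at LSB0 bits [80, 90), which is why re-reading it in LSB0 terms would leave the position at 90.

```python
from typing import Tuple


def msb0_range_to_lsb0(start: int, end: int, length: int) -> Tuple[int, int]:
    """Convert a half-open MSB0 bit range to the equivalent LSB0 range."""
    if not 0 <= start <= end <= length:
        raise ValueError("range outside the bitstring")
    return length - end, length - start


n = 100
first = msb0_range_to_lsb0(0, 10, n)    # (90, 100)
second = msb0_range_to_lsb0(10, 20, n)  # (80, 90)
print(first, second)

# Bit-level check: MSB0 index i and LSB0 index n - 1 - i are the same bit.
for i in range(n):
    assert msb0_range_to_lsb0(i, i + 1, n)[0] == n - 1 - i
```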

I think the point that bothers me is that the position in the bitstring isn't easy to understand when you switch between LSB0 and MSB0. This is something that I need to fix anyway but at least a change to a module variable would let me reset or change current pos values.

The other (unrelated?) option is to allow the bitstring to be reversed before applying the interpretation. This wouldn't be too hard, but I'm not convinced it's useful - I've yet to see a real-world example.

s.read('2*u4r')  # Read two 4-bit bitstrings, reverse each one and interpret it as an unsigned integer.

That might be useful to have (though it wouldn't work for exp-Golomb codes).
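A rough sketch of how a reversing suffix like `u4r` might be parsed and applied, written in plain Python over a string of '0'/'1' characters. The token grammar here (an optional `N*` multiplier, `u` or `i`, a bit length and an optional trailing `r`) and the function name `read_tokens` are assumptions for illustration only, not bitstring's format-string parser, and no attempt is made to handle exp-Golomb codes.

```python
import re
from typing import List

# Hypothetical token grammar: [N*](u|i)<length>[r], e.g. "2*u4r" or "i12".
TOKEN = re.compile(r"^(?:(\d+)\*)?([ui])([1-9]\d*)(r?)$")


def read_tokens(bits: str, fmt: str, pos: int = 0) -> List[int]:
    """Read comma-separated tokens from a '0'/'1' string.

    A trailing 'r' reverses the bits of each field before it is
    interpreted; the position still advances in stream order.
    """
    values = []
    for raw in fmt.split(","):
        m = TOKEN.match(raw.strip())
        if m is None:
            raise ValueError(f"unsupported token: {raw!r}")
        count = int(m.group(1) or 1)
        kind = m.group(2)
        length = int(m.group(3))
        reverse = m.group(4) == "r"
        for _ in range(count):
            field = bits[pos:pos + length]
            if len(field) != length:
                raise ValueError("not enough bits left")
            pos += length
            if reverse:
                field = field[::-1]  # flip this field only
            value = int(field, 2)
            if kind == "i" and field[0] == "1":
                value -= 1 << length  # two's complement for signed fields
            values.append(value)
    return values


print(read_tokens("00110001", "2*u4"))     # [3, 1]
print(read_tokens("00110001", "2*u4r"))    # [12, 8]
print(read_tokens("00110001", "u4, u4r"))  # [3, 8]
print(read_tokens("0011", "i4r"))          # [-4]
```

Because the suffix is per token, normal and reversed fields can be mixed freely in one format string without touching the rest of the stream.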

medley56 commented 1 year ago

Hi Scott,

I think your last example is exactly what I need. When you say

> If there are combinations of integers with different bit-wise endianness then that's quite strange!

That's totally fair but it is what I'm working with. Basically I'm using your library to parse arbitrarily defined binary packets, concatenated into a binary file, which I read into a ConstBitStream and parse incrementally, sometimes inferring future data types based on previous data, so reversing a massive binary string array is not really ideal. Within a single packet we have binary blobs, integers, floats, strings, all of totally arbitrary length (depending on the configured packet structure and the information inferred from previous parsing). This is for spacecraft telemetry, so we have flight software that is writing data to packets from various highly specialized sources across the spacecraft systems and yes, sometimes we get a mix of integers written one way and then the other (flipped).
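For the telemetry case, the key point is that the reversal can be local to each field, so the stream never has to be reversed as a whole and later field types can still depend on earlier values. The sketch below is plain Python, not bitstring: `BitReader`, `parse_packet` and the packet layout (4-bit count, 4-bit LSB-first flags, `count` 6-bit LSB-first samples, 4-bit checksum) are all made up for illustration.

```python
from typing import Dict, Union


class BitReader:
    """Minimal MSB0 bit cursor over a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def read_uint(self, n: int, lsb_first: bool = False) -> int:
        if n <= 0:
            raise ValueError("field length must be positive")
        end = self.pos + n
        if end > len(self.data) * 8:
            raise ValueError("read past end of data")
        # Only touch the bytes that this field spans.
        first_byte, last_byte = self.pos // 8, (end - 1) // 8
        chunk = int.from_bytes(self.data[first_byte:last_byte + 1], "big")
        chunk_bits = (last_byte - first_byte + 1) * 8
        shift = chunk_bits - (end - first_byte * 8)
        field = (chunk >> shift) & ((1 << n) - 1)
        self.pos = end
        if lsb_first:
            # Reverse the n bits of this field only.
            field = int(format(field, f"0{n}b")[::-1], 2)
        return field


def parse_packet(reader: BitReader) -> Dict[str, Union[int, list]]:
    count = reader.read_uint(4)                  # MSB-first
    flags = reader.read_uint(4, lsb_first=True)  # LSB-first
    # The number of samples depends on a value read earlier.
    samples = [reader.read_uint(6, lsb_first=True) for _ in range(count)]
    checksum = reader.read_uint(4)               # MSB-first
    return {"count": count, "flags": flags, "samples": samples, "checksum": checksum}


packet = parse_packet(BitReader(bytes([0x28, 0xC3, 0xA5])))
print(packet)  # {'count': 2, 'flags': 1, 'samples': [3, 23], 'checksum': 5}
```

Each read slices only the bytes spanning the current field, so the cost per field does not grow with the size of the file.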

scott-griffiths commented 1 year ago

Migrating the idea to a new issue #265.