scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License

Bug initializing an Array from bytes? #336

Closed jbhannon1 closed 2 months ago

jbhannon1 commented 2 months ago

If I convert an Array to bytes using .tobytes() and then initialize a new Array from those bytes, the length can be different. Why?

Here is a minimal example:

from bitstring import Array

d0 = [31]*10
a0 = Array('u5',d0)
b0 = a0.tobytes()
print(f"length 1: {len(a0):d}")

a1 = Array('u5',b0)
print(f"length 2: {len(a1):d}")

The output is:

length 1: 10
length 2: 11

scott-griffiths commented 2 months ago

Hi there.

The issue is that the tobytes method may need to pad with between 0 and 7 bits in order to return a whole number of bytes.

Your a0 array has 10 unsigned integers of 5 bits each, so that's 50 bits in total. When this is converted to bytes, 6 padding bits have to be added on the end to make a whole number of bytes (56 bits, i.e. 7 bytes). When those bytes are used to create a new array, 56 bits is enough data for 11 u5 elements (55 bits), plus one spare bit.

>>> a0
Array('uint5', [31, 31, 31, 31, 31, 31, 31, 31, 31, 31])
>>> a1
Array('uint5', [31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 0], trailing_bits=BitArray('0b0'))
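
The arithmetic behind those two lengths can be checked without bitstring. The following is only a sketch of the padding rule described above (round up to a whole byte, then reinterpret as fixed-width items); padded_layout is a hypothetical helper name, not part of the library.

```python
def padded_layout(n_items, item_bits):
    """Return (padding_bits, items_after_round_trip, spare_bits).

    Hypothetical helper: models rounding the bit count up to a whole
    number of bytes and then splitting it into item_bits-wide elements.
    """
    total_bits = n_items * item_bits
    padding = -total_bits % 8            # 0..7 bits needed to reach a byte boundary
    padded_bits = total_bits + padding
    items, spare = divmod(padded_bits, item_bits)
    return padding, items, spare


# 10 items of 5 bits: 50 bits, padded to 56, which holds 11 items plus 1 spare bit
print(padded_layout(10, 5))  # (6, 11, 1)
```

Whenever the padding is at least as wide as one item (here 6 >= 5), the round trip through bytes gains extra zero-valued elements.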

So basically it's because you've explicitly asked for the data to be stored in a bytes object as an intermediary, and this can't be represented without some padding or losing some bits. The array has a data member that stores the bits, so that can be used instead, but it depends what you're trying to do.

a2 = Array('u5', a0.data)

jbhannon1 commented 2 months ago

Thanks very much for the quick response. I will keep track of the length and not rely on the length of the bytes object to give me the number of bits.
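
One way to keep track of the length is to store the element count next to the padded bytes. The sketch below is plain Python and does not call bitstring: pack_items and unpack_items are hypothetical names, and the 4-byte big-endian count prefix is an assumed framing chosen for illustration. The payload layout (items packed most-significant-bit first, zero padding on the end) follows the behaviour described earlier in the thread.

```python
import struct


def pack_items(values, item_bits):
    """Pack unsigned ints MSB-first, zero-pad to a byte boundary,
    and prefix the result with the item count (assumed 4-byte header)."""
    acc = 0
    for v in values:
        if not 0 <= v < (1 << item_bits):
            raise ValueError(f"{v} does not fit in {item_bits} bits")
        acc = (acc << item_bits) | v
    total_bits = len(values) * item_bits
    padding = -total_bits % 8
    payload = (acc << padding).to_bytes((total_bits + padding) // 8, "big")
    return struct.pack(">I", len(values)) + payload


def unpack_items(blob, item_bits):
    """Read the count header, then decode exactly that many items,
    ignoring whatever padding bits follow them."""
    (count,) = struct.unpack_from(">I", blob)
    payload = blob[4:]
    padding = len(payload) * 8 - count * item_bits
    if padding < 0:
        raise ValueError("payload too short for the stored count")
    acc = int.from_bytes(payload, "big") >> padding
    mask = (1 << item_bits) - 1
    return [(acc >> (item_bits * (count - 1 - i))) & mask for i in range(count)]


blob = pack_items([31] * 10, 5)
print(len(unpack_items(blob, 5)))  # 10
```

With bitstring itself, the same idea would be to save len(a0) alongside a0.tobytes() and use the saved count when rebuilding the array.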