scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License
404 stars, 68 forks

More convenience interpretations #237

Closed: scott-griffiths closed this issue 1 year ago

scott-griffiths commented 1 year ago

Might it make sense to add some extra convenience types? I'm thinking of using Rust's primitive data types as a template:

i8 = int:8, i16, i32, i64
u8 = uint:8, u16, u32, u64
f32 = float:32, f64

So rather than


a = BitArray('float:64=0.2')
b, c = a.unpack('float:32, float:32')

we could say

a = BitArray('f64=0.2')
b, c = a.unpack('f32, f32')

Or a = pack('uint:8, uint:32, float:32', x, y, z)

becomes a = pack('u8, u32, f32', x, y, z)
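To make the translation concrete, here is a minimal sketch of how the short tokens could be expanded into the existing long-form ones before parsing. The `expand_format` helper and the `SHORT_NAMES` table are hypothetical, not part of bitstring; the sketch assumes tokens are comma-separated and that a short token is one of `i`, `u` or `f` followed by a length, optionally with `=value`.

```python
import re

# Hypothetical table of Rust-style short names -> existing long-form names.
SHORT_NAMES = {"i": "int", "u": "uint", "f": "float"}
# A short token: letter, length, then an optional '=value' part.
_SHORT_TOKEN = re.compile(r"^([iuf])(\d+)(=.*)?$")


def expand_format(fmt: str) -> str:
    """Rewrite e.g. 'u8, u32, f32' as 'uint:8, uint:32, float:32'."""
    out = []
    for token in fmt.split(","):
        token = token.strip()
        m = _SHORT_TOKEN.match(token)
        if m:
            name, length, value = m.groups()
            out.append(f"{SHORT_NAMES[name]}:{length}{value or ''}")
        else:
            out.append(token)  # long-form tokens pass through untouched
    return ", ".join(out)


print(expand_format("u8, u32, f32"))  # uint:8, uint:32, float:32
print(expand_format("f64=0.2"))  # float:64=0.2
```

Because long-form tokens are left alone, old and new spellings could be mixed in one format string.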

Also, when interpreting, we get the chance to raise an exception if the bitstring is not the correct length:

a = Bits('0x1234')
b = a.uint  # fine
b = a.u32   # error: bitstring is 16 bits long, not 32
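A sketch of the length check, using a hypothetical `ToyBits` class rather than bitstring's own `Bits` so that it runs standalone. The unsized `uint` accepts any length, while the sized `u16`/`u32` properties refuse a bitstring of the wrong length. Raising `ValueError` is an assumption here; the real library may use its own exception type.

```python
class ToyBits:
    """Toy container (not bitstring's Bits) storing a value and its bit length."""

    def __init__(self, value, length):
        self.value = value
        self.length = length

    @classmethod
    def from_hex(cls, digits):
        # Each hex digit contributes four bits.
        return cls(int(digits, 16), 4 * len(digits))

    @property
    def uint(self):
        # Unsized interpretation: any length is accepted.
        return self.value

    def _checked_uint(self, n):
        # Sized interpretation: the length must match exactly.
        if self.length != n:
            raise ValueError(f"bitstring is {self.length} bits long, not {n}")
        return self.value

    @property
    def u16(self):
        return self._checked_uint(16)

    @property
    def u32(self):
        return self._checked_uint(32)


a = ToyBits.from_hex("1234")
print(a.uint)  # 4660
try:
    a.u32
except ValueError as err:
    print(err)  # bitstring is 16 bits long, not 32
```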

Looking ahead, this would also fit in with other float types:

f64 = float:64
f32 = float:32
f16 = float:16 (IEEE)
bf16 = bfloat
gc152 = 8-bit float
nv143 = another 8-bit float
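For the IEEE sizes, Python's standard `struct` module already has big-endian codes (`'>e'` half, `'>f'` single, `'>d'` double), so a hypothetical name-to-format table could look like the sketch below. The bfloat16 handling assumes simple truncation of a float32 to its top two bytes (real converters usually round to nearest), and the two 8-bit formats have no `struct` code, so they are left out.

```python
import struct

# Big-endian struct codes for the IEEE formats: half, single and double.
IEEE_CODES = {"f16": ">e", "f32": ">f", "f64": ">d"}


def pack_float(name, x):
    if name == "bf16":
        # bfloat16 is the top half of a float32 (sign, 8-bit exponent,
        # 7 mantissa bits). This sketch truncates rather than rounds.
        return struct.pack(">f", x)[:2]
    return struct.pack(IEEE_CODES[name], x)


def unpack_float(name, data):
    if name == "bf16":
        # Widen back to a float32 by appending zero low mantissa bits.
        return struct.unpack(">f", data + b"\x00\x00")[0]
    return struct.unpack(IEEE_CODES[name], data)[0]


# 1.5 is exactly representable in all four formats, so it round-trips.
for name in ("f16", "f32", "f64", "bf16"):
    data = pack_float(name, 1.5)
    print(name, len(data) * 8, unpack_float(name, data))
```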

This might be too extreme, but the logical extension is to allow the new interpretations without a length:

a.i = a.int
a.u = a.uint
a.f = a.float
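One way to get these aliases in plain Python is to bind the same property object under a second name in the class body. This is a hypothetical `ToyBits` class holding a string of `'0'`/`'1'` characters, not bitstring's implementation; an `f = float` alias would work the same way once a float property exists.

```python
class ToyBits:
    """Toy container (not bitstring's class) holding a string of '0'/'1'."""

    def __init__(self, bits):
        self.bits = bits

    @property
    def uint(self):
        # Unsigned, bit-wise big-endian interpretation.
        return int(self.bits, 2)

    @property
    def int(self):
        # Two's complement. Inside a method, `int` still means the builtin,
        # because class-level names are not visible from method bodies.
        value = int(self.bits, 2)
        if self.bits[0] == "1":
            value -= 1 << len(self.bits)
        return value

    # Short aliases: the same property objects under a second name.
    u = uint
    i = int


a = ToyBits("1110")
print(a.u, a.i)  # 14 -2
```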

scott-griffiths commented 1 year ago

Another idea is to allow any length to be specified this way, so you could have i15 or u71, etc. Probably not for the floats, though! This would need some clever hooks to parse arbitrarily named class properties.
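A minimal sketch of such a hook, assuming a hypothetical `ToyBits` class and handling only the integer names: `__getattr__` is only called when normal attribute lookup fails, so it can parse names like `i15` or `u71` with a regular expression without getting in the way of ordinary attributes.

```python
import re

# Hypothetical pattern for sized integer names such as i15 or u71.
_SIZED_INT = re.compile(r"([iu])([1-9][0-9]*)")


class ToyBits:
    """Toy container (not bitstring's class) holding a string of '0'/'1'."""

    def __init__(self, bits):
        self.bits = bits

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup has failed,
        # so 'bits' and any real properties never reach this code.
        m = _SIZED_INT.fullmatch(name)
        if m is None:
            raise AttributeError(name)
        kind, length = m.group(1), int(m.group(2))
        if len(self.bits) != length:
            raise ValueError(f"bitstring is {len(self.bits)} bits long, not {length}")
        value = int(self.bits, 2)
        if kind == "i" and self.bits[0] == "1":
            value -= 1 << length  # two's complement for signed names
        return value


a = ToyBits("1" * 15)
print(a.u15, a.i15)  # 32767 -1
```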

scott-griffiths commented 1 year ago

And while we're here, would it also make sense to do:

a.h = a.hex
a.o = a.oct
a.b = a.bin (not a.bool or a.bytes)

which means that logically we'd have to allow

a = Bits('b=01, h=abcdef')
b = pack('u32, b4', 45, '0100')
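A sketch of how the short tokens could be handled in an initialiser string. `parse_auto` is a hypothetical helper that only understands bin/oct/hex tokens and returns the result as a string of `'0'`/`'1'` characters; `b` is taken to mean `bin`, as proposed above.

```python
def parse_auto(spec):
    """Turn e.g. 'b=01, h=abcdef' into a string of '0'/'1' characters."""
    out = []
    for token in spec.split(","):
        name, _, value = token.strip().partition("=")
        if name in ("b", "bin"):  # 'b' means bin here, not bool or bytes
            if set(value) - {"0", "1"}:
                raise ValueError(f"bad binary digits: {value!r}")
            out.append(value)
        elif name in ("o", "oct"):
            # Each octal digit is three bits.
            out.append("".join(format(int(c, 8), "03b") for c in value))
        elif name in ("h", "hex"):
            # Each hex digit is four bits.
            out.append("".join(format(int(c, 16), "04b") for c in value))
        else:
            raise ValueError(f"unsupported token: {token.strip()!r}")
    return "".join(out)


print(parse_auto("b=01, h=abcdef"))  # 01101010111100110111101111
```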

scott-griffiths commented 1 year ago

And also allow

a.pp('h')
a.pp('b, o')

but still have

a.pp('bytes')
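Parsing the `pp` format string could then be a simple alias lookup. The sketch below is hypothetical (it only knows `bin`, `oct`, `hex` and `bytes`), but it shows why `bytes` keeps its full name: `b` is already taken by `bin`.

```python
# Hypothetical short names for pp; 'bytes' keeps its full name because
# 'b' is already taken by 'bin'.
PP_ALIASES = {"b": "bin", "o": "oct", "h": "hex"}
PP_FORMATS = {"bin", "oct", "hex", "bytes"}


def parse_pp_format(fmt):
    names = []
    for token in fmt.split(","):
        token = token.strip()
        name = PP_ALIASES.get(token, token)  # expand short names
        if name not in PP_FORMATS:
            raise ValueError(f"unknown pp format: {token!r}")
        names.append(name)
    return names


print(parse_pp_format("b, o"))  # ['bin', 'oct']
```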

scott-griffiths commented 1 year ago

Having played around with my initial version of this, I have a few observations:

bf16 doesn't make sense, as bfloats are always 16 bits long. Currently I only call them bfloat, but bf is a reasonable short version.

The short versions are all byte-wise big-endian. I think that's fine: using byte-wise little-endian would get very confusing for integer types that aren't a whole number of bytes long (and which are currently bit-wise big-endian).
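A small illustration of the endianness point using only built-in `int` methods: for a whole-byte value, little-endian is just the big-endian bytes reversed, but a 12-bit value has to be padded to 16 bits before any byte order applies, so the result is no longer 12 bits long and the four padding zeros end up in the middle.

```python
# Whole-byte value: little-endian is just the big-endian bytes reversed.
word = 0x12345678
print(word.to_bytes(4, "little").hex())  # 78563412

# A 12-bit value, bit-wise big-endian, is simply its 12 bits in order.
value = 0xABC
big_bits = format(value, "012b")

# Byte-wise little-endian needs whole bytes, so pad to 16 bits first.
le_bits = "".join(format(b, "08b") for b in value.to_bytes(2, "little"))
print(big_bits, le_bits)  # 101010111100 1011110000001010
```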

I'm finding that I'm typing things like a.float32 = 0.1 and being surprised that it doesn't work. I suggest we add:

a.float32
a.int64
a.uint12
a.floatle64
etc.

This needs some general code to handle properties of the form a.[name][length], as there are a lot of possible combinations.
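A sketch of that general code, assuming a hypothetical `ToyBits` class that stores its bits as a `'0'`/`'1'` string. A regular expression splits the attribute name into `[name][length]`, `__getattr__` handles reads and `__setattr__` handles writes such as `a.float32 = 0.1`. Only `uint`, `int`, `float` and `floatle` are recognised here, and floats are limited to the 16, 32 and 64-bit sizes that `struct` supports.

```python
import re
import struct

# Hypothetical grammar for names such as uint12, int64, float32 and floatle64.
_NAME = re.compile(r"(uint|int|floatle|float)([1-9][0-9]*)")
_FLOAT_CODES = {16: "e", 32: "f", 64: "d"}  # struct codes: half, single, double


class ToyBits:
    """Toy container (not bitstring's class) holding a string of '0'/'1'."""

    def __init__(self, bits=""):
        # Store directly so our own __setattr__ is not involved.
        object.__setattr__(self, "bits", bits)

    @staticmethod
    def _parse(name):
        # Split e.g. 'floatle64' into ('floatle', 64); reject unknown names.
        m = _NAME.fullmatch(name)
        if m is None:
            raise AttributeError(name)
        kind, n = m.group(1), int(m.group(2))
        if kind.startswith("float") and n not in _FLOAT_CODES:
            raise AttributeError(f"no {n}-bit float format: {name}")
        return kind, n

    def __getattr__(self, name):
        # Only called when normal lookup fails, e.g. for 'float32'.
        kind, n = self._parse(name)
        if len(self.bits) != n:
            raise ValueError(f"bitstring is {len(self.bits)} bits long, not {n}")
        if kind.startswith("float"):
            endian = "<" if kind == "floatle" else ">"
            data = int(self.bits, 2).to_bytes(n // 8, "big")
            return struct.unpack(endian + _FLOAT_CODES[n], data)[0]
        value = int(self.bits, 2)
        if kind == "int" and self.bits[0] == "1":
            value -= 1 << n  # two's complement
        return value

    def __setattr__(self, name, value):
        if _NAME.fullmatch(name) is None:
            object.__setattr__(self, name, value)  # ordinary attribute
            return
        kind, n = self._parse(name)
        if kind.startswith("float"):
            endian = "<" if kind == "floatle" else ">"
            data = struct.pack(endian + _FLOAT_CODES[n], value)
            bits = format(int.from_bytes(data, "big"), f"0{n}b")
        else:
            lo = -(1 << (n - 1)) if kind == "int" else 0
            hi = (1 << (n - 1)) - 1 if kind == "int" else (1 << n) - 1
            if not lo <= value <= hi:
                raise ValueError(f"{value} does not fit in {name}")
            bits = format(value & ((1 << n) - 1), f"0{n}b")
        object.__setattr__(self, "bits", bits)


a = ToyBits()
a.float32 = 0.1
print(len(a.bits), a.float32)  # 32 0.10000000149011612
a.int8 = -2
print(a.bits, a.uint8)  # 11111110 254
```

Setting a value replaces the whole bitstring here, which is one possible design choice; a real implementation might instead require the existing length to match.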