scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License

Add a BitsFormat type. #281

Closed scott-griffiths closed 8 months ago

scott-griffiths commented 1 year ago

We often have to parse a string to work out the format from it. This can be expensive, but can also be cached. Another method would be to have a proper object to represent the format, which could maybe later become part of the public interface.
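The caching option could look something like the sketch below. It is only an illustration: `parse_format`, the `ALIASES` table and the token grammar are hypothetical stand-ins, not bitstring's actual parser.

```python
import re
from functools import lru_cache
from typing import Tuple

# Hypothetical alias table: short and long names mapped to a canonical name.
ALIASES = {"u": "uint", "i": "int", "uint": "uint", "int": "int"}

# A name, an optional colon, then a length in bits, e.g. 'u8' or 'uint:8'.
_TOKEN = re.compile(r"([a-z]+):?(\d+)")


@lru_cache(maxsize=256)
def parse_format(fmt: str) -> Tuple[str, int]:
    """Parse a single token such as 'u8' into ('uint', 8)."""
    match = _TOKEN.fullmatch(fmt)
    if match is None or match.group(1) not in ALIASES:
        raise ValueError(f"Can't parse format {fmt!r}")
    return ALIASES[match.group(1)], int(match.group(2))


parse_format("u8")  # first call runs the regex
parse_format("u8")  # second call is answered from the cache
```

The cache only helps when the same string is seen again; a format object would avoid the string lookup altogether.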

So as well as a str or a list of str, we would also accept a single BitFormat for Array initialisation, read, readlist, unpack etc. The BitFormat can contain the partial function it needs to read and interpret bits of the correct length, similar to what we use internally in Arrays.

>>> bf = BitFormat('u8')
>>> bf.length
8
>>> bf._read_method
partial(Bits._read_uint, length=8)

So we can then do something like an unpack with [read_method(self) for read_method in bf.read_methods]
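To make the partial-method idea concrete, here is a rough stand-alone sketch. It does not use bitstring: the bits are a plain string of '0' and '1' characters, and `read_uint`, `read_int`, `READERS`, `make_read_methods` and `unpack` are all hypothetical names chosen for illustration.

```python
from functools import partial
from typing import Callable, List, Sequence, Tuple


def read_uint(bits: str, start: int, length: int) -> int:
    """Interpret bits[start:start + length] as an unsigned integer."""
    return int(bits[start:start + length], 2)


def read_int(bits: str, start: int, length: int) -> int:
    """Interpret bits[start:start + length] as a two's complement integer."""
    value = read_uint(bits, start, length)
    return value - (1 << length) if bits[start] == "1" else value


# Hypothetical lookup from canonical name to reader function.
READERS = {"uint": read_uint, "int": read_int}


def make_read_methods(items: Sequence[Tuple[str, int]]) -> List[Callable]:
    """Bind the length once, so each reader only needs the bits and a start."""
    return [partial(READERS[name], length=length) for name, length in items]


def unpack(bits: str, read_methods: Sequence[Callable]) -> list:
    """Call each reader in turn, advancing by the length bound into it."""
    values, pos = [], 0
    for read_method in read_methods:
        values.append(read_method(bits, pos))
        pos += read_method.keywords["length"]
    return values


methods = make_read_methods([("uint", 12), ("uint", 12)])
x, y = unpack("000000000011" + "000000000101", methods)  # x is 3, y is 5
```

The name lookup happens once when the partials are built, so the unpack loop itself does no parsing.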

It would be good to allow the user to create and use the BitFormat:

dims = BitFormat('2*u12')
x, y = s.readlist(dims)

So we'd need to give a human readable representation of the BitFormat:

>>> dims
BitFormat([('uint', 12), ('uint', 12)])

Which leads us to

BitFormat(fmt: str | Sequence[tuple[str, int]])

When creating with a sequence, the strings should be simple dictionary keys with no further processing: for example uint, but not u or uint:.
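A possible shape for the two construction routes is sketched below. The `NAMES` table, the token grammar (including the `2*` repeat prefix) and the `items` attribute are assumptions made for the example rather than anything bitstring defines.

```python
import re
from typing import List, Sequence, Tuple, Union

# Hypothetical table: aliases are accepted in strings, but only the
# canonical names (the values) are accepted in the sequence form.
NAMES = {"u": "uint", "i": "int", "uint": "uint", "int": "int"}
CANONICAL = set(NAMES.values())

# Optional 'N*' repeat, a name, an optional colon, then a length in bits.
_TOKEN = re.compile(r"(?:(\d+)\*)?([a-z]+):?(\d+)")


class BitFormat:
    def __init__(self, fmt: Union[str, Sequence[Tuple[str, int]]]) -> None:
        self.items: List[Tuple[str, int]] = []
        if isinstance(fmt, str):
            # String route: split on commas, then parse and expand each token.
            for token in fmt.replace(" ", "").split(","):
                match = _TOKEN.fullmatch(token)
                if match is None or match.group(2) not in NAMES:
                    raise ValueError(f"Can't parse token {token!r}")
                count = int(match.group(1) or 1)
                item = (NAMES[match.group(2)], int(match.group(3)))
                self.items.extend([item] * count)
        else:
            # Sequence route: names are plain keys with no further processing.
            for name, length in fmt:
                if name not in CANONICAL:
                    raise ValueError(f"Unknown interpretation {name!r}")
                self.items.append((name, int(length)))

    @property
    def length(self) -> int:
        return sum(length for _, length in self.items)

    def __repr__(self) -> str:
        return f"BitFormat({self.items!r})"


dims = BitFormat("2*u12")
same = BitFormat([("uint", 12), ("uint", 12)])
```

Because the repr is itself valid input for the sequence route, it round-trips through `eval` without ever touching the string parser.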

Having partial methods to read the format follows on from how we're doing it in the Array class. It's probably a good place to trial it.

Possibly it's better to only allow a single item in a BitFormat. It would certainly cache better.

BitFormat(interpretation: str, length: Optional[int])

If length is None then we parse the interpretation to find it; otherwise the interpretation must be in our dict.

dim1 = BitFormat('u12')
dim2 = BitFormat('uint', 12)
x, y = s.readlist([dim1, dim2])
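The single-item variant and its caching benefit might be sketched as follows. This version uses a cached factory function `bit_format` returning an immutable named tuple, which is an assumption for the example (the proposal above is a constructor); `KNOWN` and `SHORT` are hypothetical tables.

```python
import re
from functools import lru_cache
from typing import NamedTuple, Optional

KNOWN = {"uint", "int"}            # hypothetical dict keys (canonical names)
SHORT = {"u": "uint", "i": "int"}  # hypothetical aliases, used only when parsing


class BitFormat(NamedTuple):
    name: str
    length: int


@lru_cache(maxsize=None)
def bit_format(interpretation: str, length: Optional[int] = None) -> BitFormat:
    if length is None:
        # No explicit length: parse it out of strings like 'u12' or 'uint:12'.
        match = re.fullmatch(r"([a-z]+):?(\d+)", interpretation)
        if match is None:
            raise ValueError(f"Can't parse {interpretation!r}")
        name = SHORT.get(match.group(1), match.group(1))
        length = int(match.group(2))
    else:
        # Explicit length: the name is used as-is, with no alias expansion.
        name = interpretation
    if name not in KNOWN:
        raise ValueError(f"Unknown interpretation {name!r}")
    return BitFormat(name, length)


dim1 = bit_format("u12")
dim2 = bit_format("uint", 12)
```

Both routes end up at equal, hashable values, so single items are cheap to cache and to use as dictionary keys.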

Does the BitFormat actually need to contain anything other than the partial read method? Well it's nice to have a str and int for readability, plus we could also have a write method:

a = pack(dim1, 100)
b = Bits.from_format(dim1, 100)

or even make it callable???

b = dim1(100) # -> Bits type?
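As a rough idea of what the callable form could do, the sketch below packs a value into a string of '0' and '1' characters, which stands in for a Bits object. The class layout and the restriction to uint are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitFormat:
    name: str
    length: int

    def __call__(self, value: int) -> str:
        """Pack value into self.length bits (a str stands in for Bits here)."""
        if self.name != "uint":
            raise NotImplementedError(f"No writer for {self.name!r} in this sketch")
        if not 0 <= value < (1 << self.length):
            raise ValueError(f"{value} doesn't fit in {self.length} unsigned bits")
        return format(value, f"0{self.length}b")


dim1 = BitFormat("uint", 12)
b = dim1(100)  # '000001100100'
```

Calling the format reads a little like a constructor for a typed value, which may or may not be the clearest spelling compared with an explicit pack.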

scott-griffiths commented 1 year ago

Should this be called DType or possibly dtype to match the prevailing style? Probably.

scott-griffiths commented 8 months ago

The Dtype work is pretty much done. The feature to make it callable is in #304.