scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License

Allow per-token bit reversals #265

scott-griffiths closed this issue 2 months ago

scott-griffiths commented 1 year ago

Migrating idea from issue #226.

It would be useful for some users to be able to interpret bitstrings in the opposite direction. This would be equivalent to performing a bitwise reverse before interpreting the bits.

This wouldn't work on exponential Golomb codes if we are in MSB0 bit numbering, but strangely enough it should work if we're in LSB0 bit numbering, a symmetry I quite like.
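To make the "reverse, then interpret" equivalence concrete, here is a minimal sketch in plain Python that does not use bitstring at all. Bits are modelled as strings of '0' and '1' characters, and the helper names (reverse_bits, uint_from_bits, float16_from_bits) are invented for illustration only; they are not part of any library.

```python
import struct

def reverse_bits(bits: str) -> str:
    """Return the bit string with its bit order reversed."""
    return bits[::-1]

def uint_from_bits(bits: str) -> int:
    """Interpret a string of '0'/'1' characters as an unsigned integer, MSB first."""
    return int(bits, 2)

def float16_from_bits(bits: str) -> float:
    """Interpret 16 bits as a big-endian IEEE 754 half-precision float."""
    if len(bits) != 16:
        raise ValueError("expected exactly 16 bits")
    return struct.unpack(">e", int(bits, 2).to_bytes(2, "big"))[0]

# Store each value with its bits reversed.
stored_uint = reverse_bits(format(54, "09b"))
stored_float = reverse_bits(
    format(int.from_bytes(struct.pack(">e", 0.25), "big"), "016b")
)

# Reading back: reverse first, then interpret as usual.
assert uint_from_bits(reverse_bits(stored_uint)) == 54
assert float16_from_bits(reverse_bits(stored_float)) == 0.25
```

The point of the sketch is only that a reversed interpretation needs no new decoding logic: it is the ordinary interpretation applied to the reversed bits.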

So let's say we construct a bitstring like this:

s1 = BitArray(float=0.25, length=16)
s2 = BitArray(uint=54, length=9)
s1.reverse()
s2.reverse()
s = s1 + s2

So to read this back we would do something like:

f, u = s.readlist('f16r, u9r')

where the r means to reverse the bits before doing the interpretation. So following this notation, to create the original bitstring we could do:

s = Bits('float16r=0.25, u9r=54')

or maybe

s = Bits(floatr=0.25, length=16) + Bits(uintr=54, length=9)

Though I don't really like this last method, as it would need a whole bunch of new keyword initialisers. It could instead be a flag:

s = Bits(float=0.25, length=16, reverse=True)

which is slightly nicer. So maybe we should allow

f, u = s.readlist('f16, u9', reverse=True)

but then what would this do?

f, u = s.readlist('f16, u9r', reverse=True)

(Answer: raise an exception.) Also, the name reverse here is ambiguous and could confuse the user.
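The conflict rule above can be sketched in plain Python. This is not bitstring's readlist; read_uints is a hypothetical stand-in that only understands unsigned-integer tokens of the form u<length> with an optional trailing r, and it works on a string of '0'/'1' characters. It assumes the reverse flag means "reverse each token's bits", which is only one possible reading of an ambiguous name.

```python
import re

# Hypothetical token grammar for this sketch: 'u' + length, optional trailing 'r'.
_TOKEN = re.compile(r"u(\d+)(r?)")

def read_uints(bits: str, fmt: str, reverse: bool = False) -> list:
    """Read unsigned integers from `bits` according to a comma-separated format."""
    values = []
    pos = 0
    for raw in fmt.split(","):
        token = raw.strip()
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unsupported token: {token!r}")
        length = int(match.group(1))
        token_reversed = match.group(2) == "r"
        # Asking for the reversal twice is treated as an error, not a no-op.
        if token_reversed and reverse:
            raise ValueError(f"token {token!r} is already reversed; reverse=True conflicts")
        chunk = bits[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("not enough bits to read token")
        if token_reversed or reverse:
            chunk = chunk[::-1]  # assumption: reversal is applied per token
        values.append(int(chunk, 2))
        pos += length
    return values

print(read_uints("011011000" + "00000011", "u9r, u8"))
```

Raising on the doubled request avoids having to decide whether two reversals cancel out or whether one silently wins.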

For interpretation, should we allow properties like this:

x = s.intr

which would interpret s as an int but with the bits reversed. That would imply allowing s.intr = 14 as well, which implies s = Bits(int=14, length=100, reverse=True) should also be allowed.

Let's ignore creation and go back to just reading for a bit. Tokens like float32r look fine, but what about others? We could have bfloatler, which looks hard to parse (for a human). I'd rather not reinvent regular expressions!

Maybe don't just append an r. An R might be clearer. Or how about a Я - that would be cool 😉 . Or perhaps _r separates it better.
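As a rough illustration of why a separator helps, the sketch below splits a hypothetical _r suffix off a token name. The function split_reverse_suffix is invented for this example and is not part of bitstring; it only shows that with an underscore the split is unambiguous even for names such as bfloatle.

```python
def split_reverse_suffix(token: str, suffix: str = "_r"):
    """Split a token into (base_token, is_reversed) using a hypothetical '_r' suffix."""
    token = token.strip()
    if token.endswith(suffix):
        base = token[:-len(suffix)]
        if not base:
            raise ValueError("token has a reverse suffix but no type name")
        return base, True
    return token, False

for example in ("float32_r", "bfloatle_r", "bfloatle", "u9_r"):
    print(example, "->", split_reverse_suffix(example))
```

With a bare trailing r, a reader has to know the full list of type names to tell where the name ends and the modifier begins; with _r, a simple suffix check is enough.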

kehrazy commented 1 year ago

Maybe don't just append an r. An R might be clearer. Or how about a Я - that would be cool 😉 . Or perhaps _r separates it better.

Wouldn't _r work, e.g.:

scott-griffiths commented 1 year ago

Yes, I agree that the _r looks the best.

I'll take a look at this after I get version 4.1 released. I'm still not sure it will make the cut. It's possible, for example, that it slows down the more usual use case, and I don't want it to be a performance hit.

My rough plan I think:

kehrazy commented 1 year ago

...and I don't want it to be a performance hit.

Not quite sure how that would be a performance hit. Python's [::-1] is quite fast.
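For a rough sense of the cost of the reversal alone, a small timing sketch on plain Python strings is shown below. It says nothing about bitstring's internal storage or about token parsing; the function names and the sample size are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit

def reverse_naive(bits: str) -> str:
    """Reverse a bit string one character at a time."""
    out = []
    for i in range(len(bits) - 1, -1, -1):
        out.append(bits[i])
    return "".join(out)

def reverse_slice(bits: str) -> str:
    """Reverse a bit string with a slice."""
    return bits[::-1]

sample = "0110" * 2500  # 10,000 bits

# Both approaches must agree before timing them.
assert reverse_naive(sample) == reverse_slice(sample)

naive_s = timeit.timeit(lambda: reverse_naive(sample), number=100)
slice_s = timeit.timeit(lambda: reverse_slice(sample), number=100)
print(f"naive loop: {naive_s:.4f}s, slice: {slice_s:.4f}s")
```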

kehrazy commented 1 year ago

Also, since the initializer list is getting quite big, would you look into accepting (or writing) a lexer for the tokens? It would cost some runtime performance when initializing objects, but it would make it much easier to extend and parse the fmt string in question.

scott-griffiths commented 1 year ago

The performance problem isn't with the reverse itself (we have to do the [::-1] ourselves and it's a few orders of magnitude faster than the naive implementation). The possible performance hit is more with object creation and token parsing, but the keyword initialisers can be omitted from the first version. Plus I know how much test code will have to be written if I make them more general!

scott-griffiths commented 8 months ago

I'm in the process of a refactor that would make it much easier to add new types - probably including allowing the end user to add their own types. This feature should be much easier to implement at that point, but I need to get the rest of the work done first so I'm moving this to the 4.3 release.

scott-griffiths commented 2 months ago

This is one of the changes I will instead prioritise for the bitformat library. The reworking of the Dtype class will make the bit reversal a peer of endian byte reversals, so it should be much easier to do.