Allow per-token bit reversals

scott-griffiths commented 1 year ago

Migrating idea from issue #226.

It would be useful for some users to be able to interpret bitstrings in the opposite direction. This would be equivalent to performing a bitwise reverse before interpreting the bits.
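As a rough illustration of "reverse, then interpret", here is a pure-Python sketch that does not use the bitstring library at all. The helper name `reverse_bits` is made up for this example.

```python
def reverse_bits(value: int, width: int) -> int:
    """Return `value` with its lowest `width` bits in reverse order."""
    if width <= 0 or not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    # Format as a fixed-width binary string, reverse it, and parse it back.
    return int(format(value, f"0{width}b")[::-1], 2)


# 54 as a 9-bit field is 000110110; read in the opposite direction it is 011011000.
reversed_54 = reverse_bits(54, 9)  # 216
```

Applying the reversal twice with the same width gives back the original value, which is why the same operation can serve for both reading and writing.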

This wouldn't work on exponential Golomb codes if we are in MSB0 bit numbering, but strangely enough it should work if we're in LSB0 bit numbering. I quite like the symmetry of that.

So let's say we construct a bitstring like this:

from bitstring import BitArray

s1 = BitArray(float=0.25, length=16)
s2 = BitArray(uint=54, length=9)
s1.reverse()
s2.reverse()
s = s1 + s2
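To make the bit layout concrete, the following sketch rebuilds the same 25 bits by hand using only the standard library. It assumes the 16-bit float is big-endian IEEE 754 half precision, and it represents bits as plain '0'/'1' strings rather than bitstring objects. The helper names are invented for the example.

```python
import struct


def float16_bits(value: float) -> str:
    """Big-endian IEEE 754 half-precision encoding as a '0'/'1' string."""
    (raw,) = struct.unpack(">H", struct.pack(">e", value))
    return format(raw, "016b")


def bits_to_float16(bits: str) -> float:
    """Inverse of float16_bits for a 16-character '0'/'1' string."""
    (value,) = struct.unpack(">e", int(bits, 2).to_bytes(2, "big"))
    return value


# Each field is reversed on its own, then the fields are joined.
s1 = float16_bits(0.25)[::-1]
s2 = format(54, "09b")[::-1]
s = s1 + s2

# Reading back: slice out each field, reverse it, then interpret it.
f = bits_to_float16(s[:16][::-1])  # 0.25
u = int(s[16:25][::-1], 2)         # 54
```

Note that the reversal is per field: reversing the whole 25-bit string would also swap the order of the two fields.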

So to read this back we do something like

f, u = s.readlist('f16r, u9r')

where the r means to reverse the bits before doing the interpretation. So following this notation, to create the original bitstring we could do:

s = Bits('float16r=0.25, u9r=54')

or maybe

s = Bits(floatr=0.25, length=16) + Bits(uintr=54, length=9)

Though I don't really like this last method. There would be a whole bunch of new keyword initialisers. It could be a flag:

s = Bits(float=0.25, length=16, reverse=True)

which is slightly nicer. So maybe we should allow

f, u = s.readlist('f16, u9', reverse=True)

but then what would this do?

f, u = s.readlist('f16, u9r', reverse=True)

(answer: raise an exception). Also the name reverse here is ambiguous and could confuse the user.
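A minimal sketch of that rule, assuming the bare `r` suffix and a hypothetical `reverse` flag. Neither exists in bitstring; `resolve_reversal` and `resolve_format` are names invented here, and the sketch also assumes no ordinary token ends in `r`.

```python
def resolve_reversal(token: str, reverse: bool = False) -> tuple[str, bool]:
    """Return (base_token, should_reverse), rejecting a doubly-specified reversal."""
    token = token.strip()
    token_reversed = token.endswith("r")
    if token_reversed and reverse:
        # Both the token and the flag ask for a reversal: refuse to guess.
        raise ValueError(f"token {token!r} is already reversed; reverse=True is ambiguous")
    base = token[:-1] if token_reversed else token
    return base, token_reversed or reverse


def resolve_format(fmt: str, reverse: bool = False) -> list[tuple[str, bool]]:
    """Apply resolve_reversal to every comma-separated token in `fmt`."""
    return [resolve_reversal(token, reverse) for token in fmt.split(",")]


per_token = resolve_format("f16r, u9r")
by_flag = resolve_format("f16, u9", reverse=True)
```

With this rule the two spellings resolve to the same thing, and mixing them, as in `resolve_format("f16, u9r", reverse=True)`, raises a `ValueError`.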

For interpretation, do we allow properties like this?

x = s.intr

which would interpret s as an int but with the bits reversed. That would imply allowing s.intr = 14 as well, which implies s = Bits(int=14, length=100, reverse=True) should also be allowed.

Let's ignore creation and go back to just reading for a bit. Tokens like float32r look fine, but what about others? We could have bfloatler, which looks hard to parse (for a human). I'd rather not reinvent regular expressions!

Maybe don't just append an r. An R might be clearer. Or how about a Я - that would be cool 😉 . Or perhaps _r separates it better.

kehrazy commented 1 year ago

Maybe don't just append an r. An R might be clearer. Or how about a Я - that would be cool 😉 . Or perhaps _r separates it better.

Wouldn't _r work, e.g.:

float32_r
u8_r

It is PEP8 compliant and looks clear.
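Part of the appeal is that an `_r` suffix is trivial to split off mechanically. A small sketch, where `split_reverse_suffix` is a made-up helper and not part of bitstring:

```python
def split_reverse_suffix(token: str) -> tuple[str, bool]:
    """Return (base_token, reversed?) for a token that may end in '_r'."""
    suffix = "_r"
    if token.endswith(suffix):
        return token[: -len(suffix)], True
    return token, False


for token in ("float32_r", "u8_r", "bfloatle_r", "float32"):
    print(token, "->", split_reverse_suffix(token))
```

The underscore also keeps names like `bfloatle_r` readable, where a bare `bfloatler` runs the suffix into the type name.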

scott-griffiths commented 1 year ago

Yes, I agree that the _r looks the best.

I'll take a look at this after I get version 4.1 released. I'm still not sure it will make the cut - it's possible for example that it impacts the more usual use case and I don't want it to be a performance hit.

My rough plan, I think:

For string tokens, if they end in _r then reverse the bits as soon as they are read. E.g. s.read('uint:14_r')
If a string token is being used to create a bitstring then reverse the bits just before they are written. E.g. s += 'float16_r=5.2'
Interpretations of the entire bitstring can be done in reverse fairly easily as we already parse arbitrary properties - just reverse the bits of the entire bitstring before the rest of the interpretation, e.g. x = s.uint_r
Don't add new initialisers. Far too many and it's hard and possibly slow to make them programmatic. And I really don't want to write all those unit tests. So don't allow s = Bits(ue_r=501) for example.
Unsure about a reverse option in __init__. It's probably fine.
But don't add a reverse option in the read methods. Too confusing if nothing else.
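The first three points of the plan above can be sketched with a toy class. This is only an illustration: `MiniBits` is hypothetical, stores bits as a '0'/'1' string, understands only `uint:<length>` tokens, and says nothing about how bitstring itself would implement any of this.

```python
class MiniBits:
    """Toy stand-in for a bitstring, holding bits as a '0'/'1' string."""

    def __init__(self, bits: str = "") -> None:
        self.bits = bits
        self.pos = 0  # current read position

    @staticmethod
    def _parse(token: str) -> tuple[int, bool]:
        # Only 'uint:<length>' with an optional '_r' suffix is handled here.
        reverse = token.endswith("_r")
        if reverse:
            token = token[:-2]
        name, _, length = token.partition(":")
        if name != "uint" or not length.isdigit():
            raise ValueError(f"unsupported token {token!r}")
        return int(length), reverse

    def append(self, token: str, value: int) -> None:
        length, reverse = self._parse(token)
        if not 0 <= value < (1 << length):
            raise ValueError("value does not fit in the token length")
        field = format(value, f"0{length}b")
        # Reverse just before the bits are written.
        self.bits += field[::-1] if reverse else field

    def read(self, token: str) -> int:
        length, reverse = self._parse(token)
        field = self.bits[self.pos : self.pos + length]
        if len(field) < length:
            raise ValueError("not enough bits left to read")
        self.pos += length
        # Reverse as soon as the bits are read.
        return int(field[::-1] if reverse else field, 2)

    def __getattr__(self, name: str) -> int:
        # Whole-bitstring interpretation, e.g. x = s.uint_r
        if name == "uint":
            return int(self.bits, 2)
        if name == "uint_r":
            return int(self.bits[::-1], 2)
        raise AttributeError(name)


s = MiniBits()
s.append("uint:14_r", 5)
s.append("uint:8", 200)
a = s.read("uint:14_r")  # 5
b = s.read("uint:8")     # 200
```

Because the reversal happens at the token boundary, the rest of the read and write path is untouched for tokens without the suffix.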

kehrazy commented 1 year ago

...and I don't want it to be a performance hit.

Not quite sure how that would be a performance hit. Python's [::-1] is quite fast.

kehrazy commented 1 year ago

Also, since the initializer list is getting quite big - would you look into accepting/making a lexer for the tokens? It would be a hit on runtime performance when initializing objects, but it would make it much easier to add and parse the fmt string in question.

scott-griffiths commented 1 year ago

The performance problem isn't with the reverse itself (we have to do the [::-1] ourselves and it's a few orders of magnitude faster than the naive implementation). The possible performance hit is more with object creation and token parsing, but the keyword initialisers can be omitted from the first version. Plus I know how much test code will have to be written if I make them more general!
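For context on why a slice alone is not enough: `[::-1]` on a `bytes` object reverses the byte order, not the bits inside each byte. One common way to finish the job is a 256-entry lookup table. The sketch below is a generic technique for whole-byte data, not a description of bitstring's internals, and the function names are invented.

```python
# 256-entry lookup table: entry i holds byte i with its eight bits reversed.
_REVERSE_TABLE = bytes(int(format(i, "08b")[::-1], 2) for i in range(256))


def reverse_bits_naive(data: bytes) -> bytes:
    """Reverse every bit of `data` by going through a '0'/'1' string."""
    if not data:
        return b""
    bits = "".join(format(byte, "08b") for byte in data)
    return int(bits[::-1], 2).to_bytes(len(data), "big")


def reverse_bits_table(data: bytes) -> bytes:
    """Same result: reverse the byte order, then the bits inside each byte."""
    return data[::-1].translate(_REVERSE_TABLE)


sample = b"\x0f\x00"
assert reverse_bits_table(sample) == reverse_bits_naive(sample) == b"\x00\xf0"
```

Bit lengths that are not a multiple of eight would need extra shifting, which this sketch leaves out.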

scott-griffiths commented 8 months ago

I'm in the process of a refactor that would make it much easier to add new types - probably including allowing the end user to add their own types. This feature should be much easier to implement at that point, but I need to get the rest of the work done first so I'm moving this to the 4.3 release.

scott-griffiths commented 2 months ago

This is one of the changes I will instead prioritise for the bitformat library. The reworking of the Dtype class will make the bit reversal a peer of endian byte reversals, so much easier to do.

scott-griffiths / bitstring

Allow per-token bit reversals #265
