scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License
404 stars 68 forks source link

Sort as in SQL bitstrings? #280

Closed ppKrauss closed 1 year ago

ppKrauss commented 1 year ago

I'm looking for a sort method for BitArray, does it exist?

>>> u=(BitArray('0b101'),BitArray('0b1'),BitArray('0b00'))
>>> u.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'sort'

Reference note

It is not an ISO standard, but PostgreSQL databases has its BitArray analog data type and sorts a list x with ORDER BY x clause: it is the lexicographic order. For example ordering all BitArrays, from 1 to 4 bits:

 0
 00
 000
 001
 01
 010
 011
 1
 10
 100
 101
 11
 110
 111

PS: so, in the Python example, we expect as "standard" result (BitArray('0b1'),BitArray('0b00'),BitArray('0b101')).

scott-griffiths commented 1 year ago

Hi. That's an interesting question.

Firstly the error you get is because you're using a tuple instead of a list. tuples don't have a sort operation as they can't be changed. Switching to a list:

>>> u = [BitArray('0b101'), BitArray('0b1'), BitArray('0b00')]
>>> u.sort()
TypeError: '<' not supported between instances of 'BitArray' and 'BitArray'

The error now is because there is no ordering between bitstrings. You can order interpretations of them though:

>>> y = [x.bin for x in u]
>>> y
['101', '1', '00']
>>> y.sort()
>>> y
['00', '1', '101']

but that's just sorting ordinary Python strings. But a quick check shows gives exactly the order you want (I have to admit I'm surprised at that).

So in summary, my method if you want to start and finish with an array of bitstrings would be:

>>> u = [BitArray('0b101'), BitArray('0b1'), BitArray('0b00')]
>>> v = [x.bin for x in u]
>>> v.sort()
>>> v = [BitArray(bin=x) for x in v]
>>> v
[BitArray('0b00'), BitArray('0b1'), BitArray('0b101')]

Which I think is correct (it isn't the same as your 'expected result' but does match the order given).