scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License
404 stars 68 forks source link

Non-empty bitstrings considered False #270

Closed scott-griffiths closed 1 year ago

scott-griffiths commented 1 year ago

For example

>>> a = BitArray(8)
>>> bool(a)
False

This is very surprising and makes constructions like if a: unpredictable.

The docs for __bool__() say that "When this method is not defined, __len__() is called, if it is defined, and the object is considered true if its result is nonzero."

For comparison:

>>> a = bytearray(8)
>>> bool(a)
True
scott-griffiths commented 1 year ago

I'm going to treat this as a 15 year old bug rather than a change to a 15 year old feature, even though I appear to have deliberately designed it this way. In my defence, this is a pre-Python 3 design choice and things were different back then...

fenkes-ibm commented 2 months ago

Before I dump my wall of text on you, I'd like to thank you for creating and maintaining the bitstring module! :heart: I use it at work daily, and it's a cornerstone of all the low level hardware debugging tools I've been writing since 2015. I needed a pure python module to handle bit strings of arbitrary length, bitstring fit my use case like a glove, the documentation's fantastic and I've been happily using it and endorsing it to my coworkers at the earliest opportunity since then.

But all that said, even though this ship's probably long sailed, I'd like to ask you to reconsider this change.

While it is true that a non-empty container should be considered truthy in Python, it is also true that an integer with value zero is falsy. Bitstrings are somewhere in the middle between those two - acting as appendable and sliceable containers of bits on the one hand, but offering bitwise operations like integers on the other. I don't think the old interpretation is any more or less confusing than the new - it's a point of ambiguity between "it's a container!" and "it's an integer!" that your past self resolved in favor of integers.

I'd like to argue that most people using a bitstring in a bool context interpret it as an integer in that moment, aka "Is this bitstring zero if interpreted as an integer?", rather than as a container. An obvious example would be mask checks like if data & ~mask: but generally I can think of many use cases for the integer interpretation while I don't see a lot of use for "is this zero length" checks. I am guessing that you used the same intuition when you designed it this way 15 years ago, so while I'm aware that I'm only speaking from my own experience, at least it seems to agree with your past self's intuition.

This change silently causes subtle breakage for a lot of programs that intuitively used the integer interpretation. Fixing affected programs is a difficult task - points of use are hard to find without adding warnings/exceptions to the __bool__() method. And in addition, the change required for all programs using the integer interpretation would be to switch to .any(True) which results in cumbersome code: if (data & mask).any(True):. (The uint or int properties would be only slightly less cumbersome and cause a performance hit since they cannot exit early on the first 1 bit.)

So with all that said, I wouldn't call this a bug but rather an ambiguity due to the dual nature of bitstrings, and I think the old integer interpretation would be of more general use than the new container interpretation.

Sorry about the long winded explanation - I hope it brings my point across. Of course I'm always open to argument, and I'll gladly accept whatever you make of this and update my code if I have to (or stay on 3.1.9 forever), but wanted to take a swing at swaying you first :)

scott-griffiths commented 2 months ago

Hi. Thanks for the kind words, and glad you find bitstring useful.

I'm afraid I'm not going to switch this behaviour back again though. I'd say that bitstrings are very much a container, and the integer part is just one of many interpretations, so I want to have the container behaviour where it makes sense. I've tried to not bias the meaning of the bits as different people have very different use cases. Also changing it back again would be a breaking change for anyone using 4.x which would be doubly hard to justify.

The mask example is interesting, and I don't have any better solutions than actually treating it as an integer with (data & mask).uint (I suspect any will only be faster if things get quite big).

A couple of options for fixing current code. Firstly, the one you suggested - just add an exception to the __bool__ method, see what breaks and fix it. The other is just to monkey-patch a new version of __bool__ to return bool(self.uint). Currently it's just

    def __bool__(self) -> bool:
        """Return False if bitstring is empty, otherwise return True."""
        return len(self) != 0

Monkey patches are rather hacky though, so I'd favour raising the exception and fixing the code.

I'm currently writing what might be a successor to bitstring (though probably it will never be finished or might never get any traction) called bitformat. This has allowed me to revisit a lot of early design choices, and I'm still going to keep the new meaning of bool, as I really don't like to give my own interpretation to the bits. One thing that always bothered me in bitstring was when creating a new bitstring with just a length it was created with all zeros. I just dislike the asymmetry. In bitformat you can use either Bits.zeros(n) or Bits.ones(n) and thus symmetry is restored!

fenkes-ibm commented 2 months ago

Thanks for your response - I'll add a warning to the bool method and see what breaks then :)

My strings do get kinda big sometimes (millions of bits) so I guess I'll use any at least for the ones I know might be large and uint for the smallish ones.

Keep up the good work, and thanks again!