scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License
401 stars 67 forks source link

Iterating using over Bits.find() causes an infinite loop #310

Closed ajahraus closed 5 months ago

ajahraus commented 6 months ago

When you write

bs = BitStream(filename=name)
while current_pos := bs.find(target_substring):
    # Read additional bits, so that bs.pos is now past current_pos
    bs.read(8)

current_pos does not take the value of the position of subsequent target_substrings. Instead, bs.pos seems to be reset in each iteration, which creates an infinite loop, where current_pos always remains at the location of the first match.

The desired behaviour can be hacked together via

bs = BitStream(filename=name)
current_pos = 0
while bs.find(target_substring, start=current_pos):
    # Read additional bits, so that bs.pos is now past current_pos
    bs.read(8)
    current_pos = bs.pos

Incidentally, this is different behaviour from

bs = BitStream(filename=name)
for start in s.findall(target_substring):
    # Read additional bits, so that bs.pos is now past current_pos
    bs.read(8)

because while inside the loop, I would like to be able to advance the file past the next occurrence of the target_substring.

scott-griffiths commented 5 months ago

Sorry, I somehow missed this when it was submitted.

I haven't tested your code, but I think the issue is that find doesn't return a int, it returns either a tuple with a single int, or an empty tuple.

The reason for this is illustrated in your code - if the target_substring is found at the start of bs then just returning 0 would be False in a boolean sense, and the while loop would exit. However the tuple (0, ) is True and so the loop would continue.

So I think you just need to use current_pos[0] to get the bit position in the first code fragment.

Hopefully that makes sense and works for you.

ajahraus commented 2 months ago

Boy, I really should have replied sooner than ... *checks notes* two and a half months.

That's not really my issue, the tuple value returned from find works great. My code example was a typo, and I think I've fixed it below, but the main issue is that current_pos never advances, it always stays at the first match.

bs = BitStream(filename=name)
while current_pos, *_ := bs.find(target_substring):
    # Read additional bits, so that bs.pos is now past current_pos
    bs.read(8)
scott-griffiths commented 2 months ago

Ah OK. The issue I believe is that find doesn't use the current bit position as its starting point. So every call to bs.find(target_substring) will return the same as it's the same call. Note that the find method is available on the Bits type, which doesn't have a concept of a bit position.

You can either:

1) Use the start parameter of find with bs.find(target, start=current_pos) (or possibly current_pos + 1 or current_pos + 8 depending on your needs). 2) Look at the readto method, which is a combination of a read and a find, and does change the current bit position.

ajahraus commented 2 months ago

Ah, yeah I understand now.

So my use case was that, even though find advances the pos field to the start of the target string, and I subsequently also called read which also advanced the pos field, find still starts at the beginning of the BitStream.

So, if I understand you correctly, this should be the solution:

bs = BitStream(filename=name)
current_pos = (0,)
while current_pos := bs.find(target_substring, start=current_pos[0]):
    # Read additional bits, so that bs.pos is now past current_pos
    bs.read(8)

Edit because you can't use tuple unpacking with an assignment expression

ajahraus commented 2 months ago

Okay, this is the solution I was originally tyring to come up with (with correct syntax, even)

while current_pos := bs.find(target_substring, start=bs.pos):
    bs.read(8)