scott-griffiths / bitstring

A Python module to help you manage your bits
https://bitstring.readthedocs.io/en/stable/index.html
MIT License
401 stars 67 forks source link

Significant slowdown in find / findall for some cases since version 4.1. #326

Closed tobz-nz closed 2 months ago

tobz-nz commented 3 months ago

After updating bitstring to 4.2.1 in a docker based project we started seeing a massive speed decrease. Scripts went from running in ~6 seconds to well over 4 minutes. Downgrading to 4.0.2 resolved the issue.

All versions since 4.1 in which bitstring uses bitarray are significantly slower for findall().

This issue was only present in docker containers (I tried many different images) - on a host machine or VPS it ran at normal speed.

I'm not sure if this is a bitstring ot bitarray problem, but thought I'd report it here.

scott-griffiths commented 2 months ago

Hi. Thanks for the report. The move to using bitarray has made my (limited) performance tests run significantly faster, including a test with findall, so I'm interested to see what's happening here.

I've never tested it in a docker container though - more information would be useful. Obviously if you have a code example that demonstrates the problem that would be best, but otherwise:

My best bet, as this is only happing in Docker, is that you're using a file-based bitstring (a Bits or ConstBitStream created directly from a file) and it's a file system access issue. So final question:

Thanks.

rustyrepo commented 2 months ago

Hi. Here are some more details that might help. I don't believe it is a docker issue. It was noticed in docker only because the latest version of bitstring was being used there.

The same speed differences have been noticed locally when upgrading and downgrading between 4.0.2 and 4.2.1 also when timing a single findall call that looks like this.

target_data.findall(sequence, start=0, bytealigned=True)

Where target_data is a ConstBitStream for a file of about 4.2 mb

sequence is created from a list of numbers as Bits using Bits().join(numberList)

Timings for a single findall call on the 4.2 MB file

-- findall for a sequence representing 11 numbers

Using bitstring 4.0.2 Timer: 0.003888 seconds

Using bitstring 4.2.1 Timer: 0.128739 seconds

-- findall for a much larger sequence representing 875 numbers

Using bitstring 4.0.2 Timer: 0.009793 seconds

Using bitstring 4.2.1 Timer: 0.106768 seconds

scott-griffiths commented 2 months ago

Ah thanks, that does look like a bigger problem. I'll try to reproduce it when I get a chance, but possibly if you change from a ConstBitStream to a BitStream that might help, as it will read the file into memory, and I suspect this is a file access issue as otherwise I would have seen it before now.

scott-griffiths commented 2 months ago

I've run some tests on findall, and I can reproduce the issue.

Test 1: bytealigned = True, finding whole byte 4.0.2: 0.2 s 4.2.1: 1.3 s <---- much slower !!!

Test 2: bytealigned = True, finding non-whole byte 4.0.2: 22.9 s 4.2.1: 1.3 s

Test 3: bytealigned = False, finding whole byte 4.0.2: 14.6 s 4.2.1: 1.4 s

Test 4: bytealigned = False, finding non-whole byte 4.0.2: 25.1 s 4.2.1: 1.4 s

So for three of the four cases the new bitstring version is ~15x faster. For the other case the old bitstring is ~7x faster.

I'm assuming that your case is finding a whole-byte bitstring, so it's the case that has been made slower (you have much worse than 7x slow down, but I'm sure it's the same issue).

I'm pretty sure I know what's happened here, and it should be fixable for the 4.2.2 release.

scott-griffiths commented 2 months ago

The new code is 2 or 3x faster than the 4.0.2 code on my machine, rather than 7x slower, so hopefully it should speed up your case.

Planning a point release this weekend if possible.

scott-griffiths commented 2 months ago

bitstring 4.2.2 is now released. 🎉

tobz-nz commented 2 months ago

Can confirm it slightly faster on average than 4.0.2 in our use-case 👍 Nice job - thanks 😄