Closed tobz-nz closed 2 months ago
Hi. Thanks for the report. The move to using bitarray has made my (limited) performance tests run significantly faster, including a test with findall
, so I'm interested to see what's happening here.
I've never tested it in a docker container though - more information would be useful. Obviously if you have a code example that demonstrates the problem that would be best, but otherwise:
findall
happening in a file based Bits
or ConstBitStream
object? These aren't read into memory so have different performance characteristics.bytealigned
being set to True
)?My best bet, as this is only happing in Docker, is that you're using a file-based bitstring (a Bits
or ConstBitStream
created directly from a file) and it's a file system access issue. So final question:
findall
on a file, is that file inside the Docker container or on a mapped volume outside it?Thanks.
Hi. Here are some more details that might help. I don't believe it is a docker issue. It was noticed in docker only because the latest version of bitstring was being used there.
The same speed differences have been noticed locally when upgrading and downgrading between 4.0.2 and 4.2.1 also when timing a single findall
call that looks like this.
target_data.findall(sequence, start=0, bytealigned=True)
Where target_data
is a ConstBitStream
for a file of about 4.2 mb
sequence
is created from a list of numbers as Bits using Bits().join(numberList)
Timings for a single findall
call on the 4.2 MB file
-- findall
for a sequence representing 11 numbers
Using bitstring 4.0.2 Timer: 0.003888 seconds
Using bitstring 4.2.1 Timer: 0.128739 seconds
-- findall
for a much larger sequence representing 875 numbers
Using bitstring 4.0.2 Timer: 0.009793 seconds
Using bitstring 4.2.1 Timer: 0.106768 seconds
Ah thanks, that does look like a bigger problem. I'll try to reproduce it when I get a chance, but possibly if you change from a ConstBitStream
to a BitStream
that might help, as it will read the file into memory, and I suspect this is a file access issue as otherwise I would have seen it before now.
I've run some tests on findall, and I can reproduce the issue.
Test 1: bytealigned = True, finding whole byte 4.0.2: 0.2 s 4.2.1: 1.3 s <---- much slower !!!
Test 2: bytealigned = True, finding non-whole byte 4.0.2: 22.9 s 4.2.1: 1.3 s
Test 3: bytealigned = False, finding whole byte 4.0.2: 14.6 s 4.2.1: 1.4 s
Test 4: bytealigned = False, finding non-whole byte 4.0.2: 25.1 s 4.2.1: 1.4 s
So for three of the four cases the new bitstring version is ~15x faster. For the other case the old bitstring is ~7x faster.
I'm assuming that your case is finding a whole-byte bitstring, so it's the case that has been made slower (you have much worse than 7x slow down, but I'm sure it's the same issue).
I'm pretty sure I know what's happened here, and it should be fixable for the 4.2.2 release.
The new code is 2 or 3x faster than the 4.0.2 code on my machine, rather than 7x slower, so hopefully it should speed up your case.
Planning a point release this weekend if possible.
bitstring 4.2.2 is now released. 🎉
Can confirm it slightly faster on average than 4.0.2 in our use-case 👍 Nice job - thanks 😄
After updating bitstring to 4.2.1 in a docker based project we started seeing a massive speed decrease. Scripts went from running in ~6 seconds to well over 4 minutes. Downgrading to 4.0.2 resolved the issue.
All versions since 4.1 in which bitstring uses bitarray are significantly slower for findall().
This issue was only present in docker containers (I tried many different images) - on a host machine or VPS it ran at normal speed.
I'm not sure if this is a bitstring ot bitarray problem, but thought I'd report it here.