I've been profiling this with a local script, but I'd be interested if anyone knows of a more formal way to write regression tests for the speed here.
Using this script:
# profile.py
from framewirc.messages import chunk_message
a = '💩失' * 1200
for i in range(10001):
    chunk_message(a, 500)
And this command:
time python ./profile.py
This PR now changes the execution time from:
python ./profile.py 12.87s user 0.01s system 99% cpu 12.875 total
to:
python ./profile.py 2.01s user 0.00s system 99% cpu 2.013 total
Before @Ian-Foote's suggestion, I had got it down to:
python ./profile.py 8.54s user 0.01s system 99% cpu 8.550 total
I'd say there's still room for improvement, but this is a pretty good start on premature optimisation :)
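On the opening question about a more formal speed regression test: one option might be pytest-benchmark, which can save timings and fail the run when they regress past a threshold. A minimal sketch, assuming pytest-benchmark were added as a dev dependency (it isn't part of this PR, and the test name is made up):

# test_chunk_message_speed.py (hypothetical)
from framewirc.messages import chunk_message

def test_chunk_message_speed(benchmark):
    # pytest-benchmark's "benchmark" fixture times repeated calls for us.
    message = '💩失' * 1200
    benchmark(chunk_message, message, 500)

Running pytest --benchmark-autosave once records a baseline; a later pytest --benchmark-compare --benchmark-compare-fail=mean:10% then fails if the mean time regresses by more than 10%.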
Well... I'm probably going to be fighting hypothesis on this for a little while... :P
@Ian-Foote, thanks to hypothesis, there are a couple of extra fixes in here.
Thanks for the help! :)
It was really slow before. (to_bytes turns out to be pretty bad in a tight loop; rfind is significantly faster.)
There's a little more that can be done here before merging. Mostly, I'm not a fan of looping over the chars one at a time. It's seriously inefficient. Most of the time, we're not going to find what we're looking for in the first few hundred iterations.
A binary search would be much better. I think we can also be more efficient about the minimum and maximum bounds, based on the Unicode restrictions and the max length.
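Roughly what I have in mind (a sketch of the idea only, not chunk_message's actual internals, and max_prefix_length is an invented name): binary-search the prefix lengths for the longest prefix whose UTF-8 encoding still fits in the byte budget, with the bounds tightened using the fact that a UTF-8 character is 1 to 4 bytes.

def max_prefix_length(text, max_bytes):
    # Largest number of characters whose UTF-8 encoding fits in max_bytes.
    # A UTF-8 char is 1-4 bytes, so the answer lies in [max_bytes // 4, max_bytes],
    # clamped to len(text). Encoded length grows with prefix length, so we can
    # binary-search within those bounds instead of stepping one char at a time.
    lo = min(len(text), max_bytes // 4)
    hi = min(len(text), max_bytes)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(text[:mid].encode('utf-8')) <= max_bytes:
            lo = mid
        else:
            hi = mid - 1
    return lo

That's O(log n) probes per chunk instead of O(n), and each probe only encodes a prefix.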
Note to future self: look into the bisect module to see if it can abstract away rolling our own binary search.
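For the record, a hypothetical sketch (untested against this codebase): on Python 3.10+, bisect.bisect_right accepts a key= argument, so the same search can be expressed without hand-rolling the loop.

import bisect

def max_prefix_length(text, max_bytes):
    # Candidate prefix lengths are 0..len(text); encoded size is monotonic in
    # the length, so bisect_right finds the first prefix that no longer fits.
    lengths = range(len(text) + 1)
    first_too_long = bisect.bisect_right(
        lengths, max_bytes,
        key=lambda n: len(text[:n].encode('utf-8')),
    )
    return first_too_long - 1

On older Pythons, bisect has no key= argument, so it would mean either precomputing the key for every length (which defeats the point) or keeping a hand-rolled search.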