itdaniher opened this issue 7 years ago
When you say "run-length encoding", on what kind of data? Processing arrays (like numpy arrays or strings) is really slow in Python because the values need to be boxed when iterated over. I think it would be good to see some sample code including "typical inputs".
Hooray for an optimization issue! I forgot that `partitionby` hasn't been fully cythonized. I suspect that `itertools.groupby` is already pretty efficient, so there probably isn't a whole lot of room for improvement. We can still try though.
To echo what both of you have said, it would be great to have sample benchmarks that reflect your use case.
Okay, 15 hours of travel later... here it goes!
(end of vacation, yak shaving is my version of solitaire)
```python
import cytoolz as cz
import itertools as it
import numpy as np

ilen = cz.count

x = np.random.randint(0, 2, size=100000, dtype='l')

crle = lambda xs: ((ilen(gp), gp[0]) for gp in cz.recipes.partitionby(lambda x: x, xs))
irle = lambda xs: ((ilen(gp), x) for x, gp in it.groupby(xs))
```
Make sure apples == apples:

```python
>>> [i for i in crle(x)] == [i for i in irle(x)]
True
```
Put it to the test:

```python
>>> import timeit
>>> timeit.timeit('irle(x)', number=100000, globals=globals())
0.03869020694401115
>>> timeit.timeit('crle(x)', number=100000, globals=globals())
0.06634469900745898
```
Repeated the tests for a few different dtypes; the results seem consistent.

edit: Thanks + mad props to all for your great work making fast-n-fun Python faster-n-more-fun.

edit: added `ilen` as an alias for `cz.count`, which is insanely faster than `len([_ for _ in x])`, the trivial way of getting the length of an iterator.
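For the curious, a counter in the same spirit can be sketched in pure Python using the standard `deque`/`zip` recipe (the name `ilen_py` is illustrative; this is not the actual `cytoolz` implementation):

```python
from collections import deque
from itertools import count

def ilen_py(iterable):
    # Pair each item with an infinite counter and discard the pairs as they
    # are produced (maxlen=0 keeps nothing). Once the iterable is exhausted,
    # the counter has advanced exactly once per item consumed.
    counter = count()
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)
```

`ilen_py(iter(range(1000)))` returns `1000` without materializing an intermediate list, since `deque` drains the zip at C speed.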
hm, you're only timing how long it takes to create the generator, not how long it takes to exhaust it.
```python
assert list(crle(x)) == list(irle(x))

# Creating the generator
%timeit crle(x)        # 100000 loops, best of 3: 10.1 µs per loop
%timeit irle(x)        # 100000 loops, best of 3: 3.22 µs per loop

# Iterating over the generator
%timeit list(crle(x))  # 1 loop, best of 3: 228 ms per loop
%timeit list(irle(x))  # 10 loops, best of 3: 103 ms per loop
```
However, if you want `partitionby` to be faster, replace the `lambda x: x` with `None`:
```python
crle = lambda xs: ((ilen(gp), gp[0]) for gp in cz.recipes.partitionby(None, xs))
assert list(crle(x)) == list(irle(x))

%timeit list(crle(x))  # 10 loops, best of 3: 164 ms per loop
%timeit list(irle(x))  # 10 loops, best of 3: 104 ms per loop
```
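`None` works here because the key function is ultimately handed to `itertools.groupby`, which treats `key=None` as the identity, i.e. it groups runs of consecutive equal elements directly. A quick stdlib-only illustration:

```python
import itertools as it

xs = [1, 1, 0, 0, 0, 1]
# key=None groups consecutive equal elements without calling any Python-level key
runs = [(len(list(group)), value) for value, group in it.groupby(xs, key=None)]
print(runs)  # [(2, 1), (3, 0), (1, 1)]
```

Skipping the per-element Python function call is where the speedup comes from.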
At this point the bottleneck is that numpy arrays are slow to iterate over, even with `itertools` et al. There are some ways to make it faster, for example converting the array to a list:
```python
%timeit list(crle(x.tolist()))  # 10 loops, best of 3: 126 ms per loop
%timeit list(irle(x.tolist()))  # 10 loops, best of 3: 67.8 ms per loop
```
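The cost of iterating the array directly is the boxing mentioned at the top of the thread: each element comes out wrapped in a numpy scalar object, whereas `tolist()` yields plain Python ints. A quick check:

```python
import numpy as np

a = np.arange(3)
# Iterating a numpy array yields boxed numpy scalar objects...
print(type(next(iter(a))))           # a numpy integer scalar, e.g. numpy.int64
# ...while iterating the equivalent list yields plain Python ints
print(type(next(iter(a.tolist()))))  # <class 'int'>
```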
The time difference here is mostly due to the fact that `groupby` doesn't need to create intermediate sequences before counting them. So the next step would be to use the normal `len` in the `partitionby` version:
```python
crle = lambda xs: ((len(gp), gp[0]) for gp in cz.recipes.partitionby(None, xs))

%timeit list(crle(x.tolist()))  # 10 loops, best of 3: 89.4 ms per loop
%timeit list(irle(x.tolist()))  # 10 loops, best of 3: 69.1 ms per loop
```
Still a bit slower but the difference isn't all that much anymore.
You're working with homogeneous array data, so you should consider switching to a numpy-based solution instead of working in pure Python (or using PyPy?). Just to give an example using `numba` (just a draft, I don't think the function is really robust):
```python
import numba as nb
import numpy as np

@nb.njit
def counts(arr):
    # One row per run: (count, value); assumes arr is non-empty
    res = np.zeros((arr.size, 2), dtype=np.int32)
    i = 0
    cnts = 0
    last = arr[0]  # seeding with arr[0] instead of None keeps nopython mode happy
    for item in arr:
        if item == last:
            cnts += 1
        else:
            res[i, 0] = cnts
            res[i, 1] = last
            last = item
            cnts = 1
            i += 1
    res[i, 0] = cnts
    res[i, 1] = last
    i += 1
    return res[:i]
```
```python
assert [tuple(inner) for inner in counts(x).tolist()] == list(crle(x.tolist()))

%timeit list(crle(x.tolist()))  # 10 loops, best of 3: 89.9 ms per loop
%timeit list(irle(x.tolist()))  # 10 loops, best of 3: 71.8 ms per loop
%timeit counts(x)               # 1000 loops, best of 3: 854 µs per loop
```
So this can speed up the function by a factor of 100.
I'm not sure what the takeaway message of this comment should be: I think `partitionby` is really useful because it's much easier to work with than `groupby`. However, while `groupby` has its weird quirks, it can be faster than `partitionby`. But: if you're working with numeric, homogeneous data and you need performance, then you should probably stay in the `numpy` domain (maybe `pandas`).
> hm, you're only timing how long it takes to create the generator, not how long it takes to exhaust it.
A flaw in the design of my test case; thanks for working through it. The observation stems from a software system that does exhaust the iterator. It looks like my observation holds under tighter scrutiny, although the gap has narrowed.
> But: if you're working with numeric, homogeneous data and you need performance, then you should probably stay in the numpy domain (maybe pandas).
I've built a prototype of a soft-realtime data-processing system in Python. I'll rewrite it in Rust when I have nothing better to do, but surprisingly, Python is more than "fast enough."
> I'm not sure what the takeaway message of this comment should be
Besides that y'all are awesome? I think "check out pypy / numba / cython" is about what I got.
Cheers!
Not surprising after reading the source and discovering that `partitionby` merely wraps `groupby`, but my use of `groupby` is a bottleneck for run-length encoding and I was hoping to see gains by leveraging `cytoolz`. I welcome suggestions and would be glad to provide sample code when I'm at my desk.
Thanks!