pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
30.08k stars 1.94k forks source link

Make Python-looping through nested lists faster #8494

Open jonashaag opened 1 year ago

jonashaag commented 1 year ago

Problem description

Sometimes you need to resort to manual Python looping, eg. when integrating with legacy code.

It seems to be very slow to loop through nested Polars series:

import polars as pl
import random
import string

random.seed(42)

s = pl.Series([random.sample(string.ascii_letters, random.randint(3, 10)) for _ in range(10_000)])
s_list = s.to_list()
s_np = s.to_numpy()
s_flat = s.explode()

def loop(x):
    for l in x:
        for _ in l:
            pass

def loop_flat(x):
    for _ in x:
        pass
# Baseline with Polars
In [13]: %timeit -n 3 -r 3 loop(s)
138 ms ± 217 µs per loop (mean ± std. dev. of 3 runs, 3 loops each)

# Looping flat Polars series is a bit faster
In [18]: %timeit -n 3 -r 3 loop_flat(s_flat)
89.8 ms ± 208 µs per loop (mean ± std. dev. of 3 runs, 3 loops each)

# Looping nested Python lists is 100x faster
In [15]: %timeit -n 3 -r 3 loop(s_list)
1.02 ms ± 32.2 µs per loop (mean ± std. dev. of 3 runs, 3 loops each)
# When including the to_list(), still 20x faster
In [20]: %timeit -n 3 -r 3 loop(s.to_list())
5.45 ms ± 239 µs per loop (mean ± std. dev. of 3 runs, 3 loops each)

# Looping nested NumPy arrays is 20x faster
In [17]: %timeit -n 3 -r 3 loop(s_np)
7.62 ms ± 50.4 µs per loop (mean ± std. dev. of 3 runs, 3 loops each)
# When including to_numpy(), still 10x faster
In [21]: %timeit -n 3 -r 3 loop(s.to_numpy())
10 ms ± 120 µs per loop (mean ± std. dev. of 3 runs, 3 loops each)

Profile:

image

alexander-beedie commented 1 year ago

I can see a few ways to make this faster... will take a look.