pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

perf(python): Improve `Series.to_numpy` performance for chunked Series that would otherwise be zero-copy #16301

Closed — stinodego closed this 1 month ago

stinodego commented 1 month ago

Ref #16267

Instead of iterating over the values, we rechunk and create a writable view.

This regressed in https://github.com/pola-rs/polars/pull/16178 - now we get the best of both worlds with a writable array and only a single, fast copy.

Performance of converting a chunked Series of 50 million float32s:

I added some benchmark tests so that we may catch regressions here in the future.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 98.24561% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.75%. Comparing base (6d48c11) to head (319d255). Report is 8 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| py-polars/src/to_numpy.rs | 98.14% | 1 Missing :warning: |

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main   #16301      +/-   ##
==========================================
- Coverage   80.78%   80.75%   -0.03%
==========================================
  Files        1393     1393
  Lines      179362   179455      +93
  Branches     2921     2922       +1
==========================================
+ Hits       144894   144919      +25
- Misses      33965    34033      +68
  Partials      503      503
```


ritchie46 commented 1 month ago

Nice. I like this approach. :)

I added some benchmark tests so that we may catch regressions here in the future.

Not entirely sure how CodSpeed works, but are these tests run by CodSpeed?

stinodego commented 1 month ago

Not entirely sure how CodSpeed works, but are these tests run by CodSpeed?

Yes! You can see it here, 3 new benchmarks: https://codspeed.io/pola-rs/polars/branches/to-np-copy-chunk

Also, I spotted a problem with this implementation for nested data, going to have another look before putting it in review again.

ritchie46 commented 1 month ago

How does one register them then? A specific folder?

stinodego commented 1 month ago

How does one register them then? A specific folder?

You just have to add `@pytest.mark.benchmark`. That's all. I made a dedicated benchmark folder because our tests require some data generation utilities and it's nice to have those in one place, but the tests don't necessarily have to live there.

stinodego commented 1 month ago

I had to add an up-front check whether the Series has the right dtype / nested nulls, otherwise we could rechunk unnecessarily.

Should be good to go now, waiting for CI to turn green 🤞

rhshadrach-8451 commented 3 weeks ago

I'm seeing this raise in 0.20.27; it did not raise for me in 0.20.26. Is this expected?

import polars as pl

pl.concat(
    [
        pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
        pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
    ]
).to_numpy()
# PanicException: source slice length (3) does not match destination slice length (6)

I haven't confirmed that it comes from this PR, but it looked likely.

rhshadrach-8451 commented 3 weeks ago

Apologies - jumped the gun here. #16288 looks more likely.