**Closed** — stinodego closed this pull request 1 month ago.
Attention: Patch coverage is 98.24561% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.75%. Comparing base (6d48c11) to head (319d255). Report is 8 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| py-polars/src/to_numpy.rs | 98.14% | 1 Missing :warning: |
Nice. I like this approach. :)
I added some benchmark tests so that we may catch regressions here in the future.
Not entirely sure how CodSpeed works, but are these tests run by CodSpeed?
> Not entirely sure how CodSpeed works, but are these tests run by CodSpeed?
Yes! You can see it here, 3 new benchmarks: https://codspeed.io/pola-rs/polars/branches/to-np-copy-chunk
Also, I spotted a problem with this implementation for nested data, going to have another look before putting it in review again.
How does one register them then? A specific folder?
> How does one register them then? A specific folder?
You just have to add `@pytest.mark.benchmark`. That's all. I made a dedicated benchmark folder because our tests require some data generation utilities and it's nice to have those in one place, but the tests don't necessarily have to live there.
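A minimal sketch of what registering a benchmark looks like (the test name and workload below are illustrative, not taken from the polars test suite):

```python
import pytest

# Hypothetical benchmark test: CodSpeed picks up any test carrying this marker,
# regardless of which folder it lives in.
@pytest.mark.benchmark
def test_sum_benchmark():
    # illustrative workload to be measured
    total = sum(range(1_000_000))
    assert total == 499_999_500_000
```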
I had to add an up-front check whether the Series has the right dtype / nested nulls, otherwise we could rechunk unnecessarily.
Should be good to go now, waiting for CI to turn green 🤞
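The up-front check described above could be sketched roughly as follows (hypothetical Python with illustrative helper names; the actual implementation is Rust code in py-polars/src/to_numpy.rs):

```python
# Hypothetical sketch: only rechunk when the contiguous buffer can actually
# become a writable NumPy array; otherwise skip the rechunk and fall back.
def should_rechunk(dtype_is_numpy_compatible: bool, has_nested_nulls: bool) -> bool:
    # Rechunking has a cost, so pay it only when the fast path applies.
    return dtype_is_numpy_compatible and not has_nested_nulls
```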
I'm seeing this raise in 0.20.27; it did not raise for me in 0.20.26. Is this expected?
```python
import polars as pl

pl.concat(
    [
        pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
        pl.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}),
    ]
).to_numpy()
# PanicException: source slice length (3) does not match destination slice length (6)
```
I haven't confirmed it to be from this PR, but looked likely.
Apologies - jumped the gun here. #16288 looks more likely.
Ref #16267
Instead of iterating over the values, we rechunk and create a writable view.
This regressed in https://github.com/pola-rs/polars/pull/16178 - now we get the best of both worlds: a writable array produced by only a single, fast copy.
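As a plain-NumPy illustration of the idea (not the actual polars internals): consolidating chunked buffers into one contiguous array with bulk slice assignments yields a writable result in a single pass, instead of iterating value by value.

```python
import numpy as np

# Two chunks standing in for a chunked Series' buffers.
chunks = [np.arange(3, dtype=np.float32), np.arange(3, 6, dtype=np.float32)]

# Allocate the destination once, then bulk-copy each chunk (memcpy-like).
out = np.empty(sum(len(c) for c in chunks), dtype=np.float32)
offset = 0
for chunk in chunks:
    out[offset:offset + len(chunk)] = chunk
    offset += len(chunk)

# The consolidated array is writable, unlike a read-only view of borrowed memory.
assert out.flags.writeable
```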
Performance of converting a chunked Series of 50 million float32s:
I added some benchmark tests so that we may catch regressions here in the future.