narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!
https://narwhals-dev.github.io/narwhals/
MIT License
607 stars 90 forks source link

chroe: refactor `ArrowDataFrame.with_columns` #1345

Closed FBruzzesi closed 1 week ago

FBruzzesi commented 1 week ago

What type of PR is this? (check all applicable)

Checklist

If you have comments or can explain your changes, please do so below.

While profiling for plotly, py-spy indicates that we spend a lot of time in validate_dataframe_comparand for pyarrow case. This is called only in with_columns methods.

This PR proposed two changes:

MarcoGorelli commented 1 week ago

nice - there may be some issue on py38, but the rest looks good!

should we wait until altair / marimo's cis are fixed to merge?

FBruzzesi commented 1 week ago

Fixed old versions, what's the deal with TPCH taking so long now πŸ™ˆ?

MarcoGorelli commented 1 week ago

πŸ€” looks like yesterday it went from 2 minutes to 15 minutes

MarcoGorelli commented 1 week ago

ok it was dask, i've removed them from the tpch ci and opened an issue on their repo

FBruzzesi commented 1 week ago

did you test this against the plotly branch locally?

Yes, not big of a change as I would expect though - e.g. I would expect that np.full(1_000_000, 1) to be visibly faster than [1] * 1_000_000.

Edit: Is there a way to run it in isolation with your kaggle notebook? I can clone that

MarcoGorelli commented 1 week ago

yeah where there's pip install git+https://github.com/narwhals-dev/narwhals change that to for example pip install git+https://github.com/narwhals-dev/narwhals@perf/pyarrow-with-columns

FBruzzesi commented 1 week ago

@MarcoGorelli for 1M rows, 50 columns I cannot see any changes in performance for:

both when working with chunked array and scalars. The two approaches seem to be equivalent. I leave it up to you if this syntax is any better than the previous one

MarcoGorelli commented 1 week ago

thanks for checking! i think I prefer this one, if you agree let's ship it