koaning / scikit-lego

Extra blocks for scikit-learn pipelines.
https://koaning.github.io/scikit-lego/
MIT License

feat: bump narwhals and adapt to support pyarrow #694

Closed FBruzzesi closed 3 months ago

FBruzzesi commented 3 months ago

Description

Long story short: I knew that some functionality would not work for pyarrow even with narwhals, so we had to adjust accordingly.

For example, we cannot create a series using native_namespace.Series([...]) because pyarrow doesn't have a Series object, so a workaround was needed:

- series = native_namespace.Series([...])
+ series = nw.from_dict({"_tmp": [...]})["_tmp"]

Namely, create a narwhals dataframe for the given namespace and then select its only column.
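To illustrate why the direct constructor breaks and the dataframe route does not, here is a minimal pure-Python sketch. Everything in it is a hypothetical stand-in, not the real narwhals, pandas, or pyarrow API: `ToyTable`, `series_backend`, `table_only_backend`, and both helper functions are invented for illustration only.

```python
from types import SimpleNamespace


class ToyTable:
    """Hypothetical stand-in for a dataframe/table built from a dict of columns."""

    def __init__(self, columns):
        self._columns = {name: list(values) for name, values in columns.items()}

    def __getitem__(self, name):
        # Selecting a column returns that column's values.
        return self._columns[name]


# Stand-in for a backend that exposes a Series constructor (pandas/polars-like).
series_backend = SimpleNamespace(Series=list, from_dict=ToyTable)
# Stand-in for a backend with tables but no Series object (pyarrow-like).
table_only_backend = SimpleNamespace(from_dict=ToyTable)


def make_column_direct(namespace, values):
    # Old approach: assumes every namespace has a `Series` attribute.
    return namespace.Series(values)


def make_column_via_frame(namespace, values):
    # Workaround: build a one-column frame, then select that column.
    return namespace.from_dict({"_tmp": values})["_tmp"]


if __name__ == "__main__":
    print(make_column_via_frame(series_backend, [1, 2, 3]))
    print(make_column_via_frame(table_only_backend, [1, 2, 3]))
    try:
        make_column_direct(table_only_backend, [1, 2, 3])
    except AttributeError as exc:
        print(f"direct construction failed: {exc}")
```

The point of the sketch is only that the one-column-frame route relies on an operation every backend has (building a table from a dict), whereas the direct route relies on an attribute that some backends lack.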

A bunch of other changes were needed in the tests to also run them against pyarrow tables.

koaning commented 3 months ago

Oh, and one more thing, must we bump narwhals? I usually prefer not to force the user to use the latest and greatest dependency, but it seems like we need it to support more dataframe types?

FBruzzesi commented 3 months ago

> Oh, and one more thing, must we bump narwhals? I usually prefer not to force the user to use the latest and greatest dependency, but it seems like we need it to support more dataframe types?

Sadly yes. If we want pyarrow support for grouped meta and shift, the functionality they need only made it into the latest release (and narwhals remains dependency-free).

FBruzzesi commented 3 months ago

> LGTM! Might be good to prep another release?

Sounds good! We went a bit silent, but now have some breaking changes from #693