kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0

Adding Polars' dataframe as a dataset #110

Closed lhegstrom closed 1 year ago

lhegstrom commented 1 year ago

Description

Polars is a dataframe library written in Rust with a Python API that is gaining a lot of traction. It has ~13k stars on GitHub, and numerous blog posts about its performance are appearing.

Context

It certainly feels like it's only a matter of time before this library becomes one of the "standard" data analysis tools, as Dask, Spark, and others have. Having an out-of-the-box implementation in Kedro would allow organizations that are beginning to adopt this library to make a clean switch without having to implement a custom AbstractDataSet.

Possible Implementation

Reading and writing CSV, Parquet, and other formats in Polars seems relatively straightforward. One concern is that transcoding between libraries (pandas to Polars, Polars to pandas, Polars to Dask, etc.) may require a little more thought, as Polars does not use an index.

astrojuanlu commented 1 year ago

Thanks for the suggestion @lhegstrom! I'm a big fan of Polars. Would you like to try contributing this? Otherwise I'll be happy to take care of it myself.

lhegstrom commented 1 year ago

@astrojuanlu I'd be happy to take a stab at implementing this. I probably won't be able to do much before the weekend, but will follow up.

astrojuanlu commented 1 year ago

Sorry @lhegstrom, didn't realize there's https://github.com/kedro-org/kedro-plugins/pull/95 already!

wmoreiraa commented 1 year ago

Hi guys! #95 is already ready to merge (it still lacks 1 review), but it only covers polars.CSVDataset, so there are still other formats to contribute! As for pandas/NumPy interop, there's no need to worry: the Polars API already has "to_pandas" and "to_numpy" methods. @astrojuanlu @lhegstrom

astrojuanlu commented 1 year ago

xref https://github.com/kedro-org/kedro-plugins/pull/116

astrojuanlu commented 1 year ago

Current effort is in gh-170