Closed by lhegstrom 1 year ago
Thanks for the suggestion @lhegstrom! I'm a big fan of Polars. Would you like to try contributing this? Otherwise I'll be happy to take care of it myself.
@astrojuanlu I'd be happy to take a stab at implementing this. I probably won't be able to do much before the weekend, but will follow up.
Sorry @lhegstrom, didn't realize there's https://github.com/kedro-org/kedro-plugins/pull/95 already!
Hi guys! #95 is already ready to merge (it still lacks 1 review), but it only covers polars.CSVDataset; the other formats are still open for contribution! About the pandas/NumPy interop, there's no need to worry: the Polars API already has "to_pandas" and "to_numpy" methods. @astrojuanlu @lhegstrom
Current effort is in gh-170
Description
Polars is a dataframe library written in Rust with a Python API that is gaining a lot of traction. It has ~13k stars on GitHub, and numerous blog posts about its performance are appearing.
Context
It certainly feels like it's only a matter of time before this library becomes one of the "standard" data analysis tools, as Dask, Spark, and others have. Having an out-of-the-box implementation in Kedro would allow organizations that are beginning to adopt this library to make a clean switch without having to implement a custom AbstractDataSet.
Possible Implementation
Reading and writing CSV, Parquet, and other formats in Polars seems relatively straightforward. One concern is transcoding between libraries (pandas to Polars, Polars to pandas, Polars to Dask, etc.), which may require a little more thought, as Polars does not use an index.