kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0

Polars SQL datasets #853

Open AntonNikishin opened 3 weeks ago

AntonNikishin commented 3 weeks ago

Description

It would be great to have Polars implementations of SQLQueryDataset and SQLTableDataset, similar to the Pandas versions: pandas.SQLTableDataset and pandas.SQLQueryDataset.

Context

Users sometimes want to read Polars DataFrames directly from SQL databases and write them back.

Possible Implementation

The datasets would have an implementation similar to the pandas versions, but would use Polars' built-in functions read_database and write_database.

P.S. I'm happy to work on that ☺️

noklam commented 3 weeks ago

Would the Ibis dataset already support Polars as a backend?

deepyaman commented 2 weeks ago

Would the Ibis dataset already support Polars as a backend?

It does, but:

  1. I'm guessing read_database would need to be implemented in Ibis.
  2. If a user just wants to use Polars syntax in their nodes, I guess it's a fair ask.

It's a separate question whether Polars is the best way to manipulate data in a database (a definite downside is pulling it into memory for manipulation, rather than pushing down compute), but a user may still want to do it.

deepyaman commented 2 weeks ago

I would recommend creating polars.DatabaseDataset instead of mirroring the pandas datasets, because:

  1. Polars provides symmetrical read and write methods.
  2. SQL is less explicit, because Polars SQL is also a thing.
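
For illustration, here is a rough sketch of what a single dataset built on the symmetrical read_database / write_database pair might look like. It is only a sketch under assumptions: the class name PolarsDatabaseDataset, its constructor parameters, and the describe / load / save method names are hypothetical and do not exist in kedro-datasets. A real implementation would presumably subclass Kedro's AbstractDataset and take credentials from the catalog. The sketch also assumes SQLAlchemy is installed for the connection (writing from a URI with Polars' default SQLAlchemy engine additionally goes through pandas).

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only needed for type hints
    import polars as pl


class PolarsDatabaseDataset:
    """Hypothetical sketch; this class is not part of kedro-datasets."""

    def __init__(
        self,
        table_name: str,
        uri: str,
        load_args: dict[str, Any] | None = None,
        save_args: dict[str, Any] | None = None,
    ) -> None:
        self._table_name = table_name
        self._uri = uri  # e.g. "sqlite:///example.db" (assumed SQLAlchemy-style URI)
        self._load_args = load_args or {}
        self._save_args = save_args or {}

    def describe(self) -> dict[str, Any]:
        # The URI may embed credentials, so it is left out of the description.
        return {
            "table_name": self._table_name,
            "load_args": self._load_args,
            "save_args": self._save_args,
        }

    def _select_query(self) -> str:
        # Sketch only: the table name is interpolated without any sanitization.
        return f"SELECT * FROM {self._table_name}"

    def load(self) -> pl.DataFrame:
        # Imported lazily so the class can be defined without these packages.
        import polars as pl
        from sqlalchemy import create_engine

        engine = create_engine(self._uri)
        with engine.connect() as connection:
            return pl.read_database(
                self._select_query(), connection, **self._load_args
            )

    def save(self, data: pl.DataFrame) -> None:
        # Table name and connection URI are passed positionally; any other
        # options are forwarded unchanged from save_args.
        data.write_database(self._table_name, self._uri, **self._save_args)
```

Because loading and saving go through one table name and one URI, a single class covers both directions. Supporting arbitrary queries (the SQLQueryDataset case) would need an extra, read-only option, which is not shown here.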