kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0

`kedro-datasets`: Cleanup of dependency (groups are incomplete, old dependencies should be deleted) #747

Open astrojuanlu opened 3 weeks ago

astrojuanlu commented 3 weeks ago

Description

Looks like the spark, polars and maybe other dependency groups are incomplete. For example, kedro-datasets[spark]:

https://github.com/kedro-org/kedro-plugins/blob/92bf6ebaa249c9e144fc6fee18a83b76973c13f2/kedro-datasets/pyproject.toml#L143

I guess it should be

spark = ["kedro-datasets[spark-deltatabledataset,spark-sparkdataset,spark-sparkhivedataset,spark-sparkjdbcdataset]"]

Similar thing happens with [polars]. Haven't checked them all.
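
A minimal sketch of how this completeness check could be automated, assuming the optional-dependencies table is available as a plain dict. The entries below are copied from the spark and polars extras quoted in this thread; the helper names `referenced_extras` and `missing_from_group` are hypothetical and not part of kedro-datasets.

```python
import re

# Matches a self-reference such as "kedro-datasets[a,b]" (spaces removed first).
SELF_REF = re.compile(r"^kedro-datasets\[([^\]]+)\]$")

# Excerpt of the extras, hard-coded here for illustration only.
EXTRAS = {
    "polars-csvdataset": ["kedro-datasets[polars-base]"],
    "polars-eagerpolarsdataset": ["kedro-datasets[polars-base]", "pyarrow>=4.0", "xlsx2csv>=0.8.0", "deltalake >= 0.6.2"],
    "polars-genericdataset": ["kedro-datasets[polars-base]", "pyarrow>=4.0", "xlsx2csv>=0.8.0", "deltalake >= 0.6.2"],
    "polars-lazypolarsdataset": ["kedro-datasets[polars-base]", "pyarrow>=4.0", "deltalake >= 0.6.2"],
    "polars": ["kedro-datasets[polars-genericdataset]"],
    "spark-deltatabledataset": ["kedro-datasets[spark-base,hdfs-base,s3fs-base]", "delta-spark>=1.0, <3.0"],
    "spark-sparkdataset": ["kedro-datasets[spark-base,hdfs-base,s3fs-base]"],
    "spark-sparkhivedataset": ["kedro-datasets[spark-base,hdfs-base,s3fs-base]"],
    "spark-sparkjdbcdataset": ["kedro-datasets[spark-base,hdfs-base,s3fs-base]"],
    "spark": ["kedro-datasets[spark-deltatabledataset]"],
}


def referenced_extras(requirements):
    """Return the extras named in self-references like 'kedro-datasets[a,b]'."""
    names = set()
    for req in requirements:
        match = SELF_REF.match(req.replace(" ", ""))
        if match:
            names.update(match.group(1).split(","))
    return names


def missing_from_group(extras, group):
    """List per-dataset extras of `group` that the group extra does not reference."""
    per_dataset = {
        name
        for name in extras
        if name.startswith(group + "-") and not name.endswith("-base")
    }
    return sorted(per_dataset - referenced_extras(extras.get(group, [])))


for group in ("spark", "polars"):
    print(group, "is missing:", missing_from_group(EXTRAS, group))
```

With the excerpt above, this reports three unreferenced dataset extras for each of the two groups. It only compares names, so it would not notice a stale entry such as polars-genericdataset; that still needs a check against the datasets that actually exist.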

Context

It's been a while since we moved from setup.py to pyproject.toml, and it looks like some of the dependencies are out of date as well, e.g. polars-genericdataset doesn't exist anymore.

On top of checking for completeness, the requirements should be updated and cleaned:

ankatiyar commented 2 weeks ago
polars-csvdataset = ["kedro-datasets[polars-base]"]
polars-eagerpolarsdataset = ["kedro-datasets[polars-base]", "pyarrow>=4.0", "xlsx2csv>=0.8.0", "deltalake >= 0.6.2"]
polars-genericdataset = ["kedro-datasets[polars-base]", "pyarrow>=4.0", "xlsx2csv>=0.8.0", "deltalake >= 0.6.2"]
polars-lazypolarsdataset = ["kedro-datasets[polars-base]", "pyarrow>=4.0", "deltalake >= 0.6.2"]
polars = ["kedro-datasets[polars-genericdataset]"]
spark-deltatabledataset = ["kedro-datasets[spark-base,hdfs-base,s3fs-base]", "delta-spark>=1.0, <3.0"]
spark-sparkdataset = ["kedro-datasets[spark-base,hdfs-base,s3fs-base]"]
spark-sparkhivedataset = ["kedro-datasets[spark-base,hdfs-base,s3fs-base]"]
spark-sparkjdbcdataset = ["kedro-datasets[spark-base,hdfs-base,s3fs-base]"]
spark = ["kedro-datasets[spark-deltatabledataset]"]

I think I did it this way during the migration because, for spark, spark-deltatabledataset covered the requirements of every dataset in that group, and similarly for polars. But I suppose this would become a problem if the individual requirements of the datasets changed at any point.
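
To illustrate that fragility, here is a rough sketch that flattens the nested extras and compares what the group installs against what a single dataset needs. It reuses only the spark entries quoted above; the base extras are not shown in this thread, so they are left as unresolved markers. The helpers `flatten` and `uncovered`, and the package name `example-hive-only-dep`, are made up for the illustration.

```python
import re

SELF_REF = re.compile(r"^kedro-datasets\[([^\]]+)\]$")

# Spark excerpt quoted in this thread, hard-coded for illustration only.
EXTRAS = {
    "spark-deltatabledataset": ["kedro-datasets[spark-base,hdfs-base,s3fs-base]", "delta-spark>=1.0, <3.0"],
    "spark-sparkdataset": ["kedro-datasets[spark-base,hdfs-base,s3fs-base]"],
    "spark-sparkhivedataset": ["kedro-datasets[spark-base,hdfs-base,s3fs-base]"],
    "spark-sparkjdbcdataset": ["kedro-datasets[spark-base,hdfs-base,s3fs-base]"],
    "spark": ["kedro-datasets[spark-deltatabledataset]"],
}


def flatten(extras, name, _seen=None):
    """Resolve self-referencing extras into a flat set of requirement strings."""
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against circular references
        return set()
    seen.add(name)
    resolved = set()
    for req in extras[name]:
        match = SELF_REF.match(req.replace(" ", ""))
        if match is None:
            resolved.add(req)  # ordinary third-party requirement
            continue
        for ref in match.group(1).split(","):
            if ref in extras:
                resolved |= flatten(extras, ref, seen)
            else:
                # Extra not defined in this excerpt: keep it as an opaque marker.
                resolved.add(f"kedro-datasets[{ref}]")
    return resolved


def uncovered(extras, group, member):
    """Requirements of `member` that installing `group` would not pull in."""
    return flatten(extras, member) - flatten(extras, group)


# Today the shortcut works: the group covers every dataset extra.
print(uncovered(EXTRAS, "spark", "spark-sparkhivedataset"))

# Hypothetical future change: one dataset gains its own requirement.
changed = {name: list(reqs) for name, reqs in EXTRAS.items()}
changed["spark-sparkhivedataset"].append("example-hive-only-dep")
print(uncovered(changed, "spark", "spark-sparkhivedataset"))
```

The first call returns an empty set, while the second returns the made-up requirement, which is the situation where kedro-datasets[spark] would silently stop being sufficient.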

astrojuanlu commented 2 weeks ago

This might be preventing out-of-the-box installation of the databricks-iris starter.