kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0

Support Delta Table with a non-spark implementation #226

Closed: everdark closed this issue 1 year ago

everdark commented 1 year ago

Description

kedro-datasets 1.3.0 added a new dataset dedicated to Delta tables, but it is based on PySpark. Is there any plan to also support a non-Spark implementation for Delta Lake? If not, some other folks and I are happy to contribute. :)

Context

Not all use cases depend on Spark, but they may still need to interact with Delta Lake. Adding Spark as a dependency just to work with Delta tables isn't a good fit for them.

Possible Implementation

Create a custom dataset that is based on the official Delta Lake Python-binding: https://github.com/delta-io/delta-rs/tree/main/python

noklam commented 1 year ago

This is a reasonable request, and a PR is very welcome! We have a ticket for reading Delta tables via pandas, but it isn't a high priority for now, so we would prefer that this be contributed by the community.

https://github.com/kedro-org/kedro-plugins/issues/159

everdark commented 1 year ago

Nice! Will work on this with @afaqueahmad7117.

noklam commented 1 year ago

Awesome! For reference, I've put the Slack thread here. Please shout if you get stuck.

merelcht commented 1 year ago

Completed in https://github.com/kedro-org/kedro-plugins/issues/226