everdark closed this issue 1 year ago.
This is a reasonable request, and a PR is very welcome! We have a ticket here for reading Delta via pandas, but it isn't a high priority for now. We would prefer that it be contributed by the community.
Nice! Will work on this with @afaqueahmad7117.
Awesome! For reference, I've put the Slack thread here; please shout if you get stuck.
Description
In kedro-datasets 1.3.0, a new dataset dedicated to Delta tables was added, based on pyspark. Is there any plan to also support a non-Spark implementation for Delta Lake? If not, some other folks and I are happy to contribute. :)
Context
Not all use cases depend on Spark, but they may still need to interact with Delta Lake. It isn't reasonable to add Spark as a dependency simply to work with Delta tables.
Possible Implementation
Create a custom dataset based on the official Delta Lake Python bindings (delta-rs): https://github.com/delta-io/delta-rs/tree/main/python