slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it into a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0

Delta lake support #26

Open nixent opened 8 months ago

nixent commented 8 months ago

Are there any plans to add support for Delta Lake tables?

flarco commented 8 months ago

Thanks for the suggestion, I'm not too familiar with Delta lake yet. Looks interesting, will add for consideration.

alberttwong commented 4 months ago

If you need to read or write Delta tables, only a few projects do that: https://delta.io/integrations. Your only real options are Spark or Java.

danielgafni commented 2 months ago

Also delta-rs for Rust and Python:

https://github.com/delta-io/delta-rs

alberttwong commented 2 months ago

My biggest issue with Delta Lake is that its integrations typically only support Unity Catalog and give no instructions for storage on S3, unlike Iceberg or Hudi.

XBeg9 commented 2 months ago

> My biggest issue with Delta Lake is that its integrations typically only support Unity Catalog and give no instructions for storage on S3, unlike Iceberg or Hudi.

Hi, not sure what you mean by unity catalog, but delta lake is just an extension over parquet. As @danielgafni mentioned, there is https://github.com/delta-io/delta-rs which doesn't require you to have spark.
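To illustrate the "extension over Parquet" point: a Delta table is a directory of Parquet data files plus a `_delta_log` folder of newline-delimited JSON commits, each listing `add` and `remove` actions. The following is only a minimal sketch of replaying that log with the Python standard library. The function name `live_parquet_files` is made up for illustration, and the sketch assumes a small table with no checkpoints, deletion vectors, or other advanced protocol features. It is not how delta-rs or Sling does it.

```python
import json
from pathlib import Path


def live_parquet_files(table_dir):
    """Replay the JSON commits in _delta_log and return the data files
    that are still part of the table (hypothetical helper, sketch only)."""
    log_dir = Path(table_dir) / "_delta_log"
    live = set()
    # Commit files are named with zero-padded version numbers, so sorting
    # by name replays them in version order. Skip anything that is not a
    # plain numbered commit (for example checkpoint-related files).
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                # Paths are relative to the table root in the common case.
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # Other actions (protocol, metaData, commitInfo) are ignored here.
    return sorted(live)
```

Calling `live_parquet_files("/path/to/table")` would return the relative paths of the Parquet files a reader should scan. For real tables, a library such as delta-rs is the safer route, since it also handles checkpoints, schema metadata, and protocol versions.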

ion-elgreco commented 2 months ago

@alberttwong Unity Catalog is a Databricks thing and totally unrelated to Delta Lake itself.

alberttwong commented 2 months ago

@ion-elgreco the problem is that the top 30 committers to Delta Lake are Databricks employees: https://tableformats.sundeck.io/. For all practical purposes, it's a single-vendor OSS project with few commits (accepted or otherwise) from anyone else.

alberttwong commented 2 months ago

@XBeg9 Delta Lake files alone aren't enough to be used by a SQL query engine. Both StarRocks and Trino need Delta Lake tables to be registered in a metadata catalog like HMS. Unfortunately, most Delta Lake integrations only support Unity Catalog. It doesn't help that metadata catalogs are the new project/vendor lock-in.

alberttwong commented 2 months ago

@XBeg9 by the way, kafka-delta-ingest doesn't support creating new Delta Lake tables (see https://github.com/delta-io/kafka-delta-ingest/issues/166).

ion-elgreco commented 2 months ago

> @ion-elgreco the problem is that the top 30 committers to Delta Lake are Databricks employees: https://tableformats.sundeck.io/. For all practical purposes, it's a single-vendor OSS project with few commits (accepted or otherwise) from anyone else.

Spark-delta is. Delta-rs isn't.

alberttwong commented 2 months ago

> Spark-delta is. Delta-rs isn't.

I meant the Delta Lake core project itself. Maybe the Delta Lake integrations have more diversity, like you mentioned.

XBeg9 commented 2 months ago

> > Spark-delta is. Delta-rs isn't.
>
> I meant the Delta Lake core project itself. Maybe the Delta Lake integrations have more diversity, like you mentioned.

Thank you for sharing your thoughts. I appreciate your insights, but I'd like to clarify that our main focus here is on Slingdata's capability to read delta tables independently of Spark. I'm particularly interested in understanding this aspect without involving HMS, Unity Catalog, or Databricks integrations at the moment. Could we perhaps steer our discussion back to that specific topic?

nixent commented 1 month ago

@flarco FYI, another implementation: delta-go