Open nixent opened 8 months ago
Thanks for the suggestion, I'm not too familiar with Delta Lake yet. Looks interesting; I'll add it for consideration.
Links:
If you need to read or write Delta, there are only a few projects that do that -> https://delta.io/integrations. Your only real options are Spark or Java.
Also delta-rs for Rust and Python: https://github.com/delta-io/delta-rs
My biggest issue with Delta Lake is that the integrations typically only support Unity Catalog and provide no instructions for storage on S3, compared to Iceberg or Hudi.
Hi, not sure what you mean by Unity Catalog, but Delta Lake is just an extension over Parquet. As @danielgafni mentioned, there is https://github.com/delta-io/delta-rs, which doesn't require you to have Spark.
@alberttwong that's a Databricks thing and totally unrelated to Delta Lake.
@ion-elgreco the problem is that the top 30 committers to Delta Lake are Databricks employees. https://tableformats.sundeck.io/. For all practical purposes, it's a single-vendor OSS project with few commits (accepted or otherwise) from anyone else.
@XBeg9 Delta Lake files alone aren't enough to be used by a SQL query engine. Both StarRocks and Trino need Delta Lake tables to be registered in a metadata catalog like HMS. Unfortunately, most Delta Lake integrations only support Unity Catalog. It doesn't help that metadata catalogs are the new project/vendor lock-in.
@XBeg9 by the way, per https://github.com/delta-io/kafka-delta-ingest/issues/166, kafka-delta-ingest doesn't support creating new Delta Lake tables.
Spark-delta is. Delta-rs isn't
It's more about the Delta Lake core project itself. Maybe the Delta Lake integrations have more contributor diversity, like you mentioned.
Thank you for sharing your thoughts. I appreciate your insights, but I'd like to clarify that our main focus here is on Slingdata's capability to read delta tables independently of Spark. I'm particularly interested in understanding this aspect without involving HMS, Unity Catalog, or Databricks integrations at the moment. Could we perhaps steer our discussion back to that specific topic?
Are there any plans to add support for Delta Lake tables?