splitgraph / seafowl

Analytical database for data-driven Web applications 🪶
https://seafowl.io
Apache License 2.0

Investigate support for Delta Tables as backing storage #175

Closed rupurt closed 1 year ago

rupurt commented 1 year ago

Howdy,

Are there any plans to support Delta Tables? This could work really well with GraphQL subscriptions.

mildbyte commented 1 year ago

Hey! Do you mean supporting Databricks' Delta Tables / Delta Lake (https://github.com/delta-io/delta/blob/master/PROTOCOL.md) as a storage backend / data source for CREATE EXTERNAL TABLE?

Design-wise, a GraphQL frontend is a sweet idea, though I'm not sure how to make it work well for analytical/aggregation queries (e.g. being able to represent a group by or window on arbitrary columns as a set of supported GraphQL fields). Same with subscriptions -- how would you quickly update a result for AVG(volume) GROUP BY country_id? IIRC ClickHouse/Materialize did some heavy research in that direction -- would indeed be cool to have it also available to Web devs via GQL :)
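To make the AVG(volume) GROUP BY country_id question concrete, here is a minimal sketch of one way such a result could be kept up to date: store a running sum and count per group and adjust them on every insert or delete. The class name `IncrementalAvg` and its methods are hypothetical, chosen for illustration; this is not how Seafowl, ClickHouse, or Materialize implement it.

```python
class IncrementalAvg:
    """Hypothetical helper: maintains AVG(value) per group key
    from a running (sum, count) pair instead of rescanning rows."""

    def __init__(self):
        # key -> [running_sum, row_count]
        self._state = {}

    def insert(self, key, value):
        """Apply an inserted row and return the group's new average."""
        s = self._state.setdefault(key, [0.0, 0])
        s[0] += value
        s[1] += 1
        return self.avg(key)

    def delete(self, key, value):
        """Apply a deleted row and return the group's new average
        (None if the group is now empty)."""
        s = self._state.get(key)
        if s is None:
            raise KeyError(key)
        s[0] -= value
        s[1] -= 1
        if s[1] == 0:
            # Drop empty groups so they disappear from the result.
            del self._state[key]
        return self.avg(key)

    def avg(self, key):
        """Current average for a group, or None if it has no rows."""
        s = self._state.get(key)
        return s[0] / s[1] if s else None


agg = IncrementalAvg()
agg.insert("US", 10.0)
agg.insert("US", 20.0)
print(agg.avg("US"))
```

This only works because AVG decomposes into a sum and a count. Aggregates such as a median, or windows over arbitrary columns, do not reduce to a small per-group state this easily, which is where the difficulty raised above comes from.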

rupurt commented 1 year ago

Yes, exactly: as a storage backend / data source for CREATE EXTERNAL TABLE.

Design-wise, I'm not exactly sure how to implement the subscription :) But I feel like there is so much work going into this problem (e.g. ClickHouse/Materialize/Delta Tables) that a solution is right on the cusp of being implemented. FWIW, there is now a Delta Table implementation in Rust, and it can do streaming updates: https://github.com/delta-io/delta-rs/tree/main/rust

gruuya commented 1 year ago

@rupurt thanks for the very cool ideas! :)

As for using Delta tables for our storage backend/layer (i.e. replacing our DIY lakehouse protocol with the Delta one using delta-rs), this is something that we'll likely converge towards at some point later on.

For now though, with the latest Seafowl version (0.2.10), you should be able to instantiate Delta tables stored in various cloud object stores as external tables (they will be placed in the staging schema) and query them.

rupurt commented 1 year ago

Amazing. Thank you @gruuya

gruuya commented 1 year ago

> As for using Delta tables for our storage backend/layer (i.e. replacing our DIY lakehouse protocol with the Delta one using delta-rs), this is something that we'll likely converge towards at some point later on.

I'm happy to say that we've completed this migration, so this issue can be closed now. Thanks for a great idea, @rupurt!