splitgraph / seafowl

Analytical database for data-driven Web applications 🪶
https://seafowl.io
Apache License 2.0

Add Arrow Flight frontend #477

Closed gruuya closed 6 months ago

gruuya commented 7 months ago

Seafowl currently provides two interfaces: the HTTP frontend and the PG endpoint. Both work well for serving summarized (i.e. aggregated) output to end users, either directly or via a web app, but they fall short when it comes to transferring large result sets.

Putting aside the fact that the existing frontends don't support all Arrow types (e.g. see #393), returning large data sets through them will most likely be highly inefficient: for non-trivial row counts, converting the internal columnar representation into a row-based response (JSON, in the case of the HTTP frontend) adds significant overhead. The scenario in question involves an external DB system that uses Seafowl for analytical workloads but, in some cases, can't push down the entire query, so it must fetch the underlying data and perform the original query itself.
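To make the overhead concrete, here is a rough stdlib-only sketch of what pivoting a columnar batch into row-based JSON involves. It is not Seafowl code: the column names, the `to_row_json` and `columnar_nbytes` helpers, and the one-JSON-object-per-line output format are all assumptions made for illustration.

```python
import json
from array import array

# Hypothetical columnar batch: one contiguous typed buffer per column.
columns = {
    "id": array("q", range(1000)),                        # 64-bit signed ints
    "value": array("d", (i * 0.5 for i in range(1000))),  # 64-bit floats
}

def to_row_json(cols):
    """Pivot the columns into one JSON object per row, newline-delimited."""
    names = list(cols)
    rows = zip(*(cols[name] for name in names))
    # Every value is formatted as text and every row repeats the column names.
    return "\n".join(json.dumps(dict(zip(names, row))) for row in rows)

def columnar_nbytes(cols):
    """Size of the raw column buffers, with no per-row framing."""
    return sum(col.itemsize * len(col) for col in cols.values())

row_json = to_row_json(columns)
json_bytes = len(row_json.encode("utf-8"))
raw_bytes = columnar_nbytes(columns)
```

Even in this toy case the row-based output is larger than the raw buffers, and producing it touches every value individually; the receiving side then has to parse it all back before it can rebuild columns.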

Introducing an Arrow Flight frontend (Arrow Flight SQL in particular [1]) would solve this problem: it provides a protocol for sending Arrow data directly, and would thus avoid unnecessary serialization.
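The idea of shipping column buffers as-is can be sketched with a toy framing, again using only the standard library. This is not the Arrow IPC format or the Flight protocol; the `encode_column` and `decode_column` helpers and the 9-byte header layout are hypothetical, and the sketch assumes sender and receiver share the same byte order.

```python
import struct
from array import array

HEADER = "<cQ"                       # 1-byte typecode + 8-byte payload length
HEADER_SIZE = struct.calcsize(HEADER)

def encode_column(col):
    """Frame a column as a small header followed by its raw buffer."""
    payload = col.tobytes()          # copies the buffer, no per-value formatting
    header = struct.pack(HEADER, col.typecode.encode("ascii"), len(payload))
    return header + payload

def decode_column(buf):
    """Rebuild the column straight from the raw bytes, without parsing values."""
    typecode, nbytes = struct.unpack_from(HEADER, buf, 0)
    col = array(typecode.decode("ascii"))
    col.frombytes(buf[HEADER_SIZE:HEADER_SIZE + nbytes])
    return col

ids = array("q", range(1000))
frame = encode_column(ids)
restored = decode_column(frame)
```

Here the frame is just the column's bytes plus a fixed-size header, so the cost of encoding and decoding does not depend on formatting or parsing individual values.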

[1] https://voltrondata.com/resources/apache-arrow-flight-sql-arrow-for-every-database-developer