timeplus-io / proton

A streaming SQL engine, a fast and lightweight alternative to ksqlDB and Apache Flink, 🚀 powered by ClickHouse.
https://timeplus.com
Apache License 2.0
1.37k stars • 50 forks

Support for Apache Arrow and ADBC (Arrow Database Connectivity) #276

Open • zeroshade opened 7 months ago

zeroshade commented 7 months ago

From what I can tell, since Proton is based on / extends ClickHouse, it would make a lot of sense to support retrieving data in the Apache Arrow format, which ClickHouse already supports for performance reasons. This could also extend to adding support for Arrow Flight SQL (which would provide ADBC support for free).

Alongside generalized Arrow support and the recently announced ODBC and JDBC drivers, it would be extremely beneficial for performance to also add an ADBC driver. To be fair, as I mentioned above, if Arrow Flight SQL support is provided, then the existing ADBC driver for Arrow Flight SQL could just be used directly (and this would also allow using the existing Arrow Flight SQL JDBC/ODBC drivers).
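For illustration, here is a minimal sketch of what "just using the existing ADBC driver for Arrow Flight SQL" could look like from Python. It assumes a Flight SQL endpoint exists, which Proton does not provide as of this thread; the function name `fetch_arrow_table` and the URI in the usage comment are hypothetical. The `adbc_driver_flightsql` package and its DBAPI-style `connect`, `cursor`, `execute`, and `fetch_arrow_table` calls come from the Apache Arrow ADBC Python bindings.

```python
def fetch_arrow_table(uri: str, query: str):
    """Run `query` through the ADBC Flight SQL driver and return a pyarrow.Table.

    `uri` is a Flight SQL endpoint such as "grpc://host:port". Whether Proton
    would ever expose one is an assumption, not something this thread confirms.
    """
    # Imported lazily so this module loads even without the driver installed.
    # Requires: pip install adbc-driver-flightsql pyarrow
    import adbc_driver_flightsql.dbapi as flight_sql

    with flight_sql.connect(uri) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            # Results arrive as Arrow record batches; no row conversion happens.
            return cur.fetch_arrow_table()


# Hypothetical usage against a local endpoint:
#   table = fetch_arrow_table("grpc://localhost:8815", "SELECT 1")
#   print(table.schema)
```

The same client code would work against any Flight SQL server, which is the sense in which ADBC support would come "for free".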

The general reason for supporting ADBC is that it avoids transposing column-oriented data into rows and back again just to pass it across ODBC/JDBC interfaces (which are inherently row-oriented).
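To make the transposition point concrete, here is a small stdlib-only sketch. Plain Python lists stand in for Arrow column buffers, and the helper names (`columns_to_rows`, `rows_to_columns`) and sample data are made up for illustration; real drivers do this work in native code, but the shape of the extra work is the same.

```python
from typing import Any

# Illustrative stand-ins: a "column batch" is a dict of column name -> values.
Columns = dict[str, list[Any]]
Row = tuple[Any, ...]


def columns_to_rows(columns: Columns) -> list[Row]:
    """Pivot columnar data into row tuples, as a row-oriented API requires."""
    return list(zip(*columns.values()))


def rows_to_columns(names: list[str], rows: list[Row]) -> Columns:
    """Pivot row tuples back into columns, as a columnar client must on receipt."""
    if not rows:
        return {name: [] for name in names}
    return {name: list(col) for name, col in zip(names, zip(*rows))}


# Server-side result, already column-oriented.
server_result: Columns = {
    "device": ["a", "b", "c"],
    "temperature": [21.5, 19.0, 23.25],
}

# Row-oriented path (ODBC/JDBC style): transpose out, then transpose back.
rows = columns_to_rows(server_result)
client_via_rows = rows_to_columns(list(server_result), rows)

# Column-oriented path (ADBC style): the batch is handed over as-is.
client_via_columns = server_result
```

Both paths end with the same columns on the client, but the row-oriented path touches every value twice, while the column-oriented path passes the buffers through untouched.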

This could be provided either by contributing a driver to the https://github.com/apache/arrow-adbc repository or by simply maintaining a driver here (if it is decided not to go the route of using Arrow Flight SQL). I can absolutely assist with any work involved here, though I don't have the bandwidth to do it myself. I just thought it would be a useful suggestion to make here.

chenziliang commented 7 months ago

Thanks for the input. May I ask if you are using ADBC in your current project or production environment? I am trying to understand how mature the ADBC framework is and what the ecosystem looks like.

zeroshade commented 7 months ago

@chenziliang I'm one of the developers of ADBC :smile: and wrote the Snowflake ADBC driver https://medium.com/snowflake/arrow-database-connectivity-adbc-support-for-snowflake-7bfb3a2d9074

To my knowledge, dbt is adding support for ADBC, and there are a few Microsoft employees who frequently contribute to the C# ADBC libraries. Currently we have drivers for SQLite, PostgreSQL, Arrow Flight SQL (which also covers Dremio), and Snowflake, and DuckDB has added an implementation of the ADBC interface to its shared object. I'll soon be working on a BigQuery driver myself.

I'd be happy to answer any questions you have regarding ADBC and Arrow in general. The ecosystem is continuing to grow rapidly (consider that the underlying memory representation for Polars is Arrow, along with pandas), and we're picking up support for more systems as we go.

jovezhong commented 5 months ago

/bounty $200

algora-pbc[bot] commented 5 months ago

💎 $200 bounty • Timeplus

Steps to solve:

  1. Start working: Comment /attempt #276 with your implementation plan
  2. Submit work: Create a pull request including /claim #276 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to timeplus-io/proton!
