timeplus-io / proton

A stream processing engine and database, and a fast and lightweight alternative to ksqlDB and Apache Flink, 🚀 powered by ClickHouse
https://timeplus.com
Apache License 2.0
1.58k stars 69 forks

Support for Apache Arrow and ADBC (Arrow Database Connectivity) #276

Open zeroshade opened 1 year ago

zeroshade commented 1 year ago

From what I can tell, if Proton is based on / extends ClickHouse, it would make a lot of sense to add support for retrieving data in the Apache Arrow format as ClickHouse already supports using Arrow for performance reasons. This could also extend to adding support for Arrow Flight SQL (which would provide ADBC support for free).

Alongside generalized Arrow support, and the recent announcements of ODBC and JDBC drivers, it would be extremely beneficial and performant to also add an ADBC driver. To be fair, as I mentioned above, if Arrow Flight SQL support is provided, then the existing ADBC driver for Arrow Flight SQL could just be used directly (and would also allow using the existing Arrow Flight SQL JDBC/ODBC drivers).

The generalized reason for supporting ADBC is that it allows bypassing the transposition of column-oriented data to and from row-orientations just to pass across ODBC/JDBC interfaces (which are inherently row-oriented).
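
To illustrate the transposition described above, here is a minimal pure-Python sketch. The column names and both helper functions are hypothetical and are not part of ADBC, Proton, or any driver; plain lists stand in for Arrow buffers. It only shows the shape of the extra work a row-oriented interface forces on a columnar engine and a columnar client.

```python
from typing import Any

# Hypothetical columnar result set, shaped like a column-oriented engine's output.
columns: dict[str, list[Any]] = {
    "device": ["a", "b", "c"],
    "temperature": [21.5, 19.0, 23.25],
}


def columns_to_rows(cols: dict[str, list[Any]]) -> list[tuple[Any, ...]]:
    """Pivot columns into row tuples, as a row-oriented interface requires."""
    return list(zip(*cols.values()))


def rows_to_columns(names: list[str], rows: list[tuple[Any, ...]]) -> dict[str, list[Any]]:
    """Pivot rows back into columns, as a columnar client (e.g. a dataframe) must."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Row-oriented path: two full transpositions, each touching every value.
rows = columns_to_rows(columns)
round_tripped = rows_to_columns(list(columns), rows)

# Columnar path: the column buffers are handed over unchanged.
passed_through = columns
```

In the row-oriented path every value is copied twice to end up in the same layout it started in, which is the overhead a columnar interface is meant to avoid.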

This could be provided either by contributing a driver to the https://github.com/apache/arrow-adbc repository, or by simply maintaining a driver here (if it is decided not to go the route of using Arrow Flight SQL). I can absolutely assist with any work involved here, though I don't have the bandwidth to do it myself. I just thought it would be a useful suggestion to make here.

chenziliang commented 1 year ago

Thanks for the input. May I ask if you are using ADBC in your current project or production environment? I am trying to understand how mature the ADBC framework is and what the ecosystem looks like.

zeroshade commented 1 year ago

@chenziliang I'm one of the developers of ADBC :smile: and wrote the Snowflake ADBC driver https://medium.com/snowflake/arrow-database-connectivity-adbc-support-for-snowflake-7bfb3a2d9074

To my knowledge, dbt is adding support for ADBC, and there are a few Microsoft employees who are frequently contributing to the C# ADBC libraries. Currently we have drivers for SQLite, PostgreSQL, Arrow Flight SQL (which also covers Dremio), and Snowflake, and DuckDB added an implementation of the ADBC interface to its shared object. I'll soon be working on a BigQuery driver myself.

I'd be happy to answer any questions you have regarding ADBC and Arrow in general. The ecosystem is continuing to grow rapidly (consider that the underlying memory representation for Polars is Arrow, along with pandas), and we're picking up support for more systems as we go.

jovezhong commented 10 months ago

/bounty $200

algora-pbc[bot] commented 10 months ago

💎 $200 bounty • Timeplus

Steps to solve:

  1. Start working: Comment /attempt #276 with your implementation plan
  2. Submit work: Create a pull request including /claim #276 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to timeplus-io/proton!

| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @ranjanmangla1 | Jul 12, 2024, 3:02:30 PM | WIP |
| 🟢 @debaa98 | Aug 22, 2024, 3:06:36 AM | WIP |
| 🟢 @vishwamartur | Nov 18, 2024, 6:16:29 PM | #856 |

ranjanmangla1 commented 4 months ago

@jovezhong @zeroshade can you assign this one to me? I would love to work on it!

jovezhong commented 4 months ago

Hi @ranjanmangla1, you can use the attempt command (mentioned in https://github.com/timeplus-io/proton/issues/276#issuecomment-1906976828) to take the work. We look forward to your PR.

ranjanmangla1 commented 4 months ago

/attempt #276

jovezhong commented 4 months ago

The command needs to be followed by #276.

zeroshade commented 4 months ago

@ranjanmangla1 when you put the PR up, feel free to tag me on it and I'll give a review from the Arrow/ADBC perspective!

ranjanmangla1 commented 4 months ago

@zeroshade sure

debaa98 commented 3 months ago

/attempt #276

vishwamartur commented 1 week ago

/attempt #276

jovezhong commented 1 week ago

Thank you @vishwamartur for signing up for this. Although there have been a couple of such attempts, there is no working PR yet. First come, first served. Feel free to let us know how we can help.

vishwamartur commented 1 week ago

Thank you, @jovezhong, for the opportunity! I’ll start working on this and will keep you updated on my progress. If I encounter any challenges or need clarification, I’ll be sure to reach out. Looking forward to contributing to the project!