marsupialtail / quokka

Making data lake work for time series
https://marsupialtail.github.io/quokka/
Apache License 2.0
1.1k stars, 60 forks

Data source challenge: SAP HANA #18

Open tomtom215 opened 1 year ago

tomtom215 commented 1 year ago

SAP HANA is a proprietary columnar store that doesn't speak the MySQL or Postgres dialects, so I'd love to be able to query it with Quokka.

marsupialtail commented 1 year ago

@tomtom215 I thought SAP HANA supports JDBC?

tomtom215 commented 1 year ago

It does. I opened the issue because I didn't see any JDBC config details in the documentation, and I saw on your roadmap that you were working on a SQL interface: https://marsupialtail.github.io/quokka/search.html?q=jdbc

In your opinion, if a DB supports JDBC, do you feel it should be supported by Quokka?

marsupialtail commented 1 year ago

So if a DB supports JDBC, can't you just query it directly with SQL? Sorry if I am misunderstanding something here.

tomtom215 commented 1 year ago

Yes and no, because of SQL dialects. For example, DataGrip is a great database IDE, but you can't just drop in the SAP HANA JDBC driver and have everything work; JetBrains had to add support for it. Similarly, SQLAlchemy will only get you so far with DBs that have their own SQL dialects. However, if you think I'm way off and believe everything should work, I'll see if I can set up a proof of concept with Quokka in the next month or so (the holidays make it hard to schedule).
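
To illustrate the dialect point, here is a minimal sketch of why a generic connector cannot send one query string to every backend. The helper name `first_rows_query` and the dialect keys are hypothetical and are not part of Quokka or of any JDBC driver. SAP HANA is deliberately left out of the table, since its exact syntax is not covered in this thread.

```python
# Row-limiting syntax is one small place where SQL dialects diverge.
# The templates below are standard for each listed database
# (the Oracle form requires version 12c or later).
LIMIT_TEMPLATES = {
    "postgres": "SELECT {cols} FROM {table} LIMIT {n}",
    "mysql": "SELECT {cols} FROM {table} LIMIT {n}",
    "sqlserver": "SELECT TOP {n} {cols} FROM {table}",
    "oracle": "SELECT {cols} FROM {table} FETCH FIRST {n} ROWS ONLY",
}


def first_rows_query(dialect, table, n, cols="*"):
    """Build a 'first n rows' query for a dialect (hypothetical helper)."""
    try:
        template = LIMIT_TEMPLATES[dialect]
    except KeyError:
        # An unknown dialect needs explicit support, which is the point
        # being made about DataGrip and SQLAlchemy above.
        raise ValueError(f"no SQL template for dialect {dialect!r}") from None
    return template.format(cols=cols, table=table, n=int(n))


for name in ("postgres", "sqlserver", "oracle"):
    print(name, "->", first_rows_query(name, "trades", 5))
```

A dialect with no entry raises an error instead of producing a query that may not parse, which mirrors the situation where a driver connects fine but the tool on top still needs dialect-specific support.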

tomtom215 commented 1 year ago

Another reason I wanted to ask is the example of the ClickHouse JDBC Bridge. In theory, it should let you connect to any JDBC-capable DB, but it has clear limitations:

marsupialtail commented 1 year ago

Let me do more research into JDBC and its limitations. In an ideal world everything would be Arrow Flight based and life would be amazing, but we are quite far from that. We can keep discussing in this thread.

Adding a JDBC driver is a great way to contribute to Quokka. Happy to work on it with you.

marsupialtail commented 1 year ago

Thank you for all the information.

marsupialtail commented 1 year ago

@tomtom215 do you know whether JDBC can tell you how big a SAP HANA table is? I imagine you could run SELECT COUNT(*) to find out, but I wonder how performant that is. A good implementation should return it almost instantly, but perhaps not every one does.

This matters a lot for Quokka, which is moving towards an architecture where all data sources are bounded, so that we can do better progress tracking during execution.
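
As a rough sketch of what a bounded source could look like, the snippet below counts the rows up front and then reads the table in fixed-size chunks while reporting progress. It is not Quokka's actual reader: `count_rows` and `plan_chunks` are hypothetical helpers, and an in-memory SQLite table stands in for SAP HANA so that the example runs anywhere. With a real database you would pass a DB-API connection from that database's own driver, and the paging syntax (LIMIT/OFFSET here) may differ by dialect.

```python
import sqlite3


def count_rows(conn, table):
    """Return the row count of `table` via any DB-API 2.0 connection."""
    # Table names cannot be bound as query parameters, so validate first.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    (n,) = cur.fetchone()
    cur.close()
    return n


def plan_chunks(total_rows, chunk_size):
    """Split [0, total_rows) into (offset, length) pairs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, min(chunk_size, total_rows - offset))
        for offset in range(0, total_rows, chunk_size)
    ]


# Stand-in for the real database: an in-memory SQLite table with 10 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (ts INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?)", [(i, 100.0 + i) for i in range(10)]
)

total = count_rows(conn, "ticks")  # known up front, so the source is bounded
chunks = plan_chunks(total, 4)

done = 0
for offset, length in chunks:
    rows = conn.execute(
        "SELECT ts, price FROM ticks ORDER BY ts LIMIT ? OFFSET ?",
        (length, offset),
    ).fetchall()
    done += len(rows)
    print(f"read {done}/{total} rows")
```

Whether the initial count is cheap depends on the database: some engines keep row counts in metadata, while others scan the table, which is the performance concern raised above.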

marsupialtail commented 1 year ago

Quokka will support ODBC soon.