Open alexey-milovidov opened 1 year ago
I like where you're going, but I'm not planning to support this at the moment. We take advantage of duckDB's in-process architecture and python APIs quite a bit, so it's not totally straightforward to plug in a different database (which would require connecting over http, etc.).
You're welcome to fork the project, or if you're willing to wait, probably some day around Harlequin v2 or v3 I'll fork this and provide a general-purpose SQL client that can connect to clickhouse, postgres, etc.
There is an in-process Python module of ClickHouse, named chdb: https://github.com/chdb-io/chdb
ok, interesting. That's closer, although still missing most of the APIs we use. I also don't understand the persistence model, if there is one? i.e., I don't think you can connect to an existing clickhouse db with chdb? seems designed to mostly operate on external files
Both external and persistent tables are available. E.g., you can create a MergeTree table that will be stored locally. Or you can communicate with external ClickHouse using the remote
table function. Or read/write local and remote files.
I'm willing to assist in integrating harlequin with chdb.
That's awesome, thanks so much. Let me noodle on this a little. First step is probably me enumerating all of the duckdb integrations so we can think through alternatives.
Worth mentioning that @alexey-milovidov is the Clickhouse CTO & designer, so he can probably fix any issues that you have with integrating Clickhouse as well (missing features, bugs, etc)
ok, interesting. That's closer, although still missing most of the APIs we use. I also don't understand the persistence model, if there is one? i.e., I don't think you can connect to an existing clickhouse db with chdb? seems designed to mostly operate on external files
Hello @tconbeer chdb can connect to clickhouse-servers as well as directly access local or remote files with compatible formats. It supports session persistence using temporary or persistent local storage. The APIs are indeed are still quite basic, but this could be a great opportunity for us to find ways to extend and improve chdb feature coverage, so any input is greatly welcome!
Spent some time on this.
This wasn't in my original plan, but there isn't any reason Harlequin can't be the client to any database. The in-process part isn't very important.
I'll create an interface (an ABC or two) for a Harlequin database adapter, and refactor the DuckDB parts to be an implementation of that interface. The interface will include some optional features that may not be relevant or may not be possible for other databases, like loading extensions, exporting query results (copy to), providing autocompletions, etc. We'll distribute these adapters as separate plug-ins.
At start-up, Harlequin will discover which plugins are loaded, and then which features are available and adapt accordingly. Users will select their plugin and provide connection parameters at the command line. It would be nice to specify the connection in a config file (#206) instead (using profiles or similar).
I do want to be clear this feels like v2 to me, and that'll happen on my own timeline, with many of the smaller features shipping first. If y'all want to accelerate that, you can either start contributing to the project or you can pay me (I work as a contractor for my day job already). If you're interested in the latter, please email me (ted at shandy dot eye-oh) and we can schedule a call.
Finally, creating a delightful interface for DuckDB is still my first priority, so if I feel like this refactor puts that at risk for whatever reason, I reserve the right to change tacks and abandon this plan.
Thanks for your excitement around Harlequin -- I'm glad that it's pushed me to think about this as something bigger.
The pieces are there now, although undocumented. I'm going to implement an SQLite adapter to pressure-test the API and build out a guide for creating adapters; that should happen over the next week or two. If you want to get cracking sooner than that, you can work off the following examples:
src/harlequin/adapter.py
provides three ABCs that you need to implement: HarlequinAdapter
, HarlequinConnection
, and HarlequinCursor
src/harlequin_duckdb
provides the DuckDB implementation of HarlequinAdapter
, etc. Note that it's discovered as a plug-in via the entry-point declaration in the root pyproject.toml
file: src/harlequin_sqlite
provides the SQLite implementation.[tool.poetry.plugins."harlequin.adapter"]
duckdb = "harlequin_duckdb:DuckDbAdapter"
sqlite = "harlequin_sqlite:HarlequinSqliteAdapter"
There isn't a great, generalizable test harness (yet). I'll likely be making big changes/improvements to the tests of the project to make it easier to test other adapters as I develop the SQLite adapter. You can reference the tests in tests/adapter_tests/test_duckdb.py
; you could also swap out the fixtures in tests/conftest.py
to use different adapter fixtures.
@auxten @alexey-milovidov @lmangani this is ready for you, now.
The guide for contributing an adapter can be found here: https://harlequin.sh/docs/contributing/adapter-guide A template repo is here: https://github.com/tconbeer/harlequin-adapter-template My postgres adapter can be found as a reference here: https://github.com/tconbeer/harlequin-postgres
The template repo includes a minimal set of tests.
The guide should include everything else you need to know, but feel free to ask any questions, or we can set up a quick call.
Thanks @tconbeer this is amazing! I'll try this as soon as I get a time pocket.
is someone working on this? I could do it, but I don't want duplicate efforts :)
@dchimeno I don't think anyone started - at least not from the chdb side - so feel free!
I started to get a basic ClickHouse adapter working you can find it here harlequin-clickhouse
@tconbeer It was great to look at the harlequin-postgres
and use the adapter-template
!
This is currently using clickhouse-driver without connection pool, it doesn't really provide completions and there is a lot to improve.
@auxten i wanted to go chdb all the way and use the remote
option but this would require parsing the sql query to replace the FROM table_name
with FROM remote(host, ..., table_name)
.
I think i will create another adapter for chdb/clickhouse-local
Great to hear this! @ThomAub If any question encountered, feel free to ping me or raise an issue for chdb.
I started to get a basic ClickHouse adapter working you can find it here harlequin-clickhouse
@tconbeer It was great to look at the
harlequin-postgres
and use theadapter-template
!This is currently using clickhouse-driver without connection pool, it doesn't really provide completions and there is a lot to improve.
@auxten i wanted to go chdb all the way and use the
remote
option but this would require parsing the sql query to replace theFROM table_name
withFROM remote(host, ..., table_name)
. I think i will create another adapter for chdb/clickhouse-local
Hey, @ThomAub anything we could help on chDB here?
hey, @auxten I think I will make distinct adapter because it's not easy to make a single adpater for both chDB and ClickHouse
Yeah, chDB and ClickHouse client are quite different. If there is anything we can do to help please let me know.
On Tue, 1 Oct 2024 at 5:19 AM, Thomas @.***> wrote:
hey, @auxten https://github.com/auxten I think I will make distinct adapter because it's not easy to make a single adpater for both chDB and ClickHouse
— Reply to this email directly, view it on GitHub https://github.com/tconbeer/harlequin/issues/230#issuecomment-2384162925, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB2UE4RH5PRBC2PCWJR4D3ZZG56DAVCNFSM6AAAAAA463VJE6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGOBUGE3DEOJSGU . You are receiving this because you were mentioned.Message ID: @.***>
If there is anything we can do to help please let me know.
Do you think it should use the dbapi
mode or the Session
mode.
I know the session api will soon be improve but will the public api be the same ?
I don't know if the dbapi
and cursor
is the best way to create bot tmp and persistent session ?
I want to use this IDE with ClickHouse (which supports similar scenarios but is more powerful than DuckDB).