mtxr / vscode-sqltools

Database management for VSCode
https://vscode-sqltools.mteixeira.dev?utm_source=github&utm_medium=homepage-link
MIT License

Databricks support #920

Closed davehowell closed 1 year ago

davehowell commented 1 year ago

Is your feature request related to a problem? Please describe.
There is no driver for connecting to Databricks SQL.

Describe the solution you'd like
I would like to work on this. I'm adding this ticket to check that no one else is already doing it, or, if someone is, to see whether efforts can be combined.

Describe alternatives you've considered
Using DBeaver or other JDBC SQL clients. That isn't an option for a particular client of mine, as they only want to support VSCode for their data developers.

Additional context

I read the Support new Drivers docs, which were helpful, and used the template https://github.com/mtxr/vsc-sqltools-driver-template

So far I've installed nvm and Node.js, run corepack to enable yarn (I had many issues trying to use npm, and since the main project uses yarn I'm going with that), edited the package.json, and am now trying to build/debug.

I am working through the required hooks in the src/extension.js file, referring to existing drivers as examples but using the databricks-sql-nodejs driver instead.
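For reference, this is roughly what the underlying query path looks like with the databricks-sql-nodejs client (@databricks/sql), separate from the SQLTools hooks themselves. It's a minimal sketch; the helper name and the bare-bones structure are mine, not part of either project:

```typescript
import { DBSQLClient } from '@databricks/sql';

// Illustrative helper: open a connection to a Databricks SQL endpoint,
// run one statement, and return all rows. A real driver would keep the
// client/session alive instead of reconnecting per query.
async function runStatement(host: string, path: string, token: string, query: string) {
  const client = new DBSQLClient();
  await client.connect({ host, path, token });
  const session = await client.openSession();
  try {
    const operation = await session.executeStatement(query);
    const rows = await operation.fetchAll();
    await operation.close();
    return rows;
  } finally {
    await session.close();
    await client.close();
  }
}
```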

An additional feature I'd like is reading connection details from the same dotfile/profile the databricks CLI uses (by the way, the PostgreSQL driver could also benefit from preferring the ~/.pg_service.conf file for connection details).
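The databricks CLI keeps its profiles in an INI-style ~/.databrickscfg file. A minimal sketch of reading a profile from it, assuming the usual host/token keys (the helper is hypothetical; nothing like it exists in the driver yet):

```typescript
import { readFileSync } from 'fs';
import { homedir } from 'os';
import { join } from 'path';

// Hypothetical helper: pull the key/value pairs for one profile out of
// ~/.databrickscfg, e.g. { host: 'https://...', token: 'dapi...' }.
function readDatabricksProfile(profile = 'DEFAULT'): Record<string, string> {
  const text = readFileSync(join(homedir(), '.databrickscfg'), 'utf8');
  const values: Record<string, string> = {};
  let currentSection = '';
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith('#') || line.startsWith(';')) continue;
    const section = line.match(/^\[(.+)\]$/);
    if (section) {
      currentSection = section[1];
      continue;
    }
    const kv = line.match(/^([^=]+)=(.*)$/);
    if (kv && currentSection === profile) {
      values[kv[1].trim()] = kv[2].trim();
    }
  }
  return values;
}
```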

I also see other drivers have issues with too many rows being returned and suggest users always append LIMIT 1000 - I would like to always wrap queries with that (sketch below). I'd also like to consider how to handle the ; delimiter for multiple queries in some way.
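A minimal sketch of that wrapping idea, only sensible for plain SELECT statements (the function name is illustrative):

```typescript
// Illustrative only: cap result size by wrapping the user's query in a
// subselect with LIMIT, unless it already ends with its own LIMIT clause.
// DDL/DML statements would need to be passed through untouched.
function withRowLimit(query: string, limit = 1000): string {
  const trimmed = query.trim().replace(/;+\s*$/, '');
  if (/\blimit\s+\d+\s*$/i.test(trimmed)) return trimmed;
  return `SELECT * FROM (${trimmed}) AS sqltools_limited LIMIT ${limit}`;
}
```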

gjsjohnmurray commented 1 year ago

How is your driver development going? If/when it is published, let us know and we'll add a link to it in the README and online documentation of the main extension.

davehowell commented 1 year ago

@gjsjohnmurray hey thanks for following up on this.

It's in progress: https://github.com/davehowell/sqltools-databricks-driver/tree/databricks-driver

I initially tried running the query execution concurrently with async/await, then realized that the queries on a page could be a script with an implied order of execution, so I backtracked to a plain for loop (sketch below).

The config for the connection string settings is all working, as is the connection test. I have chosen to be opinionated and only support the SQL Warehouse feature of Databricks, not the classic Spark clusters, because the endpoints for SQL Warehouses stay active even when the clusters are switched off - for a SQL client use case I think that's the most reasonable approach.

I can launch VSCode debug, set up the connection with an endpoint http_path and a token, run queries, and display the results in the grid. Multiple queries behave strangely: they work, but if you switch between grid results there is an error about a width value being supplied incorrectly. I'm guessing it is related to the names of the columns in the grids and the column widths changing as you switch. I can give more info on that if it's something you are aware of.
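The sequencing decision looks roughly like this (a sketch; `session` is assumed to be an open @databricks/sql session, and the helper name is mine):

```typescript
// Run the statements in a block one after another rather than with
// Promise.all, since a script may rely on earlier statements having run.
async function runScript(session: any, statements: string[]) {
  const results = [];
  for (const statement of statements) {
    const operation = await session.executeStatement(statement);
    results.push(await operation.fetchAll());
    await operation.close();
  }
  return results;
}
```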

I am working through the metadata queries at present. Classic Databricks has no information_schema, so fetching database/table/column metadata is done via commands like SHOW DATABASES and DESCRIBE TABLE foo, which return a different set of fields from the expected interface (rough sketch below). The underlying SDK I am working with also has API calls for that, but I've been advised they are fairly slow - lots of API calls happen under the hood, partly a legacy of deriving from a hive/thrift driver. That project is under active development, so there are some breaking changes. A newer Databricks feature, Unity Catalog, is in public preview now and does include information_schema views, but I won't add that right now. After the metadata queries there are the code-completion and keyword bits, and then I'm done for a 0.1 release.
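To illustrate the mapping problem, here is a sketch of turning DESCRIBE TABLE output into something tree-shaped. The row field names follow typical Spark SQL output (col_name, data_type, comment); the target shape is only indicative, not the exact SQLTools interface:

```typescript
// Sketch: reshape Spark-style DESCRIBE TABLE rows into a generic column list.
async function listColumns(session: any, table: string) {
  const operation = await session.executeStatement(`DESCRIBE TABLE ${table}`);
  const rows = await operation.fetchAll();
  await operation.close();
  return rows.map((row: any) => ({
    label: row.col_name,
    dataType: row.data_type,
    comment: row.comment,
  }));
}
```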

gjsjohnmurray commented 1 year ago

@davehowell thanks for the update. It looks like you're making great progress. The incorrect width value doesn't ring any bells with me, so I'm pinging @mtxr in case it rings some for him.

davehowell commented 1 year ago

That issue #988 is exactly the same width value error I was seeing. I'm glad it's not something caused by me doing it wrong. Using a single query per block is a fine workaround, so I won't worry about it. I haven't progressed on this for a few weeks, but I'm coming back to finish it soon, work permitting.

KaduUlson commented 1 year ago

Hello @davehowell! Any more progress on this Databricks driver? I'd be happy to help. I tried to install it via the VSCode Marketplace but could not find it.

davehowell commented 1 year ago

Hi @KaduUlson, I haven't published it; I think that without the metadata tree working it's a bit too raw. I am wondering, do you use "Unity Catalog"? Adding support just for Unity Catalog might be an easier short-term way to get it to a publishable state. To be honest, I'd prefer not to support the legacy Hive metastore catalog, mostly because the API for it is painful and slow. Unity Catalog has the ISO/ANSI standard information_schema that the SQLTools core already supports (see the sketch below). I will revisit it in the next few days.
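For Unity Catalog workspaces the metadata could come from standard information_schema views instead of SHOW/DESCRIBE commands. A sketch of the kind of query that could back the table list; the catalog and schema names are placeholders:

```typescript
// Illustrative query builder for listing tables via Unity Catalog's
// ANSI information_schema views.
const listTablesQuery = (catalog: string, schema: string) => `
  SELECT table_catalog, table_schema, table_name, table_type
  FROM ${catalog}.information_schema.tables
  WHERE table_schema = '${schema}'
  ORDER BY table_name
`;
```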

davehowell commented 1 year ago

The work I was doing is redundant. An official, Databricks-developed version of this now exists and is published as a VSCode extension: https://github.com/databricks/sqltools-databricks-driver

@fjakobs My work on this is probably out of date and not useful, but my WIP is here: https://github.com/davehowell/sqltools-databricks-driver. I got it working for queries using Databricks SQL (endpoints) but not for the catalog browser. I see you've upgraded the dependency on @databricks/sql to the GA version - great news that it's out 🎉. I was working with a beta at the time, which was going through a radical refactoring away from its hive/thrift origins, and I didn't think it made sense to support the old connection method. I also see you have good tests and integration tests, so it's a lot further progressed than mine was. Even so, I'm happy to contribute to the official project if you want some help.

fjakobs commented 1 year ago

Hi @davehowell, I actually wasn't aware that you were also working on a Databricks driver. I'd be happy if you decided to join forces and contribute to our extension.

fjakobs commented 1 year ago

This ticket can be closed as there is now a Databricks driver.