quarylabs / quary

Open-source BI for engineers
https://www.quary.dev
Apache License 2.0
2.14k stars 48 forks

Connection for Dremio Lakehouse #427

Open jdbodyfelt opened 1 month ago

jdbodyfelt commented 1 month ago

Please Describe The Problem To Be Solved

The Problem: Other than Snowflake, there is a lack of connectors to other lakehouse solutions. While a Databricks connector would be nice for many corporate production runs, in the interest of open source, a Dremio connector might be more appreciated by the community. This request is to build a Dremio connector for Quary.

Optional: Suggest A Solution

Looking into the code architecture, it seems that the bulk of the connectors are maintained within rust/quary-databases/src/databases_<flavor>.rs and rust/core/src/database_<flavor>.rs. Inspection shows a common class interface already designed across both. For Dremio, a number of protocols are available, including REST, JDBC, and ODBC. However, with a Rust build, it may be advantageous to use the Arrow Flight protocol, as Dremio supports it well and it can reportedly lead to a 20x speed-up over JDBC. In fact, this issue could even be extended to a generic "Arrow Flight Connector" type.

A possible plan includes:

  1. Review the full connection interface by building a "skeleton" version of rust/quary-databases/src/databases_dremio.rs.
  2. Review the Dremio docs, although this may not be needed if plain Arrow SQL is functional.
  3. Get feedback on any other requirements to implement a Dremio connector; I thought I saw something else in the SQL interfacing code.
  4. Design for any other feedback (e.g. any other needed *_dremio.rs files).
  5. Unit test.
  6. Review & release into the wild.
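
As a rough illustration of step 1, a skeleton could start out like the sketch below. The trait `DatabaseConnection`, its three methods, and the `DremioSkeleton` struct are hypothetical names made up for this example; Quary's real interface (which is likely async and richer) may look quite different.

```rust
// Hypothetical connector interface, for illustration only. Quary's real
// trait, its method names and its (likely async) signatures may differ.
pub trait DatabaseConnection {
    /// Short connector name.
    fn name(&self) -> &'static str;
    /// List the tables visible to the connector.
    fn list_tables(&self) -> Result<Vec<String>, String>;
    /// Run a query and return the rows as strings.
    fn query(&self, sql: &str) -> Result<Vec<Vec<String>>, String>;
}

/// Skeleton Dremio connector: it only records where it would connect.
pub struct DremioSkeleton {
    pub host: String,
    pub port: u16,
}

impl DatabaseConnection for DremioSkeleton {
    fn name(&self) -> &'static str {
        "dremio"
    }

    // Every data method fails loudly until it is filled in.
    fn list_tables(&self) -> Result<Vec<String>, String> {
        Err(format!(
            "list_tables not implemented for {}:{}",
            self.host, self.port
        ))
    }

    fn query(&self, sql: &str) -> Result<Vec<Vec<String>>, String> {
        Err(format!("query not implemented: {}", sql))
    }
}
```

Returning an error from every stub, rather than panicking, keeps the skeleton usable while the surrounding interface is being explored.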

Happy to help on this to build out my Rust expertise...

benfdking commented 1 month ago

Thanks for this! We'll have a quick look into this today!

benfdking commented 1 month ago

Here's a first draft: https://github.com/quarylabs/quary/pull/446. It doesn't work yet and still needs quite a bit of filling in, but I think it should give you the general structure.

There are a few things to add:

I mostly did it out of curiosity for:

  1. Dremio which is cool!
  2. ArrowFlight: We have some translation layers and I am wondering whether quary's internal format should just be arrow.

jdbodyfelt commented 1 month ago

Love the idea of internal format of Arrow - it looks very sweet!

jdbodyfelt commented 1 month ago

I'll pull the branch and try to get some feedback by Wed

benfdking commented 1 month ago

A first draft with this in it is being pushed at the moment; it works with username/password and no SSL.

            // Required connection settings, read from the environment.
            let host = env::var("DREMIO_HOST")
                .map_err(|_| "DREMIO_HOST must be set to connect to Dremio".to_string())?;
            let port = env::var("DREMIO_PORT")
                .map_err(|_| "DREMIO_PORT must be set to connect to Dremio".to_string())?;
            // Parse the flag here so that a bad value is reported instead of panicking.
            let use_ssl = env::var("DREMIO_USE_SSL")
                .map_err(|_| "DREMIO_USE_SSL must be set to connect to Dremio".to_string())?
                .parse::<bool>()
                .map_err(|_| "DREMIO_USE_SSL must be either true or false".to_string())?;
            let username = env::var("DREMIO_USER")
                .map_err(|_| "DREMIO_USER must be set to connect to Dremio".to_string())?;
            // Note: the password is required even when a token is also supplied.
            let password = env::var("DREMIO_PASSWORD")
                .map_err(|_| "DREMIO_PASSWORD must be set to connect to Dremio".to_string())?;

            // A personal access token, when present, takes precedence over the password.
            let auth = if let Ok(personal_access_token) = env::var("DREMIO_PERSONAL_ACCESS_TOKEN") {
                DremioAuth::UsernamePersonalAccessToken(username, personal_access_token)
            } else {
                DremioAuth::UsernamePassword(username, password)
            };

            let database = crate::databases_dremio::Dremio::new(
                config,
                auth,
                use_ssl,
                host,
                port,
            )
            .await?;
            Ok(Box::new(database))
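
For anyone wanting to unit test this logic without touching the process environment, here is a minimal standalone sketch of the same resolution step. It is not the code from the PR: `DremioSettings`, `resolve_settings` and `resolve_settings_from_env` are hypothetical names, the `DremioAuth` enum is redefined locally to keep the example self-contained, and it assumes (unlike the snippet above) that DREMIO_PASSWORD is only needed when no personal access token is set.

```rust
use std::env;

/// Local stand-in for the two auth variants used in the snippet above.
#[derive(Debug, PartialEq)]
pub enum DremioAuth {
    UsernamePassword(String, String),
    UsernamePersonalAccessToken(String, String),
}

/// Hypothetical container for the resolved connection settings.
#[derive(Debug, PartialEq)]
pub struct DremioSettings {
    pub host: String,
    pub port: String,
    pub use_ssl: bool,
    pub auth: DremioAuth,
}

/// Resolve settings from any key -> value lookup (hypothetical helper).
pub fn resolve_settings<F>(lookup: F) -> Result<DremioSettings, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Same error wording as the original snippet for missing variables.
    let require = |key: &str| {
        lookup(key).ok_or_else(|| format!("{} must be set to connect to Dremio", key))
    };
    let host = require("DREMIO_HOST")?;
    let port = require("DREMIO_PORT")?;
    // Report an unparsable flag as an error rather than panicking.
    let use_ssl = require("DREMIO_USE_SSL")?
        .parse::<bool>()
        .map_err(|_| "DREMIO_USE_SSL must be either true or false".to_string())?;
    let username = require("DREMIO_USER")?;
    // Assumption: the password is only required when no token is present.
    let auth = match lookup("DREMIO_PERSONAL_ACCESS_TOKEN") {
        Some(token) => DremioAuth::UsernamePersonalAccessToken(username, token),
        None => DremioAuth::UsernamePassword(username, require("DREMIO_PASSWORD")?),
    };
    Ok(DremioSettings { host, port, use_ssl, auth })
}

/// Convenience wrapper that reads from the process environment.
pub fn resolve_settings_from_env() -> Result<DremioSettings, String> {
    resolve_settings(|key| env::var(key).ok())
}
```

Passing the lookup as a closure means a test can hand in a `HashMap` while production code calls `resolve_settings_from_env()`.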

This outlines the variables you need: DREMIO_HOST, DREMIO_PORT, DREMIO_USE_SSL, DREMIO_USER and DREMIO_PASSWORD. They can be stored in a .env file:

DREMIO_HOST=localhost
DREMIO_PORT=32010
DREMIO_USE_SSL=false
DREMIO_USER=admin
DREMIO_PASSWORD=fht4jyx9HAY!jxk1ydg

These .env values are what I got working locally for this "setup":

// It should be running on the following ports
// docker run -p 9047:9047 -p 31010:31010 -p 32010:32010 -p 45678:45678 dremio/dremio-oss
// 1. Create test space
// 2. Create test folder inside the test space
// 3. Create the samples source

With the config

dremio:
  dremio_space: test
  dremio_space_folder: test
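
To make the role of these two settings concrete, here is a small sketch of how a space and folder could be turned into a fully qualified Dremio object name. It assumes that models end up at <space>.<folder>.<name>, which the draft PR may or may not do; `quote_identifier` and `qualified_name` are hypothetical helpers, not Quary functions. The double-quote quoting follows Dremio's SQL identifier syntax.

```rust
/// Quote one identifier with double quotes, doubling any embedded quote.
fn quote_identifier(part: &str) -> String {
    format!("\"{}\"", part.replace('"', "\"\""))
}

/// Build a dotted, quoted path from space, folder and object name.
/// Assumption: objects live at <space>.<folder>.<name>.
pub fn qualified_name(space: &str, folder: &str, name: &str) -> String {
    [space, folder, name]
        .iter()
        .map(|part| quote_identifier(part))
        .collect::<Vec<_>>()
        .join(".")
}
```

With the config above and a hypothetical model called orders, `qualified_name("test", "test", "orders")` produces `"test"."test"."orders"`.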