mmsmdali / dremio-clickhouse-arp-connector

Dremio ARP driven connector that supports ClickHouse Columnar Database
Apache License 2.0

Dremio ClickHouse ARP Connector

The ClickHouse connector allows Dremio to connect to and query data in ClickHouse Columnar Database.

The Dremio ClickHouse ARP Connector acts as a bridge between the Dremio Data Lakehouse Platform and ClickHouse, using the ClickHouse JDBC driver.

ARP connectors for some other databases are available on Dremio Hub.

Building and Installation

  1. In the root directory, where the pom.xml file is located, run mvn clean install
  2. Take the resulting .jar file from the target folder, or use the pre-built jar, and put it in the $DREMIO_HOME/jars folder
  3. Get the ClickHouse JDBC driver, either the pre-built jar or a copy from mvnrepository, and put it in the $DREMIO_HOME/jars/3rdparty folder
  4. Restart Dremio
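
Steps 2 and 3 above can be sketched as a small shell helper. This is only an illustration, not part of the connector: the function name install_connector_jars is hypothetical, and the caller supplies both jar paths, since the exact jar file names depend on the build and on the driver version you downloaded.

```shell
#!/bin/sh
# Sketch of steps 2 and 3: copy the connector jar and the JDBC driver jar
# into a Dremio installation. install_connector_jars is a hypothetical helper;
# no jar file names are assumed, the caller passes them in.
install_connector_jars() {
    connector_jar="$1"   # jar produced by `mvn clean install` in target/
    driver_jar="$2"      # ClickHouse JDBC driver jar
    dremio_home="$3"     # Dremio installation root ($DREMIO_HOME)

    # Fail early if an input is missing.
    [ -f "$connector_jar" ] || { echo "missing connector jar: $connector_jar" >&2; return 1; }
    [ -f "$driver_jar" ] || { echo "missing driver jar: $driver_jar" >&2; return 1; }
    [ -d "$dremio_home/jars" ] || { echo "no jars folder under: $dremio_home" >&2; return 1; }

    # The connector goes in jars/, the JDBC driver in jars/3rdparty/.
    mkdir -p "$dremio_home/jars/3rdparty"
    cp "$connector_jar" "$dremio_home/jars/" &&
    cp "$driver_jar" "$dremio_home/jars/3rdparty/"
}
```

After running mvn clean install, call the function with the path to the built jar, the path to the driver jar, and "$DREMIO_HOME", then restart Dremio as in step 4.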

ARP Overview

The Advanced Relational Pushdown (ARP) Framework allows for the creation of Dremio plugins for any data source which has a JDBC driver and accepts SQL as a query language. Plugin creation is mostly code-free: a configuration file controls how the queries issued by Dremio are modified.

There are two files that are necessary for creation of an ARP-based plugin: the storage plugin configuration, which is code, and the plugin ARP file, which is a YAML file.

The storage plugin configuration file tells Dremio what the name of the plugin should be, what connection options should be displayed in the source UI, what the name of the ARP file is, which JDBC driver to use and how to make a connection to the JDBC driver.

The ARP YAML file is what is used to modify the SQL queries that are sent to the JDBC driver, allowing you to specify support for different data types and functions, as well as rewrite them if tweaks need to be made for your specific data source.

ARP File Format

The ARP file is broken down into several sections:

metadata

syntax

data_types

relational_algebra - This section is divided into a number of subsections.

If an operation or function is not specified in the ARP file, Dremio handles the operation itself. An operation that is marked as supported is still not pushed down to the SQL query if it must be stacked on top of an operation that is not supported.
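
As a quick sanity check while writing an ARP file, the four sections listed above can be looked for with grep. This is a rough sketch, not a Dremio tool: the function name check_arp_sections is hypothetical, and it assumes each section appears as a top-level YAML key at the start of a line (for example "metadata:"). It does not validate the contents of any section.

```shell
#!/bin/sh
# Sketch: report which of the four top-level ARP sections are missing from a file.
# Assumption: each section is a top-level YAML key written at column 0.
# check_arp_sections is a hypothetical helper, not part of Dremio.
check_arp_sections() {
    arp_file="$1"
    missing=0
    for section in metadata syntax data_types relational_algebra; do
        if ! grep -q "^${section}:" "$arp_file"; then
            echo "missing section: ${section}"
            missing=1
        fi
    done
    # Exit status 0 when all four sections are present, 1 otherwise.
    return "$missing"
}
```

Call it with the path to your ARP YAML file; a non-zero exit status means at least one section name was not found.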