clickhouse-datasource-bridge

An alternative implementation of JDBC Bridge for ClickHouse.

Features

Named Query

Besides data source, you can define named queries in configuration as well.

Named data source - config/datasources/ch.json

{
    "ch": {
        "type": "jdbc",
        "jdbcUrl": "jdbc:clickhouse://localhost/system",
        "dataSource": {
            "user": "default",
            "password": "",
        },
        "columns": [
            { "name": "instance_id", "type": "Int32", "value": "0", "nullable": false }
        ],
        "parameters": {
            "max_rows": 1000,
            "fetch_size": 200
        }
    }
}

Note: type, columns and parameters are optional.

Named query - config/queries/test-query.json

{
    "test-query": {
        "query": "select * from test_table",
        "columns": [
            {
                "name": "column1",
                "type": "UInt32",
                "nullable": false
            },
            {
                "name": "column2",
                "type": "String",
                "nullable": true
            },
            {
                "name": "column3",
                "type": "Decimal",
                "nullable": true,
                "precision": 10,
                "scale": 2
            }
        ],
        "parameters": {
            "max_rows": 10
        }
    }
}

Note: like named data source, paramters is not mandatory. columns is optional too but it's highly recommended, as it prevents runtime type inferring which could be slow.

Saved query - scripts/tests/test-query.sql

select * from test_table

With above configuration setup, you should be able to run the following queries in ClickHouse:

-- adhoc query
select * from jdbc('ch', 'select * from test_table');

-- named query
select * from jdbc('ch', 'test-query');

-- saved query
select * from jdbc('ch', 'scripts/tests/test-query.sql');

Query Parameter

You can put query parameters like max_rows either in config or in your query.

-- get the first 10 rows(based on above configuration)
select * from jdbc('ch', 'test-query')
-- replace all null values
select * from jdbc('ch?null_as_default=true', 'test-query')
-- get a specific row
select * from jdbc('ch?max_rows=3&offset=2', 'select * from test_table order by column1 desc')
-- retrieve meta data for defining a named query
select * from jdbc('ch?debug=true', 'select * from test_table limit 1')

SRV Record Support

If you're using Consul or any other DNS server with SRV record support, you probably want to use service name instead of hostname/IP and port number combination when defining a datasource. To do that, assuming mysql.service.dc1.consul is the service name pointing to 127.0.0.1:3306, you can use any of below format instead of jdbc:mysql://127.0.0.1:3306/test:
- jdbc:mysql://{{ mysql.service.dc1.consul }}/test
- jdbc:mysql://{{ host:mysql.service.dc1.consul }}/test
- jdbc:mysql://{{ host:mysql.service.dc1.consul }}:{{ port:mysql.service.dc1.consul }}/test

Multiple Types of Data Sources

In addition to JDBC, clickhouse-datasource-bridge is extensible to support arbitrary data sources.

-- run named query in pre-defined database
select * from jdbc('ch', 'test-query')
-- get list of jobs from Jenkins view
select * from jdbc('jenkins:https://builds.apache.org/', 'jobs')

Based on Vert.x

Eclipse Vert.x is event driven and non blocking. It also makes clickhouse-datasource-bridge easy to config and scale.

Quick Start

# run with default configuration and JDBC drivers
docker run --rm -it -p 9019:9019 zhicwu/clickhouse-datasource-bridge

# run with custom configuration and JDBC drivers
docker run --rm -v `pwd`/config:/app/config -v `pwd`/drivers:/app/drivers -it -p 9019:9019 zhicwu/clickhouse-datasource-bridge

Usage

ClickHouse client version 19.11.8.46 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.11.8 revision 54423.

ch-server :) select * from jdbc('ch', 'select 1')

SELECT *
FROM jdbc('ch', 'select 1')

┌─1─┐
│ 1 │
└───┘

1 rows in set. Elapsed: 0.120 sec. 

ch-server :) select * from jdbc('jdbc:ch', 'select 1')

SELECT *
FROM jdbc('jdbc:ch', 'select 1')

┌─1─┐
│ 1 │
└───┘

1 rows in set. Elapsed: 0.074 sec. 

ch-server :) select * from jdbc('jdbc://ch', 'select 1')

SELECT *
FROM jdbc('jdbc://ch', 'select 1')

┌─1─┐
│ 1 │
└───┘

1 rows in set. Elapsed: 0.084 sec. 

ch-server :) select * from jdbc('jdbc://ch?debug=true', 'select 1')

SELECT *
FROM jdbc('jdbc://ch?debug=true', 'select 1')

┌─datasource─┬─type─┬─query────┬─parameters─────────────────────────────────────┐
│ ch         │ jdbc │ select 1 │ fetch_size=1000&max_rows=0&offset=0&position=0 │
└────────────┴──────┴──────────┴────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.052 sec. 

ch-server :) select * from jdbc('jdbc:clickhouse://localhost/system?user=default&password=', 'select 1')

SELECT *
FROM jdbc('jdbc:clickhouse://localhost/system?user=default&password=', 'select 1')

┌─1─┐
│ 1 │
└───┘

1 rows in set. Elapsed: 0.156 sec.

Configuration

Below is the directory structure used in docker image. By default, all configuration files must be put under config directory and JDBC drivers under drivers.

/app
  |
  |--- config
  |     |
  |     |--- datasources
  |     |     |
  |     |     |--- test-mysql.json
  |     |
  |     |--- queries
  |     |     |
  |     |     |--- test-query.json
  |     |
  |     |--- httpd.json
  |     |--- vertx.json
  |     |--- server.json
  |
  |--- drivers
  |     |
  |     |--- some-shaded-jdbc-driver.jar
  |
  |--- clickhouse-datasource-bridge.jar

Usually you don't need any of them but you can surely customize as needed.

File	Reloadable (Y/N)	Description
vertx.json	N	Vertx configuration, check here for more.
server.json	N	Server configuration.
httpd.json	N	Http server configuration, check here for more.
datasources/*.json	Y	Named data sources.
queries/*.json	Y	Named queries.

vertx.json

This configuration file will be only read during server starting and it is NOT reloadable afterwards.
```
{
    "maxWorkerExecuteTime": 300,
    "maxWorkerExecuteTimeUnit": "SECONDS",
    "workerPoolSize": 10
}
```
server.json

This configuration file will be only read during server starting and it is NOT reloadable afterwards.
```
{
    "serverPort": 8080,
    "requestTimeout": 5000,
    "queryTimeout": 60000
}
```
datasources/named-data-source.json
datasources/named-query.json

Known Issues

Issue	Workaround	Remark
query timed out	server-side: 1) increase timeout in datasource configuration; 2) increase `max_execution_timeout` in `server.json`; client-side: 1) increase clickhouse-jdbc-driver timeout

TODOs

[ ] Consul support - retrieve datasource and/or query from Consul KV store...
[ ] Error handling - abort worker thread once client closed stream(exception from stream.write?)
[ ] Special datasource like config - 'select * from jdbc('config', '', 'show datasources')'
[ ] Special run-time parameter to save query and/or generate datasources(select * from jdbc('ds?save_query=xxx', '', 'select 1'))
[ ] Strict mode(turn on by default)
[ ] Provider for CodeQL
[ ] More examples (docker-compose.yml, swarm and k8s, with/without Vert.x clustering)
[ ] Reduce dependencies so that we can go Native - https://vertx.io/blog/eclipse-vert-x-goes-native/
- Exclude unnecessary dependencies from Vertx lib
- Simple LRU cache(better in JDK) to replace Caffeine
- Avoid reflection - less configuration required to build native image
[ ] Reduce memory usage if we stick with JVM - https://stackoverflow.com/questions/561245/virtual-memory-usage-from-java-under-linux-too-much-memory-used
[ ] Parallelized query
[ ] More data sources(Cassandra, DGraph, InfluxDB and Prometheus)...

zhicwu / clickhouse-datasource-bridge

readme