ing-bank / scruid

Scala + Druid: Scruid. A library that allows you to compose queries in Scala, and parse the result back into typesafe classes.
Apache License 2.0
115 stars 29 forks source link

Support Druid SQL queries #90

Closed anskarl closed 4 years ago

anskarl commented 4 years ago

Apache Druid supports two query languages, native JSON-based queries and SQL. At the moment native Druid API is more powerful compare to the SQL, perhaps this may change in the future and become equal. Internally, Druid uses a parser and planner based on Apache Calcite which translates to the native API.

So far Scruid performs queries on top the native API of Druid, either using the specialized case classes or the DQL. Additionally, Scruid uses Akka Http and Streams to provide a robust HTTP client for performing requests and handling of large payloads.

According to the documentation of Druid there are two client APIs for SQL, HTTP POST and JDBC. The non-JDBC JSON over HTTP API is stateless and does not require connection stickiness.

I think that SQL support it would be nice addition to Scruid and take advantage of the HTTP clients that it provides, as well as the bindings to case classes (via circe) regarding the results.

Druid SQL technical details

To perform SQL queries using HTTP, Druid provides the endpoint /druid/v2/sql/. An example query is given below:

{
  "query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar' AND __time > TIMESTAMP '2000-01-01 00:00:00'",
  "context" : {
    "sqlTimeZone" : "America/Los_Angeles"
  }
}

Druid also supports parameterized queries:

{
  "query" : "SELECT COUNT(*) FROM data_source WHERE foo = ? AND __time > ?",
  "parameters": [
    { "type": "VARCHAR", "value": "bar"},
    { "type": "TIMESTAMP", "value": "2000-01-01 00:00:00" }
  ]
}

The structure of JSON object to perform SQL queries contains the query and optionally the context, parameters and resultFormat. The latter controls the format of the resulting response, for Scruid setting resultFormat to object is an option that can be directly supported in order to retain the case class mapping via circe.

Implementation features

In https://github.com/anskarl/scruid/tree/feature/sql there is a full implementation of SQL in Scruid and supports the following:

Examples

SQL queries are expressed using the sql processed string function. The examples below demonstrate the functionality:

import scala.concurrent.Future
import ing.wbaa.druid.SQL._
import ing.wbaa.druid.DruidSQLResults

val query = sql"""SELECT SELECT COUNT(*) as "count" FROM wikipedia WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'"""

val response: Future[DruidSQLResults] = query.execute()

// The resulting response can also mapped to a case class
case class Result(count: Double)

val result: Future[Result] = response.map(_.list[Result])

Multi-line queries:

import ing.wbaa.druid.SQL._

val query = sql"""
    |SELECT SELECT COUNT(*) as "count" 
    |FROM wikipedia 
    |WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'
    """.stripMargin

String interpolation:

import ing.wbaa.druid.SQL._

val countColumnName = "count"
val dataSourceName = "wikipedia"
val dateTime = "2015-09-12 00:00:00"

val query = sql"""
    |SELECT SELECT COUNT(*) as "${countColumnName}" 
    |FROM ${dataSourceName} 
    |WHERE "__time" >= TIMESTAMP '${dateTime}'
    """.stripMargin

Parameterized Queries:

import java.time.LocalDateTime
import ing.wbaa.druid.SQLQuery

import ing.wbaa.druid.SQL._

val fromDateTime  = LocalDateTime.of(2015, 9, 12, 0, 0, 0, 0)
val untilDateTime = fromDateTime.plusDays(1)

val queryParameterized: SQLQuery.Parameterized =
  sql"""
  |SELECT FLOOR(__time to HOUR) AS hourTime, count(*) AS "count"
  |FROM wikipedia
  |WHERE "__time" BETWEEN ? AND ?
  |GROUP BY 1
  |""".stripMargin.parameterized

val query: SQLQuery = queryParameterized
  .withParameter(fromDateTime)
  .withParameter(untilDateTime)
  .create()

Context parameters:

import ing.wbaa.druid.SQL._
import ing.wbaa.druid.definitions.QueryContext

val contextParameters = Map(
  QueryContext.SqlQueryId -> "scruid-sql-example-query",
  QueryContext.SqlTimeZone -> "America/Los_Angeles"
)
val query =
  sql"""SELECT SELECT COUNT(*) as "count" FROM wikipedia WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'"""
      .setContext(contextParameters)

Akka Streams:

import akka.NotUsed
import akka.stream.scaladsl.Source
import ing.wbaa.druid.SQL._
import ing.wbaa.druid.{DruidSQLResult, DruidSQLResults}

val query = sql"""SELECT SELECT COUNT(*) as "count" FROM wikipedia WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'"""

val source: Source[DruidSQLResult, NotUsed] = query.stream()

// The resulting stream of DruidSQLResult can also mapped to a case class
case class Result(count: Double)

val sourceOfResults: Source[Result, NotUsed] = query.streamAs[Result]()