Apache Druid supports two query languages, native JSON-based queries and SQL. At the moment native Druid API is more powerful compare to the SQL, perhaps this may change in the future and become equal. Internally, Druid uses a parser and planner based on Apache Calcite which translates to the native API.
So far Scruid performs queries on top the native API of Druid, either using the specialized case classes or the DQL. Additionally, Scruid uses Akka Http and Streams to provide a robust HTTP client for performing requests and handling of large payloads.
According to the documentation of Druid there are two client APIs for SQL, HTTP POST and JDBC. The non-JDBC JSON over HTTP API is stateless and does not require connection stickiness.
I think that SQL support it would be nice addition to Scruid and take advantage of the HTTP clients that it provides, as well as the bindings to case classes (via circe) regarding the results.
Druid SQL technical details
To perform SQL queries using HTTP, Druid provides the endpoint /druid/v2/sql/. An example query is given below:
{
"query" : "SELECT COUNT(*) FROM data_source WHERE foo = 'bar' AND __time > TIMESTAMP '2000-01-01 00:00:00'",
"context" : {
"sqlTimeZone" : "America/Los_Angeles"
}
}
The structure of JSON object to perform SQL queries contains the query and optionally the context, parameters and resultFormat. The latter controls the format of the resulting response, for Scruid setting resultFormat to object is an option that can be directly supported in order to retain the case class mapping via circe.
SQL queries are expressed using the sql processed string function. The examples below demonstrate the functionality:
import scala.concurrent.Future
import ing.wbaa.druid.SQL._
import ing.wbaa.druid.DruidSQLResults
val query = sql"""SELECT SELECT COUNT(*) as "count" FROM wikipedia WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'"""
val response: Future[DruidSQLResults] = query.execute()
// The resulting response can also mapped to a case class
case class Result(count: Double)
val result: Future[Result] = response.map(_.list[Result])
Multi-line queries:
import ing.wbaa.druid.SQL._
val query = sql"""
|SELECT SELECT COUNT(*) as "count"
|FROM wikipedia
|WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'
""".stripMargin
String interpolation:
import ing.wbaa.druid.SQL._
val countColumnName = "count"
val dataSourceName = "wikipedia"
val dateTime = "2015-09-12 00:00:00"
val query = sql"""
|SELECT SELECT COUNT(*) as "${countColumnName}"
|FROM ${dataSourceName}
|WHERE "__time" >= TIMESTAMP '${dateTime}'
""".stripMargin
Parameterized Queries:
import java.time.LocalDateTime
import ing.wbaa.druid.SQLQuery
import ing.wbaa.druid.SQL._
val fromDateTime = LocalDateTime.of(2015, 9, 12, 0, 0, 0, 0)
val untilDateTime = fromDateTime.plusDays(1)
val queryParameterized: SQLQuery.Parameterized =
sql"""
|SELECT FLOOR(__time to HOUR) AS hourTime, count(*) AS "count"
|FROM wikipedia
|WHERE "__time" BETWEEN ? AND ?
|GROUP BY 1
|""".stripMargin.parameterized
val query: SQLQuery = queryParameterized
.withParameter(fromDateTime)
.withParameter(untilDateTime)
.create()
Context parameters:
import ing.wbaa.druid.SQL._
import ing.wbaa.druid.definitions.QueryContext
val contextParameters = Map(
QueryContext.SqlQueryId -> "scruid-sql-example-query",
QueryContext.SqlTimeZone -> "America/Los_Angeles"
)
val query =
sql"""SELECT SELECT COUNT(*) as "count" FROM wikipedia WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'"""
.setContext(contextParameters)
Akka Streams:
import akka.NotUsed
import akka.stream.scaladsl.Source
import ing.wbaa.druid.SQL._
import ing.wbaa.druid.{DruidSQLResult, DruidSQLResults}
val query = sql"""SELECT SELECT COUNT(*) as "count" FROM wikipedia WHERE "__time" >= TIMESTAMP '2015-09-12 00:00:00'"""
val source: Source[DruidSQLResult, NotUsed] = query.stream()
// The resulting stream of DruidSQLResult can also mapped to a case class
case class Result(count: Double)
val sourceOfResults: Source[Result, NotUsed] = query.streamAs[Result]()
Apache Druid supports two query languages, native JSON-based queries and SQL. At the moment native Druid API is more powerful compare to the SQL, perhaps this may change in the future and become equal. Internally, Druid uses a parser and planner based on Apache Calcite which translates to the native API.
So far Scruid performs queries on top the native API of Druid, either using the specialized case classes or the DQL. Additionally, Scruid uses Akka Http and Streams to provide a robust HTTP client for performing requests and handling of large payloads.
According to the documentation of Druid there are two client APIs for SQL, HTTP POST and JDBC. The non-JDBC JSON over HTTP API is stateless and does not require connection stickiness.
I think that SQL support it would be nice addition to Scruid and take advantage of the HTTP clients that it provides, as well as the bindings to case classes (via circe) regarding the results.
Druid SQL technical details
To perform SQL queries using HTTP, Druid provides the endpoint
/druid/v2/sql/
. An example query is given below:Druid also supports parameterized queries:
The structure of JSON object to perform SQL queries contains the
query
and optionally thecontext
,parameters
andresultFormat
. The latter controls the format of the resulting response, for Scruid settingresultFormat
toobject
is an option that can be directly supported in order to retain the case class mapping via circe.Implementation features
In https://github.com/anskarl/scruid/tree/feature/sql there is a full implementation of SQL in Scruid and supports the following:
Examples
SQL queries are expressed using the
sql
processed string function. The examples below demonstrate the functionality:Multi-line queries:
String interpolation:
Parameterized Queries:
Context parameters:
Akka Streams: