ing-bank / scruid

Scala + Druid: Scruid. A library that allows you to compose queries in Scala, and parse the result back into typesafe classes.
Apache License 2.0

Table, Lookup, Union, Inline, Query and Join datasource types #104

Open anskarl opened 4 years ago

anskarl commented 4 years ago

In recent versions of Druid, the datasource specification has been extended to support joins between datasources, inline datasources, queries as datasources, etc. At the moment Scruid supports only table datasources, the most common type (the one you get when you perform data ingestion).

With some additions, Scruid can support the following:

Example scan query over inline data:

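import java.util.Locale

import ing.wbaa.druid._
import ing.wbaa.druid.definitions._
import ing.wbaa.druid.dql.DSL._

// Column names for the inline rows (the same names as in the join example below)
val columnNames = Seq("iso2_code", "iso3_code", "name")

// One row per ISO country: ISO-2 code, ISO-3 code and English display name
val countryData = Locale.getISOCountries.toList
  .map { code =>
    val locale = new Locale("en", code)
    List(code, locale.getISO3Country, locale.getDisplayCountry(Locale.ENGLISH))
  }

val query: ScanQuery = DQL
  .scan()
  .interval("0000/3000")
  .from(Inline(columnNames, countryData))
  .build()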
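To make the target of the scan query above concrete, here is a minimal sketch of how an inline datasource could be rendered into the Druid native JSON shape (type, columnNames, rows). The object InlineJsonSketch, its local Inline case class and the helpers quote, array and toJson are hypothetical names used only for illustration; they are not the Scruid implementation.

```scala
// Hypothetical, self-contained sketch; not the Scruid source.
object InlineJsonSketch {
  // Local stand-in for the proposed Inline datasource.
  final case class Inline(columnNames: Seq[String], rows: Seq[Seq[String]])

  // Minimal JSON string quoting: escapes only backslashes and double quotes.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Joins already-rendered JSON values into a JSON array.
  private def array(items: Seq[String]): String = items.mkString("[", ",", "]")

  // Renders the datasource in the shape Druid expects for "type": "inline".
  def toJson(ds: Inline): String = {
    val names = array(ds.columnNames.map(quote))
    val rows  = array(ds.rows.map(row => array(row.map(quote))))
    s"""{"type":"inline","columnNames":$names,"rows":$rows}"""
  }
}
```

A real implementation would more likely delegate to a JSON library than build strings by hand; the hand-rolled version is only meant to show the resulting structure.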

Example inner join over inline data. The query below joins the wikipedia table with inline data (ISO-2 code, ISO-3 code and English country name) on the ISO-2 country code:

val query: ScanQuery = DQL
  .scan()
  .columns(
    "channel",
    "cityName",
    "countryIsoCode",
    "user",
    "mapped_country_iso3_code",
    "mapped_country_name")
  .granularity(GranularityType.All)
  .interval("0000/4000")
  .batchSize(10)
  .limit(numberOfResults)
  .from(
    Table("wikipedia")
      .join(
        right = Inline(Seq("iso2_code", "iso3_code", "name"), countryData),
        prefix = "mapped_country_",
        condition = d"countryIsoCode" === d"mapped_country_iso2_code"
      )
  )
  .build()

The expression d"countryIsoCode" === d"mapped_country_iso2_code" uses the same syntax as filtering and having clauses (e.g., .where(d"countryIsoCode" === d"mapped_country_iso2_code")). Alternatively, the expression can also be written as:

expr"""countryIsoCode == mapped_country_iso2_code"""
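For readers unfamiliar with the d"..." syntax, the sketch below shows how such a custom string interpolator can be defined in plain Scala and how an === operator can produce the textual form of the condition. InterpolatorSketch, its Dim class and DimInterpolator are hypothetical names for this illustration, and returning a plain String from === is a simplifying assumption; the actual DSL types in Scruid may differ.

```scala
// Hypothetical, self-contained sketch; not the Scruid source.
object InterpolatorSketch {
  // Stand-in for a dimension reference.
  final case class Dim(name: String) {
    // Renders an equality between two dimensions as an expression string.
    def ===(other: Dim): String = s"$name == ${other.name}"
  }

  // Adds the d"..." interpolator to string literals.
  implicit class DimInterpolator(private val sc: StringContext) {
    def d(args: Any*): Dim = Dim(sc.s(args: _*))
  }
}
```

With import InterpolatorSketch._ in scope, d"countryIsoCode" === d"mapped_country_iso2_code" evaluates to the same text as the expr form shown above.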

A work-in-progress branch that contains functional Join, Inline and Table datasource types, as well as all the operators of the Druid expressions, can be found at https://github.com/anskarl/scruid/tree/wip/datasource

Internal implementation details

All native query types in package ing.wbaa.druid extend the DruidNativeQuery trait, in which the type of the dataSource field changes from String to Datasource:

sealed trait DruidNativeQuery extends DruidQuery {

  val dataSource: Datasource

}

Trait Datasource is located in package ing.wbaa.druid.definitions:

sealed trait Datasource {
  val `type`: DatasourceType
}

The types Table, Lookup, Union, Inline, Query and Join are listed in the enumeration DatasourceType. Each of them is represented by a case class that extends Datasource. For example, the Union datasource type:

case class Union(dataSources: Iterable[String]) extends Datasource {
  override val `type`: DatasourceType = DatasourceType.Union
}

For Join operations, the left side of the operation supports any of the Table, Lookup, Union, Inline, Query and Join datasource types, while the right side supports only the Lookup, Query and Inline types. For that reason, the Lookup, Query and Inline classes extend the RightHandDatasource trait (which directly extends Datasource).

sealed trait RightHandDatasource extends Datasource

case class Inline(columnNames: Iterable[String], rows: Iterable[Iterable[String]])
    extends RightHandDatasource {
  override val `type`: DatasourceType = DatasourceType.Inline
}
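The effect of the RightHandDatasource marker trait can be sketched as follows: if the right side of a join is typed as RightHandDatasource, an invalid join is rejected at compile time. The Join signature (left, right, prefix, condition as a String) and the simplified Table and Lookup classes are assumptions made for this illustration, not the definitions from the branch.

```scala
// Hypothetical, self-contained sketch; not the Scruid source.
object JoinTypingSketch {
  sealed trait Datasource
  // Marker for datasources that are allowed on the right side of a join.
  sealed trait RightHandDatasource extends Datasource

  final case class Table(name: String) extends Datasource
  final case class Lookup(lookup: String) extends RightHandDatasource
  final case class Inline(columnNames: Seq[String], rows: Seq[Seq[String]])
      extends RightHandDatasource

  // Assumed shape: any datasource on the left, a restricted one on the right.
  final case class Join(
      left: Datasource,
      right: RightHandDatasource,
      prefix: String,
      condition: String
  ) extends Datasource

  val countries: Inline = Inline(Seq("iso2_code", "name"), Seq(Seq("GR", "Greece")))

  // Table on the left, Inline on the right: accepted.
  val joined: Join =
    Join(Table("wikipedia"), countries, "mapped_country_",
         "countryIsoCode == mapped_country_iso2_code")

  // A Join is itself a Datasource, so it can be the left side of another join.
  // "some_lookup" and the condition are placeholder values.
  val nested: Join = Join(joined, Lookup("some_lookup"), "lk_", "countryIsoCode == lk_k")

  // Does not compile, because Table is not a RightHandDatasource:
  // Join(Table("a"), Table("b"), "r_", "x == r_x")
}
```

Encoding the restriction in the type means that an unsupported right-hand datasource is caught by the compiler rather than by Druid at query time.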

Regarding DQL, the main additions are:

For Druid expressions that share their syntax with filtering and aggregation expressions, there are BaseExpression and BaseArithmeticExpression traits in package ing.wbaa.druid.dql.expressions.

For example, a BaseExpression for an and expression is represented as an AND logical expression filter when it appears in a where clause, and as an && (binary logical AND) expression when it appears inside a Join condition.
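The dual representation described above can be illustrated with a small sketch in which one expression tree is rendered either as a Druid filter (JSON) or as a Druid expression string. The names ExpressionRenderingSketch, Expr, Equals, And, asFilterJson and asExpression are hypothetical and much simpler than the BaseExpression hierarchy; values are not escaped, so this only shows the idea.

```scala
// Hypothetical, self-contained sketch; not the Scruid source.
object ExpressionRenderingSketch {
  sealed trait Expr
  // Equality between a dimension and a literal string value.
  final case class Equals(dimension: String, value: String) extends Expr
  // Logical conjunction of two expressions.
  final case class And(left: Expr, right: Expr) extends Expr

  // Rendering used in a where clause: Druid "selector" and "and" filters.
  def asFilterJson(e: Expr): String = e match {
    case Equals(d, v) => s"""{"type":"selector","dimension":"$d","value":"$v"}"""
    case And(l, r)    => s"""{"type":"and","fields":[${asFilterJson(l)},${asFilterJson(r)}]}"""
  }

  // Rendering used in a join condition: Druid expression syntax with &&.
  def asExpression(e: Expr): String = e match {
    case Equals(d, v) => s"$d == '$v'"
    case And(l, r)    => s"(${asExpression(l)} && ${asExpression(r)})"
  }
}
```

Keeping a single tree with two renderers lets the same DSL operators be reused in where clauses and in join conditions.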