ing-bank / scruid

Scala + Druid: Scruid. A library that allows you to compose queries in Scala, and parse the result back into typesafe classes.
Apache License 2.0
115 stars 29 forks source link

Query builders with SQL-like DSL #32

Closed anskarl closed 5 years ago

anskarl commented 6 years ago

I would like to propose a DSL for creating queries with SQL-like definition of filters.

I have implemented an initial version of builders in https://github.com/anskarl/scruid/blob/feature/dql/src/main/scala/ing/wbaa/druid/dql/DQL.scala and a DSL for defining filters https://github.com/anskarl/scruid/blob/feature/dql/src/main/scala/ing/wbaa/druid/dql/FilterDSL.scala

For example, a group-by query can be defined as below:

val query: GroupByQuery = DQL.Builder
  .from(dataSource = "some_source_name")
  .withGranularity(GranularityType.All)
  .addIntervals("2018-08-01/2018-08-10")
  .agg(CountAggregation(name = "count"))
  .where(('dim1 === "some_value" or 'dim1 === "some_another_value") and 'dim2.isNotNull)
  .groupBy('dim1, 'dim2, 'dim3)
  .build()

As you can see in the example, the filters over dimensions dim1 and dim2 in the where function are defined using a SQL-like representation (inspired by Apache Spark Dataframe API).

The current version of the DSL for filters supports various operators (e.g., and, or, not, notIn, isNull, isNotNul, etc) and the dimension names are represented by scala symbols (e.g., 'some_dim_name).

Below I am giving some additional example of where clauses:

  ...
  .where(('dim1 === "some_value") and ('dim2 in ("val1", "val2", "val3")) )
  ...
  ...
  .where('dim1 like "%something%" and 'dim2.isNotNull)
  ...

There are also functions that support conjuctions or disjunctions over multiple filters:

  ...
  .where(conjunction('dim1 === "val1", 'dim2 === "val2", 'dim3 === "val3" or 'dim3.isNull)
  ...
  ...
  .where(disjunction('dim1 === "val1", 'dim2 === "val2", 'dim3 === "val3", 'dim3.isNull)
  ...

If you believe that the proposed API is helpful, I would be happy to continue the implementation and thereafter create a PR.

Fokko commented 6 years ago

@anskarl I really like the idea. It allows us to write very elegant queries.

anskarl commented 6 years ago

@Fokko I am glad that you like the idea. I will continue the implementation and hopefully create a PR soon.

anskarl commented 5 years ago

Hello, so far I have implemented the following:

- flow-based builders for timeseries, group-by and top-n queries
- filter expressions
- having expressions
- aggregation expressions

Post-aggregation expressions is a work in progress

Examples of the current version can be found in https://github.com/anskarl/scruid/blob/master/src/test/scala/ing/wbaa/druid/dql/DQLSpec.scala

bjgbeelen commented 5 years ago

Fixed, so can be closed