ing-bank / scruid

Scala + Druid: Scruid. A library that allows you to compose queries in Scala, and parse the result back into typesafe classes.
Apache License 2.0

Adds support for Druid SQL #91

Closed · anskarl closed 4 years ago

anskarl commented 4 years ago

GH-90

anskarl commented 4 years ago

Hi @krisgeus, thank you for the review. I have removed the double SELECT term from the SQL examples :) and the commented-out code from DruidQuery.scala.

krisgeus commented 4 years ago

@anskarl 1 left in the readme still ;-). I will wait for another pair of eyes to have a look at the PR before merging

anskarl commented 4 years ago

> @anskarl 1 left in the readme still ;-). I will wait for another pair of eyes to have a look at the PR before merging

oops! ok, it's fixed

barend commented 4 years ago

One last thought: the Doobie library also offers sql"..." strings. Should we pick a different prefix to avoid the name-clash, or do we just trust people to avoid the issue with careful importing?

barend commented 4 years ago

If you look in the build output for the "code coverage report (2.12)" step, there are an awful lot of these loggings:

```
[ERROR] [05/06/2020 08:52:23.834] [scruid-actor-system-akka.actor.default-dispatcher-2] [akka.stream.Log(akka://scruid-actor-system/system/StreamSupervisor-39)] [scruid-load-balancer] Upstream failed. (akka.stream.AbruptTerminationException: Processor actor [Actor[akka://scruid-actor-system/system/StreamSupervisor-39/flow-1-2-mapAsyncUnordered#-1741531560]] terminated abruptly)
```

is that just the test suite not shutting things down properly or is there more going on?

anskarl commented 4 years ago

> One last thought: the Doobie library also offers sql"..." strings. Should we pick a different prefix to avoid the name-clash, or do we just trust people to avoid the issue with careful importing?

How about dsql?
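
For illustration of the clash a distinct prefix avoids (a hypothetical sketch; these objects are not Doobie's or Scruid's actual code): two libraries that both add an `sql` interpolator to `StringContext` cannot be imported into the same scope without ambiguity.

```scala
object LibraryA {
  implicit class ASqlInterpolator(val sc: StringContext) extends AnyVal {
    def sql(args: Any*): String = sc.s(args: _*)
  }
}

object LibraryB {
  implicit class BSqlInterpolator(val sc: StringContext) extends AnyVal {
    def sql(args: Any*): String = sc.s(args: _*)
  }
}

object Demo {
  import LibraryA._
  import LibraryB._

  // With both imports in scope, the compiler cannot choose an implicit
  // conversion for the interpolator, so this does not compile:
  // val q = sql"SELECT 1"   // error: ambiguous implicit conversions
}
```

A differently named prefix such as dsql never collides with Doobie's interpolator, so both libraries can be used in the same file.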

> If you look in the build output for the "code coverage report (2.12)" step, there are an awful lot of these loggings:
>
> [ERROR] [05/06/2020 08:52:23.834] [scruid-actor-system-akka.actor.default-dispatcher-2] [akka.stream.Log(akka://scruid-actor-system/system/StreamSupervisor-39)] [scruid-load-balancer] Upstream failed. (akka.stream.AbruptTerminationException: Processor actor [Actor[akka://scruid-actor-system/system/StreamSupervisor-39/flow-1-2-mapAsyncUnordered#-1741531560]] terminated abruptly)
>
> is that just the test suite not shutting things down properly or is there more going on?

These errors are not related to the additions in this PR. They occur in DruidAdvancedHttpClientSpec; perhaps we should create a separate issue.
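
(For reference, a minimal sketch of the kind of teardown that usually silences such AbruptTerminationException noise, assuming ScalaTest with BeforeAndAfterAll; the class name and timeout below are illustrative, not taken from this PR:)

```scala
import akka.actor.ActorSystem
import org.scalatest.BeforeAndAfterAll
import org.scalatest.wordspec.AnyWordSpec
import scala.concurrent.Await
import scala.concurrent.duration._

class ExampleClientSpec extends AnyWordSpec with BeforeAndAfterAll {
  implicit val system: ActorSystem = ActorSystem("scruid-test")

  // Terminate the actor system after all tests, so that running streams are
  // stopped cleanly instead of being killed along with their materializer.
  override def afterAll(): Unit =
    Await.result(system.terminate(), 10.seconds)
}
```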

krisgeus commented 4 years ago

> How about dsql?

I like it

anskarl commented 4 years ago

I have created a custom interpolator that can only accept the following types:

| Scala type | Druid SQL type |
|---|---|
| Char | CHAR |
| String | VARCHAR |
| Byte | TINYINT |
| Short | SMALLINT |
| Int | INTEGER |
| Long | BIGINT |
| Float | FLOAT |
| Double | DOUBLE |
| Boolean | BOOLEAN |
| java.time.LocalDate | DATE |
| java.time.LocalDateTime | TIMESTAMP |
| java.sql.Timestamp | TIMESTAMP |

Any variable or expression injected into a query is turned into a bound parameter rather than being inserted directly into the query string, so there is no danger of SQL injection attacks.
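
(As a rough sketch of this technique, with simplified, hypothetical names rather than the actual Scruid implementation: the interpolator accepts only arguments that are implicitly convertible to a typed parameter, so unsupported types fail at compile time.)

```scala
import scala.language.implicitConversions

// Simplified sketch: each interpolated argument must be convertible to a
// typed parameter, so unsupported types are rejected at compile time.
final case class SQLParameterSketch(druidType: String, value: String)

object DSqlSketch {
  // One implicit conversion per supported Scala type.
  implicit def fromInt(v: Int): SQLParameterSketch       = SQLParameterSketch("INTEGER", v.toString)
  implicit def fromLong(v: Long): SQLParameterSketch     = SQLParameterSketch("BIGINT", v.toString)
  implicit def fromString(v: String): SQLParameterSketch = SQLParameterSketch("VARCHAR", v)

  implicit class DSqlSketchHelper(val sc: StringContext) extends AnyVal {
    // Joins the literal parts with '?' placeholders and collects the
    // arguments as bound parameters, mirroring the JSON shown below.
    def dsqlSketch(args: SQLParameterSketch*): (String, Seq[SQLParameterSketch]) =
      (sc.parts.mkString("?"), args)
  }
}

object DSqlSketchUsage {
  import DSqlSketch._

  val code = "US"
  val (sql, params) = dsqlSketch"SELECT * FROM wikipedia WHERE countryIsoCode = ${code}"
  // sql    == "SELECT * FROM wikipedia WHERE countryIsoCode = ?"
  // params == Seq(SQLParameterSketch("VARCHAR", "US"))
}
```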

For example, the query below:

```scala
import java.time.LocalDateTime

import ing.wbaa.druid.SQLQuery
import ing.wbaa.druid.SQL._

val fromDateTime: LocalDateTime  = LocalDateTime.of(2015, 9, 12, 0, 0, 0, 0)
val untilDateTime: LocalDateTime = fromDateTime.plusDays(1)
val countryIsoCode: String       = "US"

val query: SQLQuery =
  dsql"""
  |SELECT FLOOR(__time to HOUR) AS hourTime, count(*) AS "count"
  |FROM wikipedia
  |WHERE "__time" BETWEEN ${fromDateTime} AND ${untilDateTime} AND countryIsoCode = ${countryIsoCode}
  |GROUP BY 1
  |""".stripMargin
```

This will create the following JSON:

```json
{
  "query" : "\nSELECT FLOOR(__time to HOUR) AS hourTime, count(*) AS \"count\"\nFROM wikipedia\nWHERE \"__time\" BETWEEN ? AND ? AND countryIsoCode = ?\nGROUP BY 1\n",
  "context" : { },
  "parameters" : [
    {
      "type" : "TIMESTAMP",
      "value" : "2015-09-12 00:00:00"
    },
    {
      "type" : "TIMESTAMP",
      "value" : "2015-09-13 00:00:00"
    },
    {
      "type" : "VARCHAR",
      "value" : "US"
    }
  ],
  "resultFormat" : "object"
}
```

Each interpolated variable has been replaced by the symbol `?` and added to the list of parameters with its corresponding type and format. For instance, the variables fromDateTime and untilDateTime are instances of LocalDateTime, so they appear as SQL TIMESTAMP parameters with their values formatted as y-MM-dd HH:mm:ss.
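
(That formatting behaviour can be reproduced with a plain DateTimeFormatter; this snippet is illustrative and the formatter name is not from the PR:)

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Illustrative only: format a LocalDateTime the way the TIMESTAMP
// parameter values appear in the JSON above.
val timestampFormat = DateTimeFormatter.ofPattern("y-MM-dd HH:mm:ss")
val formatted = LocalDateTime.of(2015, 9, 12, 0, 0).format(timestampFormat)
// formatted == "2015-09-12 00:00:00"
```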

With that addition, there is no need for a separate, explicit step to create a parameterized query.

barend commented 4 years ago

> I have created a custom interpolator that can only accept the following types:

@anskarl this is excellent! It makes the API simpler by removing the Parameterized type, and it makes doing the right thing the default.