zio / zio-quill

Compile-time Language Integrated Queries for Scala
https://zio.dev/zio-quill
Apache License 2.0
2.15k stars · 346 forks

Feature Request: write compile-time queries to a file #1715

Closed: lihaoyi-databricks closed this issue 2 years ago

lihaoyi-databricks commented 4 years ago

Currently they're written to the console, which is spammy, as generally you do not need them and they're not changing. I would like to be able to inspect them on-demand though, without needing to fiddle with my build config to inject a compiler flag.

Ideally, we'd have them written to a file in some structured format (JSON?). That would allow me to inspect them when I'm interested, ignore them when I don't, and also write tools to work with the query set: e.g. to validate the queries against my DDL and ensure no query plans result in table scans, or to find any totally-unused indices so they can be removed

deusaquilus commented 2 years ago

So, a couple of thoughts: 1) Ideally, I would like to avoid adding an external dependency on a JSON library for something this simple. Maybe in this case the JSON could be hand-written, because it's basically the same pattern over and over again.

    [
      {
        "query": "SELECT p.name, p.age FROM p",
        "line": 123,
        "position": "23:24",
       "producedAt": "20210304.16:34"
      },
      {
        "query": "SELECT p.name, p.age FROM p FILTER p.name == 'JOE'",
        "line": 124,
        "position": "12:13",
        "file": "/my/path/to/file.scala",
        "producedAt": "20210304.16:34"
      }
    ]
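Hand-writing JSON of this shape is simple enough that no library is needed. A minimal sketch of such an emitter (all names here are hypothetical, not existing Quill code):

```scala
// Minimal hand-rolled JSON emitter for query log entries (sketch only).
final case class QueryLogEntry(query: String, line: Int, position: String, file: String, producedAt: String)

object QueryJson {
  // Escape the characters that must be escaped inside a JSON string.
  private def escape(s: String): String =
    s.flatMap {
      case '"'  => "\\\""
      case '\\' => "\\\\"
      case '\n' => "\\n"
      case '\r' => "\\r"
      case '\t' => "\\t"
      case c    => c.toString
    }

  // Render all entries as one JSON array; the same field pattern repeats per entry.
  def render(entries: List[QueryLogEntry]): String =
    entries.map { e =>
      s"""  {
         |    "query": "${escape(e.query)}",
         |    "line": ${e.line},
         |    "position": "${e.position}",
         |    "file": "${escape(e.file)}",
         |    "producedAt": "${e.producedAt}"
         |  }""".stripMargin
    }.mkString("[\n", ",\n", "\n]")
}
```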

2) For longer queries, JSON is really not ideal. Just imagine something that looks like this:

    [
      {
        "query": "SELECT g._1, g._2idG, g._2paridP, g._2paremb1a, g._2paremb1b FROM (SELECT DISTINCT 3 AS _1, g.idG AS _2idG, g.paridP AS _2paridP, g.paremb1a AS _2paremb1a, g.paremb1b AS _2paremb1b FROM (SELECT DISTINCT 2 AS idG, p.idP AS paridP, p.emb1a AS paremb1a, p.emb1b AS paremb1b FROM (SELECT DISTINCT 1 AS idP, e.a AS emb1a, e.b AS emb1b FROM Emb e) AS p) AS g) AS g",
        "line": 124,
        "position": "12:13"
      }
    ]

That's not very useful, in my opinion. It needs to be a format that supports multi-line strings (XML, YAML, HOCON, etc.).
I personally think XML is best for this:
```xml
<query>
     <string>
      "SELECT
        g._1,
        g._2idG,
        g._2paridP,
        g._2paremb1a,
        g._2paremb1b
      FROM
        (
          SELECT
            DISTINCT 3 AS _1,
            g.idG AS _2idG,
            g.paridP AS _2paridP,
            g.paremb1a AS _2paremb1a,
            g.paremb1b AS _2paremb1b
          FROM
            (
              SELECT
                DISTINCT 2 AS idG,
                p.idP AS paridP,
                p.emb1a AS paremb1a,
                p.emb1b AS paremb1b
              FROM
                (
                  SELECT
                    DISTINCT 1 AS idP,
                    e.a AS emb1a,
                    e.b AS emb1b
                  FROM
                    Emb e
                ) AS p
            ) AS g
        ) AS g
     </string>
     <file position="23:24" line="123">/my/path/to/file.scala</file>
</query>
```
YAML is Okay:
```yaml
 - query:
     string: |
      SELECT
        g._1,
        g._2idG,
        g._2paridP,
        g._2paremb1a,
        g._2paremb1b
      FROM
        (
          SELECT
            DISTINCT 3 AS _1,
            g.idG AS _2idG,
            g.paridP AS _2paridP,
            g.paremb1a AS _2paremb1a,
            g.paremb1b AS _2paremb1b
          FROM
            (
              SELECT
                DISTINCT 2 AS idG,
                p.idP AS paridP,
                p.emb1a AS paremb1a,
                p.emb1b AS paremb1b
              FROM
                (
                  SELECT
                    DISTINCT 1 AS idP,
                    e.a AS emb1a,
                    e.b AS emb1b
                  FROM
                    Emb e
                ) AS p
            ) AS g
        ) AS g
     position: "23:24"
     line: 123
     file: "/my/path/to/file.scala"
```

We could also try HOCON or something else. 3) We already have a query formatter that we can use, so that part is taken care of.

deusaquilus commented 2 years ago

More thoughts: 4) One way to do this would be to open a global file handle in MacroContextExt and just write to the file as calls to MacroContextExt.query come in. Unfortunately, this is very problematic because we would have to deal with all sorts of low-level IO concerns: a) What happens if the file handle goes away? b) How do we know when to close the file? How do we know when compilation has ended? c) We want to do the file writes asynchronously (ZStream?) in order not to slow down compilation with IO, but then we have to deal with all the recovery use-cases.

This is why I think we should treat this as a logging problem. I.e. we need to have some kind of custom log-appender that writes entries in the log formats mentioned above into a file and let a logging framework take care of all the other thorny details. @adamgfraser Could we use zio-logging in order to do something like this?
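To illustrate the low-level concerns above: a hand-rolled asynchronous appender has to manage its own background thread, flushing, and shutdown. A rough sketch of what that burden looks like (this is illustrative only, not proposed Quill code; all names are hypothetical):

```scala
import java.nio.file.{Files, Paths, StandardOpenOption}
import java.util.concurrent.LinkedBlockingQueue

// Sketch of a hand-rolled async file appender, showing the plumbing that a
// logging framework would otherwise take care of for us.
class AsyncQueryAppender(path: String) {
  private val queue = new LinkedBlockingQueue[String]()
  private val writer = Files.newBufferedWriter(
    Paths.get(path), StandardOpenOption.CREATE, StandardOpenOption.APPEND)

  // Background thread drains the queue so callers never block on IO.
  private val worker = new Thread(() => {
    try while (true) { writer.write(queue.take()); writer.flush() }
    catch { case _: InterruptedException => () } // stop signal
  })
  worker.setDaemon(true)
  worker.start()

  // A macro never learns when compilation ends, so the best we can do is
  // close the file when the compiler JVM exits.
  sys.addShutdownHook { worker.interrupt(); writer.close() }

  def append(entry: String): Unit = queue.put(entry + "\n")
}
```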

adamgfraser commented 2 years ago

@deusaquilus Yes this seems like something that is perfect for logging and I must admit I have had the same thought as @lihaoyi-databricks regarding seeing the queries every time I compile.

guizmaii commented 2 years ago

I really like the idea. I see many advantages.

As Quill users, we could commit this file to Git which would allow us to track changes in the queries more easily. That would also allow us to review queries in PRs.

deusaquilus commented 2 years ago

I think we should go with XML. Anyone religiously opposed to using XML for this feature speak now or forever hold your peace!

reibitto commented 2 years ago

I was thinking of outputting to an .sql file. Any metadata could be comments. There are downsides to this approach too, but there are no escaping or formatting issues, and you can run the outputted queries easily in many IDEs. For example, IntelliJ:

(screenshot: the generated .sql file being run directly in IntelliJ)

Inspired a bit by yesql (which also supports putting multiple queries in one file)
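For example, the generated file might look something like this (a hypothetical illustration of the format, with metadata as comments):

```sql
-- file: /my/path/to/file.scala
-- line: 123
-- col: 24
SELECT p.name, p.age FROM Person p;

-- file: /my/path/to/file.scala
-- line: 124
-- col: 13
SELECT p.name, p.age FROM Person p WHERE p.name = 'JOE';
```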

deusaquilus commented 2 years ago

I like this! Let's do it this way.

deusaquilus commented 2 years ago

Instructions to Solve

Okay, this is going to go into the ZIO hackathon, so here's how to solve it (thanks @adamgfraser!). 1) Add zio-logging to the dependencies of quill-core, i.e. "dev.zio" %% "zio-logging" % "0.5.13". 2) Add the following to MacroContextExt.scala:

  import java.nio.file.Paths
  import zio._
  import zio.logging._

  // Logging environment: append Info-level entries verbatim to queries.sql
  val env = {
    ZEnv.live >>>
      Logging.file(
        logLevel = LogLevel.Info,
        format = LogFormat.fromFunction((ctx, str) => str),
        destination = Paths.get("queries.sql")
      ) >>> Logging.withRootLoggerName("query-logger")
  }

  // A standalone runtime, so the macro can log without being inside a ZIO app
  val runtime = Runtime.unsafeFromLayer(env)

3) Then, in the query method, add the following:

  // Fire-and-forget so compilation is not slowed down by the write.
  // `formatter` is assumed to be a java.time.format.DateTimeFormatter defined nearby.
  runtime.unsafeRunAsync_(log.info(
    s"""
       |-- file: ${c.enclosingPosition.source.path}
       |-- line: ${c.enclosingPosition.line}
       |-- col: ${c.enclosingPosition.column}
       |-- time: ${ZonedDateTime.now().format(formatter)}
       |${idiom.format(queryString)}
       |""".stripMargin
  ))

This will do the actual logging when a query is encountered.

4) Also, it would be nice to be able to configure which file is logged to. In order to do that, add the following into Messages.scala:

  private[getquill] def quillLogFile =
    cache("quill.log.file", LogToFile(variable("quill.log.file", "quill_log_file", "queries.txt")))

  sealed trait LogToFile
  object LogToFile {
    case class Enabled(file: String) extends LogToFile
    case object Disabled extends LogToFile
    def apply(switch: String): LogToFile =
      switch.trim match {
        case "false" => Disabled
        case other   => Enabled(other)
      }
  }

This setting is global so you can get it like this: Messages.quillLogFile.
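As a self-contained sketch of the intended parsing behavior (restating the ADT from the snippet above so it runs on its own):

```scala
// "false" disables file logging; any other value names the target file.
sealed trait LogToFile
object LogToFile {
  case class Enabled(file: String) extends LogToFile
  case object Disabled extends LogToFile
  def apply(switch: String): LogToFile =
    switch.trim match {
      case "false" => Disabled
      case other   => Enabled(other)
    }
}
```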

5) Modify MacroContextExt to use this switch (i.e. log to the named file if enabled, otherwise don't log). 6) Write a simple unit test for it that just logs something to a file and checks that it's written. 7) You're done! Make sure it builds and create a PR!

tdrozdowski commented 2 years ago

I'll take a crack at this one.