So a couple of thoughts: 1) Ideally I would prefer not to add an external dependency on a JSON library for something this simple. Maybe in this case the JSON could be hand-written, because it's basically the same pattern over and over again (a rough sketch of that follows the example below).
```json
[
  {
    "query": "SELECT p.name, p.age FROM p",
    "line": 123,
    "position": "23:24",
    "producedAt": "20210304.16:34"
  },
  {
    "query": "SELECT p.name, p.age FROM p FILTER p.name == 'JOE'",
    "line": 124,
    "position": "12:13",
    "file": "/my/path/to/file.scala",
    "producedAt": "20210304.16:34"
  }
]
```
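For what it's worth, hand-writing that shape in Scala would only take a few lines. Here is a rough sketch (the case class and field names are made up for illustration, and a real version would still have to escape quotes and newlines inside the query string):

```scala
// Rough sketch only: emit the repetitive JSON entries by hand, without a JSON library.
// A real version would need to escape quotes/newlines in `query` and handle optional fields.
final case class QueryLogEntry(query: String, line: Int, position: String, file: String, producedAt: String)

def toJson(entries: List[QueryLogEntry]): String =
  entries
    .map { e =>
      s"""  {
         |    "query": "${e.query}",
         |    "line": ${e.line},
         |    "position": "${e.position}",
         |    "file": "${e.file}",
         |    "producedAt": "${e.producedAt}"
         |  }""".stripMargin
    }
    .mkString("[\n", ",\n", "\n]")
```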
2) For longer queries, JSON is really not ideal. Just imagine something that looks like this:
```json
[
  {
    "query": "SELECT g._1, g._2idG, g._2paridP, g._2paremb1a, g._2paremb1b FROM (SELECT DISTINCT 3 AS _1, g.idG AS _2idG, g.paridP AS _2paridP, g.paremb1a AS _2paremb1a, g.paremb1b AS _2paremb1b FROM (SELECT DISTINCT 2 AS idG, p.idP AS paridP, p.emb1a AS paremb1a, p.emb1b AS paremb1b FROM (SELECT DISTINCT 1 AS idP, e.a AS emb1a, e.b AS emb1b FROM Emb e) AS p) AS g) AS g",
    "line": 124,
    "position": "12:13"
  }
]
```
That's not very useful in my opinion. It needs to be a format that supports multi-line strings (XML, YAML, HOCON, etc.).
I personally think XML is best for this:
```xml
<query>
  <string>
    SELECT
      g._1,
      g._2idG,
      g._2paridP,
      g._2paremb1a,
      g._2paremb1b
    FROM
      (
        SELECT
          DISTINCT 3 AS _1,
          g.idG AS _2idG,
          g.paridP AS _2paridP,
          g.paremb1a AS _2paremb1a,
          g.paremb1b AS _2paremb1b
        FROM
          (
            SELECT
              DISTINCT 2 AS idG,
              p.idP AS paridP,
              p.emb1a AS paremb1a,
              p.emb1b AS paremb1b
            FROM
              (
                SELECT
                  DISTINCT 1 AS idP,
                  e.a AS emb1a,
                  e.b AS emb1b
                FROM
                  Emb e
              ) AS p
          ) AS g
      ) AS g
  </string>
  <file position="23:24" line="123">/my/path/to/file.scala</file>
</query>
```
YAML is okay:
```yaml
- query:
    string: |
      SELECT
        g._1,
        g._2idG,
        g._2paridP,
        g._2paremb1a,
        g._2paremb1b
      FROM
        (
          SELECT
            DISTINCT 3 AS _1,
            g.idG AS _2idG,
            g.paridP AS _2paridP,
            g.paremb1a AS _2paremb1a,
            g.paremb1b AS _2paremb1b
          FROM
            (
              SELECT
                DISTINCT 2 AS idG,
                p.idP AS paridP,
                p.emb1a AS paremb1a,
                p.emb1b AS paremb1b
              FROM
                (
                  SELECT
                    DISTINCT 1 AS idP,
                    e.a AS emb1a,
                    e.b AS emb1b
                  FROM
                    Emb e
                ) AS p
            ) AS g
        ) AS g
    position: "23:24"
    line: 123
    file: "/my/path/to/file.scala"
```
We could also try HOCON or something else. 3) We already have a query formatter that we can use, so that part is taken care of.
More thoughts:
4) One way to do this would be to open a global file handle in MacroContextExt and just write to the file as calls to MacroContextExt.query
come in. Unfortunately, this is very problematic because we have to deal with all sorts of low-level IO stuff:
1) What happens if the file handle goes away?
2) How do we know when to close the file? How do we know when compilation has ended?
3) We want to do the file writes asynchronously (ZStream?) in order to not slow down compilation with IO, but then we have to deal with all the recovery use-cases.
This is why I think we should treat this as a logging problem. I.e. we need to have some kind of custom log-appender that writes entries in the log formats mentioned above into a file and let a logging framework take care of all the other thorny details. @adamgfraser Could we use zio-logging in order to do something like this?
@deusaquilus Yes, this seems like something that is perfect for logging, and I must admit I have had the same thought as @lihaoyi-databricks regarding seeing the queries every time I compile.
I really like the idea. I see many advantages.
As Quill users, we could commit this file to Git which would allow us to track changes in the queries more easily. That would also allow us to review queries in PRs.
I think we should go with XML. Anyone religiously opposed to using XML for this feature speak now or forever hold your peace!
I was thinking of outputting to an .sql file. Any metadata could be comments. There are downsides to this approach too, but there are no escaping or formatting issues, and you can run the outputted queries easily in many IDEs (IntelliJ, for example).
Inspired a bit by yesql (which also supports putting multiple queries in one file).
I like this! Let's do it this way.
Okay, this is going to go into the ZIO hackathon so here's how to solve it (thanks @adamgfraser!).
1) Add zio-logging to the dependencies of quill-core, i.e. "dev.zio" %% "zio-logging" % "0.5.13".
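In build.sbt terms that is roughly the following (exact placement depends on how the quill-core module is wired up):

```scala
// Added to the quill-core project's settings (sketch; adjust to the actual build layout).
libraryDependencies += "dev.zio" %% "zio-logging" % "0.5.13"
```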
2) Add the following to MacroContextExt.scala:

```scala
import java.nio.file.Paths

import zio._
import zio.logging._

// Layer that writes raw log lines to queries.sql, with "query-logger" as the root logger name.
val env = {
  ZEnv.live >>>
    Logging.file(
      logLevel = LogLevel.Info,
      format = LogFormat.fromFunction((ctx, str) => str),
      destination = Paths.get("queries.sql")
    ) >>> Logging.withRootLoggerName("query-logger")
}

val runtime = Runtime.unsafeFromLayer(env)
```
3) Then in the query method, add the following:
```scala
// `formatter` is assumed to be a java.time.format.DateTimeFormatter available in scope.
runtime.unsafeRunAsync_(log.info(
  s"""
     |-- file: ${c.enclosingPosition.source.path}
     |-- line: ${c.enclosingPosition.line}
     |-- col: ${c.enclosingPosition.column}
     |-- time: ${ZonedDateTime.now().format(formatter)}
     |${idiom.format(queryString)}
     |""".stripMargin
))
```
This will do the actual logging when a query is encountered.
4) Also, it would be nice to be able to configure the file used for logging. In order to do that, add the following to Messages.scala:

```scala
private[getquill] def quillLogFile: LogToFile =
  cache("quill.log.file", LogToFile(variable("quill.log.file", "quill_log_file", "queries.txt")))

sealed trait LogToFile
object LogToFile {
  case class Enabled(file: String) extends LogToFile
  case object Disabled extends LogToFile

  def apply(switch: String): LogToFile =
    switch.trim match {
      case "false" => Disabled
      case other   => Enabled(other)
    }
}
```
This setting is global, so you can get it like this: Messages.quillLogFile.
5) Modify MacroContextExt to use this switch (i.e. log to the named file if enabled, otherwise don't log).
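A rough sketch of what that could look like, assuming the Messages.quillLogFile / LogToFile definitions from step 4 are in scope (the names maybeRuntime, logQuery, and entry below are purely illustrative, not the actual Quill code):

```scala
import java.nio.file.Paths

import zio._
import zio.logging._

// Sketch only: build the logging runtime once, and only if the switch is enabled.
// Messages.quillLogFile and LogToFile are the definitions from step 4.
val maybeRuntime =
  Messages.quillLogFile match {
    case LogToFile.Enabled(file) =>
      val env =
        ZEnv.live >>>
          Logging.file(
            logLevel = LogLevel.Info,
            format = LogFormat.fromFunction((ctx, str) => str),
            destination = Paths.get(file)
          ) >>> Logging.withRootLoggerName("query-logger")
      Some(Runtime.unsafeFromLayer(env))
    case LogToFile.Disabled =>
      None
  }

// In `query`, log only when a runtime was created (`entry` is the string built in step 3).
def logQuery(entry: String): Unit =
  maybeRuntime.foreach(_.unsafeRunAsync_(log.info(entry)))
```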
6) Write a simple unit test for it that just logs something to a file and checks that it's written.
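For reference, a minimal sketch of such a test (the object name, file name, and assertion style are illustrative, not the actual test that should go into the PR):

```scala
import java.nio.file.{ Files, Paths }

import zio._
import zio.logging._

// Sketch only: log one entry through a file logger and check that it ends up in the file.
object QueryLogFileTest {
  def main(args: Array[String]): Unit = {
    val target = Paths.get("quill-test-queries.sql")

    val env =
      ZEnv.live >>>
        Logging.file(
          logLevel = LogLevel.Info,
          format = LogFormat.fromFunction((ctx, str) => str),
          destination = target
        ) >>> Logging.withRootLoggerName("query-logger-test")

    val runtime = Runtime.unsafeFromLayer(env)
    runtime.unsafeRun(log.info("-- test entry\nSELECT 1"))
    runtime.shutdown() // release the layer so the writer is flushed and closed

    val written = new String(Files.readAllBytes(target), "UTF-8")
    assert(written.contains("SELECT 1"), "expected the query to be written to the log file")
  }
}
```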
7) You're done! Make sure it builds and create a PR!
I'll take a crack at this one.
Currently the queries are written to the console, which is spammy, as generally you do not need them and they're not changing. I would like to be able to inspect them on-demand though, without needing to fiddle with my build config to inject a compiler flag.
Ideally, we'd have them written to a file in some structured format (JSON?). That would allow me to inspect them when I'm interested, ignore them when I'm not, and also write tools to work with the query set: e.g. to validate the queries against my DDL and ensure no query plans result in table scans, or to find any totally-unused indices so they can be removed.