jschaf / pggen

Generate type-safe Go for any Postgres query. If Postgres can run the query, pggen can generate code for it.
MIT License
282 stars 26 forks source link

Fix order by column exist in explain output #13

Closed ikozinov closed 3 years ago

ikozinov commented 3 years ago

When generating code for a PostgreSQL request like this:

SELECT col1 ORDER BY col2

after executing the code generator, panic occurs:

=== RUN   TestGenerate_Go_Example_Author
    /Users/ivk/work/pggen/example/author/codegen_test.go:13: created schema: pggen_test_653479730
--- FAIL: TestGenerate_Go_Example_Author (0.13s)
panic: runtime error: index out of range [2] with length 2 [recovered]
    panic: runtime error: index out of range [2] with length 2

goroutine 6 [running]:
testing.tRunner.func1.1(0x17edba0, 0xc000416ae0)
    /usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1076 +0x30d
testing.tRunner.func1(0xc000001c80)
    /usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1079 +0x41a
panic(0x17edba0, 0xc000416ae0)
    /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:969 +0x175
github.com/jschaf/pggen/internal/pginfer.(*Inferrer).inferOutputNullability(0xc00037a790, 0xc000382d90, 0xc000110180, 0x2, 0x2, 0x0, 0x0, 0x1, 0x1921e00, 0xc0003af1d8)
    /Users/ivk/work/pggen/internal/pginfer/pginfer.go:241 +0x4f7
github.com/jschaf/pggen/internal/pginfer.(*Inferrer).inferOutputTypes(0xc00037a790, 0xc000382d90, 0x0, 0x0, 0x0, 0x0, 0x0)
    /Users/ivk/work/pggen/internal/pginfer/pginfer.go:190 +0x3ee
github.com/jschaf/pggen/internal/pginfer.(*Inferrer).InferTypes(0xc00037a790, 0xc000382d90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /Users/ivk/work/pggen/internal/pginfer/pginfer.go:81 +0x18a
github.com/jschaf/pggen.parseQueries(0x1838768, 0x9, 0xc00037a790, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x17708c0)
    /Users/ivk/work/pggen/generate.go:214 +0x513
github.com/jschaf/pggen.parseQueryFiles(0xc0004122f0, 0x1, 0x1, 0xc00037a790, 0xc00042a000, 0x65, 0xc0004122f0, 0x1, 0x1)
    /Users/ivk/work/pggen/generate.go:178 +0xc5
github.com/jschaf/pggen.Generate(0x1835da8, 0x2, 0xc00042a000, 0x65, 0xc0004122f0, 0x1, 0x1, 0x0, 0x0, 0x0, ...)
    /Users/ivk/work/pggen/generate.go:103 +0x4d3
github.com/jschaf/pggen/example/author.TestGenerate_Go_Example_Author(0xc000001c80)
    /Users/ivk/work/pggen/example/author/codegen_test.go:17 +0x1fb
testing.tRunner(0xc000001c80, 0x1863cd0)
    /usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1127 +0xef
created by testing.(*T).Run
    /usr/local/Cellar/go/1.15.2/libexec/src/testing/testing.go:1178 +0x386
FAIL    github.com/jschaf/pggen/example/author  0.307s

After debug, I found out that for some reason in the EXPLAIN command output in the Output field there is an extra value with the name of the column that is mentioned in ORDER BY clause.

It turns out that the length of plan.Outputs is not always equal to the length of the array with column names (cols)

jschaf commented 3 years ago

Thanks for the PR and investigation! Nice job figuring out how to add the test case; I was worried it might be too complicated.

This bug is so strange. Running this prepared query using NULL for $1:

EXPLAIN (VERBOSE, FORMAT JSON) SELECT first_name, last_name FROM author ORDER BY author_id = $1;

Has the explain output of:

[
  {
    "Plan": {
      "Node Type": "Seq Scan",
      "Parallel Aware": false,
      "Relation Name": "author",
      "Schema": "pggen_test_240800122",
      "Alias": "author",
      "Startup Cost": 0.00,
      "Total Cost": 16.30,
      "Plan Rows": 630,
      "Plan Width": 65,
      "Output": ["first_name", "last_name", "NULL::boolean"]
    }
  }
]

I'll try to figure out a more elegant solution, but your PR is a nice quick fix.

jschaf commented 3 years ago

Ok, I think I figured it out. The plan output contains enough information to run the the entire query. In this case, the seq scan node propagates the author_id in order to use it for the sort. It's much clearer if we use a concrete value for the order by clause:

EXPLAIN (VERBOSE, FORMAT JSON)
SELECT first_name, last_name
FROM author
ORDER BY author_id = 222;
[
  {
    "Plan": {
      "Node Type": "Sort",
      "Parallel Aware": false,
      "Startup Cost": 47.17,
      "Total Cost": 48.74,
      "Plan Rows": 630,
      "Plan Width": 65,
      "Output": ["first_name", "last_name", "((author_id = 222))"],
      "Sort Key": ["((author.author_id = 222))"],
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Relation Name": "author",
          "Schema": "public",
          "Alias": "author",
          "Startup Cost": 0.00,
          "Total Cost": 17.88,
          "Plan Rows": 630,
          "Plan Width": 65,
          "Output": ["first_name", "last_name", "(author_id = 222)"]
        }
      ]
    }
  }
]

I'm not sure why the sort node doesn't appear when we run the explain query using null as a prepared value. Maybe the postgres planner infers that author_id can never be null due to the primary key constraint so it doesn't bother with a sort.