The code output by `pggen` is dependent on the order of the columns in a table in `schema.sql`, though this order doesn't matter according to spec

jschaf / pggen

Generate type-safe Go for any Postgres query. If Postgres can run the query, pggen can generate code for it.

MIT License

282 stars 26 forks source link

The code output by `pggen` is dependent on the order of the columns in a table in `schema.sql`, though this order doesn't matter according to spec #37

Closed djsavvy closed 3 years ago

djsavvy commented 3 years ago

Thanks for building pggen -- it's pretty remarkable.

I've been running into an issue where the order of columns listed in schema.sql changes sometimes (though the columns themselves don't change). According to the SQL spec, reorderings of the columns in a table are no-ops, but the code output by pggen is substantively different since the fields in a given struct have reordered. The resulting symptom is an error due to type mismatch when trying to scan data into a struct.

Am I just using this library wrong? Is there a workaround for this issue?

jschaf commented 3 years ago

To make sure I understand, you'd like to be able to change the order of columns in a CREATE TABLE statement in schema.sql but have pggen sort columns in a struct by a different order?

pggen uses the column order of a query, not the underlying table, so I'm not sure I understand what's causing the change.

As an interesting side note: the SQL spec might not say anything about the order of columns, but the Postgres column order does affect the size of each tuple, similar to Go struct alignment. https://stackoverflow.com/questions/12604744/does-the-order-of-columns-in-a-postgres-table-impact-performance

jschaf commented 3 years ago

I'm not sure how to reproduce. If you have more info and preferably a reproducible example, please reopen.

djsavvy commented 3 years ago

Thank you!

mvrhov commented 2 years ago

I'm reading this ticket and have noticed something that's similar to bothering me. It might be the same thing that was a problem of @djsavvy . The pggen doesn't expand the * in SELECT * FROM x or INSERT|UPDATE x RETURNING *. Nothing guarantees that the * would always return columns in the same order. One simple postgresql upgrade could change that.

Expanding * would guarantee two things, the table can be altered anytime by adding new nullable columns and old code would continue to work. The order of columns in table can change (e.g the way @djsavvy said it happen in their case), and because we specify the return order of column we could still scant into the same struct.

jschaf commented 2 years ago

Expanding * would guarantee two thing. [...] One simple postgresql upgrade could change that.

The pggen code will fail sooner than that. The first migration that adds or removes a column that appears in SELECT * will cause the generated code to fail since it's expecting a different number of columns.

Another potential issue is that under the hood, pggen uses the column order of the table that pggen was run against (typically a dev db). After a bunch of migrations, the dev db and prod db column orders likely diverge, especially if you collapse migrations in a dev db. There's not a great way to handle SELECT * other than to avoid using it at the top level of a query. Using it in a nested query is okay.

I'm not clear if there's anything that can be done by pggen here. pggen is explicitly minimal in how it transforms SQL. If you want to use SELECT *, I think pggen should let you. I think this is better fixed in code review by requiring an explicit list of columns. IntelliJ even has a nice intention to expand it for you.

mvrhov commented 2 years ago

IMHO, the tool itself should do it, because without that the generated code fails. Alas not everybody uses/pays for InteliJ tools. Another reason why the pggen should do it is that one could easily miss something like that in the review when there is a lot of columns, expanding it sql files would add more "visual noise", having almost everything listed in e.g INSERT/UPDATE section twice and then the 3rd time in RETURNING section can be really overwhelming.

jschaf commented 2 years ago

A more practical concern is how would pggen implement this feature. As I understand it, the proposal is to replace SELECT * with SELECT col1, col2, ... and RETURNING * with RETURNING col1, col2. I think to implement this, I'd need:

A proper SQL parser to get an AST.
Rewrite the AST to replace the * with the columns.
Format the AST back into text which likely won't match the source format.

A regex won't work because it won't handle things like this admittedly complex example. In the below example the only * we would need to expand is the the last RETURNING clause.

WITH
  cte1 AS (SELECT * FROM foo),
  cte2 AS (INSERT INTO qux VALUES (...) RETURNING *)
INSERT INTO bar
SELECT * FROM cte1
UNION ALL
SELECT * FROM cte2
RETURNING *

mvrhov commented 2 years ago

Oh, thanks for explaining, what I was missing. sqlc does have an ast parser and this is the reason they can do it.