@slonik/typegen – Discussion – Upcoming feature: Ignores

janpaepke commented 2 years ago

problem

Currently the recommended way to make slonik-typegen ignore things, is to rename the template literal. This has several drawbacks:

It breaks vscode extensions like SQL Lit
It makes the code more verbose
It requires code changes which act as control symbols, which should be comments.
It is quite unexpected for devs unfamiliar with the pattern, which might cause accidental regressions.

proposed solution

The current plan is to model the ignore behaviour along the syntax of eslint.

Ignore individual queries: // slonik-typegen-disable-line -> ignore sql in the same line

// slonik-typegen-disable-next-line -> ignore sql in the next line

Ignore file sections:

/* slonik-typegen-disable */
const ignoredQuery = sql`select will_be from left_untouched`
/* slonik-typegen-enable */

Ignore file(s): 1) Ignore Option in config. (Existing) 2) /* slonik-typegen-disable */ at top of file.

Feedback is very welcome.

mmkal commented 2 years ago

I think no great need for disable-line - in my opinion it’s ugly/too much cruft on one line and I suspect nobody will be too sad not to have it. If someone is sad, they can open an issue with the use case. Agreed on all the rest.

@gajus since you’ve asked for this before, any thoughts?

janpaepke commented 2 years ago

Another thing to consider are sql files. How do we want to treat them? Could you supply an example file (I don't use them)

mmkal commented 2 years ago

Another thing to consider are sql files. How do we want to treat them? Could you supply an example file (I don't use them)

Easiest way to see how generation from a sql file works is in this test: https://github.com/mmkal/slonik-tools/blob/ecd6f486d645b662f403315b458b5e419f60ac38/packages/typegen/test/sql.test.ts#L31-L142

I’d suggest not doing anything for now though. 1) because it’s fairly easy to use exclude to ignore the whole file, 2) typegen assumes any matched files have only one query in them so inline directives just aren’t as important, 3) I suspect it’s a less used feature so we can just wait until someone requests it and explains why they don’t want to exclude at a config level (also 4. you could also fairly easily configure to look for — typegen-ignore in the sql file if all else fails)

One thing potentially worth considering though: an in-sql directive would automatically solve everywhere. Basically look for —typegen-ignore inside the sql itself, rather than in typescript. But that feels a bit weird to me, and would make single-line queries pretty cumbersome.

janpaepke commented 2 years ago

So I haven't started on this because I'm still weighing the options.

It's funny how our trains of thought match. An in-sql comment was actually my first idea. I didn't propose it because as you said it makes single line queries or especially fragments "cumbersome" at best and pretty much unreadable at worst.

Yet what I also don't really like is having differing ignore logic between sql and ts files.

If that turns out to be the best option, we might want to limit ts ignores even more. Maybe only supporting section comments (as you previously suggested), so we'd have "only" 2 different ways of ignoring.

What I was also considering is this: Are queries containing "select" or "returning" the only type of queries where we'd even expect a return value? Because if so, we could ignore all others and solve the fragment problem even without ignores...

I'll think about this some more.

Pimm commented 2 years ago

Out of all SQL commands in PSQL, I too think that insert … returning, select, update … returning, and delete … returning and the only ones relevant to this tool… except maybe values, although I don't think this tool would be helpful with values anyway.

Jan, you write "[i]t requires code changes which act as control symbols, which should be comments". Could you elaborate on "which should be comments"?

janpaepke commented 2 years ago

thanks for chiming in!

I too think that insert … returning, select, update … returning, and delete … returning and the only ones relevant to this tool

Yeah I've tried to come up with more, and asked other people, but it seems we all agree: we can probably agree any query not including these keywords. We could include values, for good measure. This is great, as it automatically removes the need to ignore fragments, which strengthens my conviction that we probably don't need single line ignores.

Could you elaborate on "which should be comments"?

Yes, what I mean is that the only motivation for code changes, should be changing the behaviour of the code itself, not to influence behaviour of code that processes it.
If at all, deliberately influencing code processors should be done using comments, as they have a different task than the code itself: they store meta information about the code.

mmkal commented 2 years ago

One consideration: when I was originally writing this iteration of slonik-typegen, I was trying to avoid assuming that we'd always be able to parse the query. It relies on pgsql-ast-parser and at the time, at least, that library wasn't willing to guarantee that it'd be able to parse all postgres queries. But psql <query> \gdesc is able to parse all queries - and originally pgsql-ast-parser understanding a query was just a "bonus" which could give nullability info for the majority case of queries it could handle.

Fragments can be invalid SQL (like sql`where foo = ${bar}`, but we won't necessarily know whether the SQL is invalid because it's a fragment or because it's just something pgsql-ast-parser doesn't understand (yet). How do you think we should distinguish between these cases? Should we make parseability by pgsql-ast-parser a hard requirement, and ignore anything else, assuming it's a fragment or an untypeable query?

janpaepke commented 2 years ago

I think I share your view of typgen: It should do everything in its ability to provide types for queries.

Wether it does its magic through gdesc (Essentially EXPLAIN) or pgsql-ast-parser doesn't really matter, the ends justify the means. But we do accept, that it may not be able to extract every query perfectly and be vague if it's "not sure". This can always be gradually improved later (i.e. parsing function results etc.).

My point is not every query (or fragment thereof) makes sense to be parsed. In your example I would not expect a type like sql<{foo: number}>`where foo = ${bar}` to be extracted, because if ran as is it would not return anything (or be invalid actually).

The only thing that needs to be parsed is the query using the fragment, i.e. sql`select * from table ${whereFragment}`. This is where we're interested in the returning result. Now if the columns selected are fragments as well, this is where things get tricky and I think you covered this case in the "Limitations" Section.

So it's not really about what is in principle "parsable" but where we'd expect a result. Because only there we'd want to know the types.

janpaepke commented 2 years ago

tl;dr
Summary / Conclusion at the bottom.

Okay, so... After having a couple of showers and talking to friends about this (no, not simultaneously) I think I have a final opinion on the best way forward.

Here's my train of thought:

One of the main motivations for needing an "escape hatch" is to avoid getting errors for unparsed (and unpareseable) queries like sql fragments, which a user wouldn't even expect to be processed. Now by excluding all queries, which don't contain SELECT, RETURNING or VALUES, we prevent typegen from attempting to process it. They don't return anything, so why would I need types for it? It's actually the same already with literals (fragments) inside of other literals. They are not being processed and noone would expect them to be.

Let's call this type of ignore "implicit ignores"

Now in contrast to this type, there are "explicit ignores". A case where typegen could potentially or in principle parse a query, but the user doesn't want it to, for whatever reason. This intent should be shown and controlled using comments, as initially argued.

But should they be single line comments?
Well, those were primarily conceived of for sql fragments, which would otherwise be tedious to ignore with a ignore-start / ignore-end type of pattern. More comment than code. 🙄
But fragments are now already implicitly ignored, so no real need to have single line ignore-comments anymore.

What about sections?
Well how likely is it that you have a file with multiple queries and you want to ignore 2 out of three?
And even if that were the case the comments might then be scrolled out of view, so that you'd lose context.

OK, then What?

As considered here before we should use SQL comments:
If we find --typegen-ignore or /* typegen-ignore */ anywhere inside the query, typegen will ignore it.

This will equally work for .sql files and isn't really a big deal for single line queries. They can stay single line: Just put the comment at the end!

Also in contrast to implicit ignores, typegen shouldn't be quiet about it.
Whenever a query is explicitly ignored, it will log as something like "2/3 queries processed, 1 ignored".
This may alert a user to an incorrect assumption that after running typegen all types are updated.

Conclusion

Here's the summary of the intended change:

Prevent all queries, missing SELECT, RETURNING or VALUES, from even being considered (not even run them through GDESC).
Do not process queries, which have a typegen-ignore comment in them
reflect in logs, when explicitly ignoring
Update docs

For both 1. and 2. I am planning on using regexes, even though there might be false positives (sql`where col='SELECT'` or `sql`select '--typegen-ignore'`). To be sure I'd have to attempt to parse them, but I'm not sure this is worth it. These seem like very weird edge cases.

I think you will appreciate the small foot-print of this proposal, but I'm curious on your feedback!

best, J

mmkal commented 2 years ago

I like it! It's a nice balance. I think it makes a lot of sense to start out with a simple regex search for the ignore pattern, and that would probably be totally fine forever. Even those edge cases you mentioned have easy workarounds. 👍 👍 👍

mmkal commented 2 years ago

I think we could now add some really interesting features like having the CLI fail if there are any queries it fails to add types to, since now we can ignore anything it can’t handle. This would essentially turn slonik-typegen into a sql linter too, which validates not only sql syntax but also your actual schema. It’s already doing all the hard work - watching the file system, exposing a cli, specifying the connection string, and allowing ignoring queries it can’t handle (yet). But I think that’s a separate piece of work so I’m going to close this issue. Let me know if you think it should be reopened.

mmkal / pgkit