sbdchd / squawk

🐘 linter for PostgreSQL, focused on migrations
https://squawkhq.com
GNU General Public License v3.0
599 stars 40 forks source link

Ignore specific query #149

Open ccakes opened 3 years ago

ccakes commented 3 years ago

I'm integrating Squawk into a CI pipeline and want to be able to tell it to ignore specific queries, similar to

#[allow(clippy::some_rule)]
// some code that would fail

So we'd have something like

/* Trust me, I'm an expert */
-- squawk: nolint
ALTER TABLE foo ALTER COLUMN bar SET NOT NULL;

I couldn't find a way to do that, if there one? And if not, would you be open to a PR adding that?

sbdchd commented 3 years ago

yeah squawk doesn't have a way to do that currently, I'm open to the feature

I think it might be tricky to add if the AST doesn't include comments, might need to rework things a bit

ccakes commented 3 years ago

Hmm yeah - I see.

It looks like libpg_query v2 exposes pg_query_scan() and pg_query_split_with_xxx(). It might be doable with some combination of those before the parse stage 🤔

saurabhbhalla90 commented 2 years ago

Any updates on this? This is a very useful feature

chdsbd commented 2 years ago

It seems like pg_query.rs has a CommentStmt, which might work.

Here's the related pg_query.proto.

adamrdavid commented 2 years ago

@chdsbd I took a stab at using that, but the CommentStmt is for an actual SQL COMMENT ON https://www.postgresql.org/docs/current/sql-comment.html

The downside of using this is that it actually changes the DB schema and you can only have one comment per object, so it would overwrite any existing comment on the object.

Hopefully there is a way to get the actual code comment i.e. -- I am an SQL comment

chdsbd commented 2 years ago

@adamrdavid That's too bad that doesn't work.

I feel like we need to do something ourselves to parse our comments vs SQL.

james-johnston-thumbtack commented 1 year ago

It looks like the parser gives you locations in the original SQL string.

Take a look at the parser schema that was linked earlier: https://github.com/pganalyze/libpg_query/blob/d1d0186b80c1fbf496265789af89da4e7ca890ab/protobuf/pg_query.proto Search the file for the word "location" - there are tons of hits! So that's the starting clue to figure out where in the original string the linter violation was found.

The next tool could be to look at the the lexer. Here you can see from the example that it clearly shows a CComment token with locations that correlate to a SQL comment in the query: https://docs.rs/pg_query/latest/pg_query/fn.scan.html --- so even though you can't get the code comments in the parse tree, you can still get it from the lexer.

Ideally the pg_query library would have an option for passing the output of the scan function to the parse function, so you can inspect the lexer output while also using it to parse. Instead I guess you have to pass the SQL string to the parser as well, which presumably would scan it a second time, and thus the lexer will run twice. Oh well. A worthwhile price to pay for this feature IMHO, at least until the API for pg_query could be improved in this area. I'm only going to use Squawk in CI/CD to scan a few KB of migrations that are about to be run; I'm not going to scan all my historical migrations & thus I do not care if Squawk runs slower as a result.

Putting it all together:

  1. Run the scanner to get a list of tokens. Build a mapping of file line number to subset of the code comment token(s) on that line, or something like that. (need a way to efficiently locate neighboring lines from the lexer when a linting violation is found while traversing the parsed AST).
  2. When a lint violation is found while examining the AST, get the location integer from the parse tree.
  3. Convert that location to a line number. Look up the same line or prior line using the mapping built in step 1 and see if there was any relevant comment containing a linter directive.
bmorganpa commented 9 months ago

Can we just pre-process the file and strip out any SQL statements that are directly preceded by a -- squawk: nolint comment?

chdsbd commented 9 months ago

@bmorganpa I think that would work. SQL statements are delimited with semi-colons, so that could help with parsing too.

I'm open to PRs supporting this feature

bmorganpa commented 9 months ago

For right now we just added a wrapper script around it to handle git diffs and just added a sed command to trim statements with squawk:ignore-next-statement using: sed '/squawk:ignore-next-statement/,/;$/d'

av-pinzur commented 6 days ago

An alternative approach (perhaps easier to implement?) could be to qualify exclusions in .squawk.toml with specific rule names.