ccakes opened 3 years ago
Yeah, squawk doesn't have a way to do that currently; I'm open to the feature.
I think it might be tricky to add if the AST doesn't include comments; we might need to rework things a bit.
Hmm yeah, I see. It looks like libpg_query v2 exposes pg_query_scan() and pg_query_split_with_xxx(). It might be doable with some combination of those before the parse stage 🤔
Any updates on this? This is a very useful feature.
It seems like pg_query.rs has a CommentStmt, which might work. Here's the related pg_query.proto.
@chdsbd I took a stab at using that, but the CommentStmt is for an actual SQL COMMENT ON: https://www.postgresql.org/docs/current/sql-comment.html
The downside of using this is that it actually changes the DB schema, and since you can only have one comment per object, it would overwrite any existing comment on the object.
Hopefully there is a way to get the actual code comment, i.e. -- I am an SQL comment
@adamrdavid That's too bad that doesn't work.
I feel like we need to do something ourselves to distinguish comments from the SQL.
It looks like the parser gives you locations in the original SQL string.
Take a look at the parser schema that was linked earlier: https://github.com/pganalyze/libpg_query/blob/d1d0186b80c1fbf496265789af89da4e7ca890ab/protobuf/pg_query.proto Search the file for the word "location" - there are tons of hits! So that's the starting clue to figure out where in the original string the linter violation was found.
The next tool could be to look at the lexer. Here you can see from the example that it clearly shows a CComment token with locations that correlate to a SQL comment in the query: https://docs.rs/pg_query/latest/pg_query/fn.scan.html. So even though you can't get the code comments in the parse tree, you can still get them from the lexer.
Ideally the pg_query library would have an option for passing the output of the scan function to the parse function, so you could inspect the lexer output while also using it to parse. Instead, I guess you have to pass the SQL string to the parser as well, which presumably scans it a second time, so the lexer runs twice. Oh well. A worthwhile price to pay for this feature IMHO, at least until the pg_query API improves in this area. I'm only going to use Squawk in CI/CD to scan a few KB of migrations that are about to be run; I'm not going to scan all my historical migrations, so I don't care if Squawk runs slower as a result.
Putting it all together:
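A rough sketch of that correlation step, combining lexer comment spans with statement start offsets. Everything here is hypothetical: in practice the comment spans would come from pg_query's scan output and the statement starts from the parse tree's location fields, whose exact types and shapes differ from this simplified version.

```rust
// Sketch: given comment spans from a lexer pass (like pg_query::scan's
// CComment/SqlComment tokens) and statement start offsets (like the
// `location` fields in the parse tree), flag statements preceded by a
// "-- squawk:ignore-next-statement" comment. Names and types are
// illustrative, not the real pg_query API.

fn ignored_statements(
    sql: &str,
    comment_spans: &[(usize, usize)], // (start, end) byte offsets of comment tokens
    stmt_starts: &[usize],            // byte offset where each statement begins
) -> Vec<bool> {
    stmt_starts
        .iter()
        .map(|&start| {
            // A statement is ignored if some ignore-comment ends before it
            // with only whitespace in between.
            comment_spans.iter().any(|&(cs, ce)| {
                ce <= start
                    && sql[cs..ce].contains("squawk:ignore-next-statement")
                    && sql[ce..start].chars().all(char::is_whitespace)
            })
        })
        .collect()
}

fn main() {
    let sql = "-- squawk:ignore-next-statement\nALTER TABLE t ADD COLUMN c int;\nSELECT 1;";
    // Offsets a scanner pass would report for this input.
    let comments = [(0, 31)]; // span of the `--` comment token
    let stmts = [32, 64];     // start offset of each statement
    let flags = ignored_statements(sql, &comments, &stmts);
    assert_eq!(flags, vec![true, false]);
    println!("{:?}", flags);
}
```

The whitespace check is what ties a comment to the statement directly after it, so a comment elsewhere in the file doesn't suppress unrelated violations.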
Can we just pre-process the file and strip out any SQL statements that are directly preceded by a -- squawk: nolint comment?
@bmorganpa I think that would work. SQL statements are delimited with semi-colons, so that could help with parsing too.
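A minimal sketch of that pre-processing idea. The marker string and the splitting strategy are assumptions, and a naive split on semicolons would mis-handle semicolons inside string literals or dollar-quoted function bodies; a real implementation should lean on the lexer instead.

```rust
// Sketch: split the migration on semicolons and drop any chunk whose
// leading comment lines contain a nolint marker. Because the split is
// inclusive of the `;`, a comment gets grouped with the statement that
// follows it, which is exactly the statement we want to drop.

const MARKER: &str = "-- squawk: nolint";

fn strip_nolint(sql: &str) -> String {
    sql.split_inclusive(';')
        .filter(|stmt| !stmt.contains(MARKER))
        .collect()
}

fn main() {
    let sql = "\
-- squawk: nolint
ALTER TABLE t ADD COLUMN c int NOT NULL;
SELECT 1;";
    let out = strip_nolint(sql);
    assert!(!out.contains("ALTER TABLE"));
    assert!(out.contains("SELECT 1;"));
    println!("{}", out);
}
```

Note the false-positive risk: a statement whose text happens to contain the marker (say, inside a string literal) would also be dropped by this naive version.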
I'm open to PRs supporting this feature
For right now we just added a wrapper script around it to handle git diffs, plus a sed command to trim statements marked with squawk:ignore-next-statement: sed '/squawk:ignore-next-statement/,/;$/d'
An alternative approach (perhaps easier to implement?) could be to qualify exclusions in .squawk.toml with specific rule names.
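For illustration only, such a config might look like the snippet below. The syntax is hypothetical: neither the exclusions table nor these keys exist in Squawk today, and the path is made up (the rule names are real Squawk rules).

```toml
# Hypothetical syntax, not implemented in Squawk:
# suppress specific rules for a specific file.
[[exclusions]]
path = "migrations/0042_add_column.sql"
rules = ["prefer-robust-stmts", "adding-field-with-default"]
```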
I'm integrating Squawk into a CI pipeline and want to be able to tell it to ignore specific queries, similar to
So we'd have something like
I couldn't find a way to do that. Is there one? And if not, would you be open to a PR adding it?