joereynolds / sql-lint

An SQL linter
MIT License
438 stars 40 forks source link

Error: Unable to categorise query: $$; #154

Open jirutka opened 4 years ago

jirutka commented 4 years ago

It seems that sql-lint is unable to parse SQL functions.

example.sql:

create function xor (a boolean, b boolean)
  returns boolean
  immutable language sql
  as $$
    select (a and not b) or (b and not a);
$$;
$ sql-lint -d postgres -v example.sql
node_modules/sql-lint/dist/src/lexer/lexer.js:13
    throw new Error(`Unable to categorise query: ${query}. The query must start with one of ${Object.keys(keywords_1.Keyword)}`);
    ^

Error: Unable to categorise query: $$;. The query must start with one of Alter,Begin,Call,Create,Declare,Delete,Delimiter,Drop,Else,End,From,Having,If,Insert,Join,Leave,Limit,Replace,Return,Select,Set,Show,Truncate,Update,Use,Where,CommentHash,CommentDash,CommentMultiLineStart,CommentMultiLineEnd,Newline,WindowsNewline
    at Object.categorise (node_modules/sql-lint/dist/src/lexer/lexer.js:13:11)
    atnode_modules/sql-lint/dist/src/checker/checkerRunner.js:37:42
    at Array.forEach (<anonymous>)
    at CheckerRunner.run (node_modules/sql-lint/dist/src/checker/checkerRunner.js:34:20)
    at Object.<anonymous> (node_modules/sql-lint/dist/src/main.js:85:8)
joereynolds commented 4 years ago

Hey @jirutka

Thanks for the heads up, I'll take a look!

tunetheweb commented 4 years ago

Would love this too.

Here's the syntax we use just in case you code it based just on above example:

CREATE TEMPORARY FUNCTION getSomeThings(payload STRING)
RETURNS STRUCT<value1 BOOLEAN, `value-2` BOOLEAN>
LANGUAGE js AS '''
var hints = ['value1', 'value-2'];
try {
...
} catch (e) {
...
}
''';

SELECT getSomeThings(payload) as somethings
FROM table;
tunetheweb commented 4 years ago

Did some investigations and it looks like sql-lint does a fairly basic check for semi-colons and assumes any semi-colon starts a new statement.

For example:

create function xor (a boolean, b boolean)
  returns boolean
  immutable language sql
  as $$
    select (a and not b) or (b and not a);
$$;

Here the semi-colon on the second last line, is the issue and it assumes the final line ($$) is the start of a new SQL statement when in fact it's not.

I've submitted a suggested fix in in #159 to basically strip out any "language" code before linting so it basically changes the code to this before linting:

create function xor (a boolean, b boolean)
  returns boolean
  immutable language sql
  as 

;

This does mean we won't lint the bit between $$ and $$. Now technically in this example we could lint that as it's SQL but think it's OK to skip this for now. Certainly better than falling over as it does now.

joereynolds commented 3 years ago

Hey bazzadp,

What dialect of SQL is this? We have better driver support now so it'll be a bit easier to get this thing out.

tunetheweb commented 3 years ago

Hey @joereynolds not sure of @jirutka 's original use case (based on comments in #159 looks like he might be using Postgres) but I'd like this for BigQuery queries (as per that other issue you just commented on!). In those we tend to make heavy of JavaScript functions. They are known as User-Defined-Functions or UDFs in BigQuery.

We've a heap of queries in our repo for that, one of the simpler ones being this example.

We'd love to be able to:

  1. Lint for bad SQL (e.g. typos selec * from...) so they can fail the GitHub super linter as part of an automatic PR review.
  2. Ideally also enforce some consistency in convention (upper case keywords, no trailing whitespace...etc.), some of which may be in sql-lint already, and some of which may be new feature requests.

At the moment this issue is the big blocker as we make heavy use of these UDFs so basically nearly every query fails so sql-lint is not usable for us. There's a few other smaller blockers btw but will get to them if this can be solved.

As they are mostly JavaScript for us, we're not expecting sql-lint to be able to lint the functions themselves, so ignoring them completely so it doesn't fall over, and then linting the surrounding SQL would be more than sufficient for now for us. Which is why I proposed that cheap and cheerful, hacky method in #159 . Again @jirutka may have different requirements here as looks like they are using actual SQL functions so may want the functions themselves linted.