taozhi8833998 / node-sql-parser

Parse simple SQL statements into an abstract syntax tree (AST) with the visited tableList and convert it back to SQL
https://taozhi8833998.github.io/node-sql-parser/
Apache License 2.0
814 stars 180 forks source link

Question about our custom SQL dialect #1749

Closed eXigentCoder closed 10 months ago

eXigentCoder commented 10 months ago

Background

We have built an open source library that essentially takes a custom dialect of SQL based off of Postgresql and converts it to a MongoDB query or aggregation pipeline. We rely very heavily on your library and it has made this work so much easier, so thank you! Currently we are on version 4.9.0 and I'm in the process of upgrading incrementally, in version 4.10.0 there was a change introduced that is giving us some issues with out custom statements that I would like some advice on, the line is:

    if (ast && ast.type === 'select') {
      if (!(head.parentheses_symbol || head.parentheses || head.ast.parentheses || head.ast.parentheses_symbol) || ast.columns.length !== 1 || ast.columns[0].expr.column === '*') throw new Error('invalid column clause with select statement')
    }

Describe the bug When we try execute the following query:

SELECT  id
        ,"First Name"
        ,"Last Name"
    ,(SELECT * FROM Rentals WHERE staffId<10) AS rentalsArr 
FROM customers

We now get Error: invalid column clause with select statement. Which makes sense, because this to my knowledge isn't valid Postgresql. Essentially, the Rentals column here is an array of JSON objects and our library allows you to filter it in a single line. I'm trying to figure out what the best way to cater for this would be and would appreciate your advice.

Options

  1. I suppose that one option would be to fork your library and add our own pegjs file with the rules for our custom dialect, I'm not familiar with it at all and the documentation seems pretty sparse from their GitHub page but it could work I think?
  2. Another approach could be perhaps to add an option when calling the parse function that prevents the invalid column clause with select statement, I haven't managed to upgrade to the latest version yet so I'm not sure if we will run into other issues though.

Your advice would be heavily appreciated, thank you! Database Engine PostgreSQL.

To Reproduce -the SQL that be parsed

SELECT  id
        ,"First Name"
        ,"Last Name"
        ,(SELECT * FROM Rentals WHERE staffId<10) AS rentalsArr 
FROM customers

-the node-sql-parser version 4.9.0 -the node version 14.21.1

taozhi8833998 commented 10 months ago

@eXigentCoder You want to parse the custom dialect SQL successfully, which is based on the PostgreSQL dialect?

SELECT  id
        ,"First Name"
        ,"Last Name"
    ,(SELECT * FROM Rentals WHERE staffId<10) AS rentalsArr 
FROM customers

The above SQL is not valid in PostgreSQL, but you want to parse it in your custom dialect. You want an option to skip the invalid column clause with select statement error in pg, so you can parse it by the pg grammar, right?

eXigentCoder commented 10 months ago

@taozhi8833998 Yes, that's right, I think that would help us upgrading to the latest version

taozhi8833998 commented 10 months ago

@eXigentCoder ok, let me think of a better way to achieve this. Any progress will be updated here.

eXigentCoder commented 10 months ago

Thank you, really appreciate it and the fast responses, will see what I can do about getting you a tip from my boss!

eXigentCoder commented 10 months ago

We trying to make a donation with PayPal, but it only allows $5, is there any way we can type in our own amount, we would like to donate more than that

taozhi8833998 commented 10 months ago

@eXigentCoder Do you need (SELECT * FROM Rentals WHERE staffId<10) AS rentalsArr, which means select all columns from Rentals, but as one column rentalsArr? Does it make sense? If the column is not *, just a specified column, it works.

SELECT  id
        ,"First Name"
        ,"Last Name"
    ,(SELECT xxx FROM Rentals WHERE staffId<10) AS rentalsArr 
FROM customers
eXigentCoder commented 10 months ago

Yes, in our test data, we are trying to get all the columns for the array of JSON objects in the rentals "column", if it helps, this is what the data that is returned looks like in version 4.9.0:

{
  "_id": "61b0fdcbdee485f7c0682db6",
  "id": 1,
  "First Name": "MARY",
  "Last Name": "SMITH",
  "rentalsArr": [
    {
      "Film Title": "MUSKETEERS WAIT",
      "Payments": [
        {
          "Amount": 5.989999771118164,
          "Payment Date": "2005-06-15 00:54:12.0",
          "Payment Id": 3
        }
      ],
      "Rental Date": "2005-06-15 00:54:12.0",
      "Return Date": "2005-06-23 02:42:12.0",
      "filmId": 611,
      "rentalId": 1185,
      "staffId": 2
    }
  ]
}

I think what would be best is if I fork your library from the master branch, comment out that line and then run all of our tests and make sure that that line is the only issue we have with the latest version.

taozhi8833998 commented 10 months ago

@eXigentCoder I have supported by passing a parseOptions to astify function. I will push the codes later.

const opt = {
  database: 'postgresql',
  parseOptions: {
    ignoreSelectColumnsError: true
  }
}
const sql = 'SELECT  id,"First Name","Last Name", (SELECT * FROM Rentals WHERE staffId<10) AS rentalsArr FROM customers'
// by passing the ignoreSelectColumnsError in parseOptions,  the module will skip throw the 'invalid column clause with select statement' error
const ast = parser.astify(sql, opt)
console.log(parser.sqlify(ast, opt)) // SELECT "id", "First Name", "Last Name", (SELECT * FROM "Rentals" WHERE "staffId" < 10) AS "rentalsArr" FROM "customers"
taozhi8833998 commented 10 months ago

@eXigentCoder, if you encounter any further compatibility problems in the main branch, kindly inform me. I will investigate and incorporate additional solutions for them.

eXigentCoder commented 10 months ago

Thank you so much! I checked out the source code of the library and used npm link to test it locally with the change and it fixed a few of our failing tests but I am getting a few more errors that I'm struggling to track down. About 206 out of 1193 are failing which used to work in 4.9.0. I won't include all of them but from what I can see all of them have one of these messages:

  1. SyntaxError: Expected "(", ",", "--", ".", "/*", ";", "AS", "CROSS", "FULL", "GROUP", "HAVING", "INNER", "INTO", "JOIN", "LEFT", "LIMIT", "OFFSET", "ORDER", "RIGHT", "TABLESAMPLE", "UNION", "WHERE", "WINDOW", "\"", [ \t\n\r], [A-Za-z_一-龥], or end of input but "'" found.

    SELECT  Order1.id,
        unset(_id)
    FROM orders Order1
    INNER JOIN(
    SELECT * FROM orders) 'Order2|unwind' on Order2.id = Order1.id
    LIMIT 1
  2. SyntaxError: Expected ",", "--", "/*", ";", "AS", "CROSS", "FULL", "GROUP", "HAVING", "INNER", "INTO", "JOIN", "LEFT", "LIMIT", "OFFSET", "ON", "ORDER", "RIGHT", "UNION", "USING", "WHERE", "WINDOW", "\"", [ \t\n\r], [A-Za-z_一-龥], or end of input but "`" found.

    SELECT c.*
       ,cn.* 
    FROM customers c 
    INNER JOIN (SELECT * from `customer-notes` where id>2) `cn|first` ON cn.id=c.id
  3. SyntaxError: Expected ",", "--", ".", "/*", ";", "AS", "CROSS", "FULL", "GROUP", "HAVING", "INNER", "INTO", "JOIN", "LEFT", "LIMIT", "OFFSET", "ON", "ORDER", "RIGHT", "TABLESAMPLE", "UNION", "USING", "WHERE", "WINDOW", "\"", [ \t\n\r], [A-Za-z_一-龥], or end of input but "`" found.

    SELECT * 
    FROM customers c 
    LEFT OUTER JOIN `customer-notes` `cn|first` ON cn.id=convert(c.id,'int')
  4. SyntaxError: Expected "--", "/*", "\"", [ \t\n\r], or [A-Za-z_一-龥] but "'" found.

    SELECT id
       ,`First Name`
       ,`Last Name`
       ,avg_ARRAY((select filmId as '$$ROOT' from 'Rentals')) as avgIdRentals 
    FROM customers
  5. SyntaxError: Expected "--", "/*", "\"", [ \t\n\r], or [A-Za-z_一-龥] but "`" found.

    SELECT CAST(abs(`id`) as decimal) AS `id`
    FROM `customers`
  6. SyntaxError: Expected "!=", "#", "%", "&", "&&", "(", "*", "+", ",", "-", "--", "->", "->>", ".", "/", "/*", "<", "<<", "<=", "<>", "=", ">", ">=", ">>", "AND", "BETWEEN", "IN", "IS", "LIKE", "NOT", "OR", "REGEXP", "RLIKE", "USING", "XOR", "^", "div", "|", "||", or [ \t\n\r] but ")" found.

select convert(`Replacement Cost`) as s from `films`
  1. SyntaxError: Expected "!=", "#", "%", "&", "&&", "*", "+", ",", "-", "--", "/", "/*", "<", "<<", "<=", "<>", "=", ">", ">=", ">>", "AND", "BETWEEN", "COLLATE", "IN", "IS", "LIKE", "NOT", "OR", "REGEXP", "RLIKE", "USING", "XOR", "^", "div", "|", "||", or [ \t\n\r] but ")" found.
    SELECT *
       ,convert(`Replacement Cost`,'int') AS s 
    FROM `films`
taozhi8833998 commented 10 months ago

@eXigentCoder ok, I will check it one by one.

taozhi8833998 commented 10 months ago

@eXigentCoder Do you have any name of your custom dialect, it looks like the dialect mixed with MySQL and PostgreSQL. If I add more options to PostgreSQL grammar, it will make it too complicated to maintain. so I prefer to generate a new dialect for you.

taozhi8833998 commented 10 months ago

@eXigentCoder Your test cases failed basically because of the commit

eXigentCoder commented 10 months ago

Ah, thank you for the information, that makes sense! Yeah I think a custom dialect would work best going forward if you don't mind, we call it noql

taozhi8833998 commented 10 months ago

@eXigentCoder ok, got it.

taozhi8833998 commented 10 months ago

@eXigentCoder new dialect noql has been added, which is based on the PostgreSQL dialect, and has passed the failed test cases you provided. You can check the above pr.

eXigentCoder commented 10 months ago

Thank you so much, you are amazing, will check it out shortly and report back!

eXigentCoder commented 10 months ago

Thank you so much for the awesome work, I just ran our full suite of tests against the PR and all are passing!!🥳

taozhi8833998 commented 10 months ago

@eXigentCoder ok, I will merge the PR, and you can upgrade to the 4.18.0 version when I publish it. If you have issues in the future, please open a new issue, and specify the dialect noql.

taozhi8833998 commented 10 months ago

@eXigentCoder merged.

eXigentCoder commented 10 months ago

Great stuff, thank you for all your hard work, will update to 4.18.0 in our project and will be sure to specify the dialect in future!

eXigentCoder commented 9 months ago

Hey @taozhi8833998, just wanted to check if you have an estimation for when the next publish will be?

taozhi8833998 commented 9 months ago

Hey @taozhi8833998, just wanted to check if you have an estimation for when the next publish will be?

hi, @eXigentCoder The new version will be released on the Friday of this week.

eXigentCoder commented 9 months ago

Awesome, thank you!

taozhi8833998 commented 9 months ago

@eXigentCoder The new version 4.18.0 has been released, you can try it.

eXigentCoder commented 9 months ago

Awesome, thank you so much, will pull it down and get it integrated today, thanks so much!