pihvi / yesql

Read named SQL statements from .sql files

SQL Comments in PG can break parsing #23

Closed dwelch2344 closed 3 years ago

dwelch2344 commented 4 years ago

First off, great library! Thank you so much – it makes Postgres so awesome to work with :D

TL;DR: SQL Comments can break parsing Postgres queries

Conditions

Two cases identified thus far, but I'll bet there's more:

  1. Quote regex being thrown off by quotes in comments
  2. Named Parameter regex being thrown off by named params in comments

Background: We've used this lib for 2+ years and have noticed a few issues when dealing with comments. We typically leave SQL comments behind (our queries can be burly), and if we left a named param in a comment (e.g. `-- sometimes include :blah`), the param counts came out wrong and all hell broke loose.

Finally dug into it today when a totally separate issue appeared: our CTEs were not interpolating. After digging around, I realized it wasn't actually the CTE — we had a comment on the line above containing a single quote.

Reproduction

I threw together a quick-and-dirty test to see if I could nail down what it was, and it became pretty clear. Try the following:

const named = require('yesql').pg

test('quotes_in_comments_break_parsing', () => {
  function checkTokens (sql, params, token) {
    const { text, values } = named(sql)(params)
    const failed = text.indexOf(`:${token}`) >= 0
    // console.log(text, values)
    if (failed) {
      throw new Error(`SQL contained unparsed parameter ':${token}':\n ${text}`)
    }
  }

  const goodSql = /* sql */ `
    select
      :foo ::INT[]
      -- TODO do not hard code this eventually
      :foo ::INT[],
      -- TODO I suppose we are ok now
      :foo ::INT[]
  `

  const badSql = /* sql */ `
    select
      :foo ::INT[]
      -- TODO don't hard code this eventually
      :foo ::INT[],
      -- TODO I s'ppose we are ok now
      :foo ::INT[]
  `

  checkTokens(goodSql, { foo: [1, 2, 3] }, 'foo')

  // we gonna die here
  checkTokens(badSql, { foo: [1, 2, 3] }, 'foo')
})

test('tokens_in_comments_break_parsing', () => {
  function checkValues (sql, params, expected) {
    const { text, values } = named(sql)(params)
    const failed = values.length !== expected
    // console.log(text, values, failed)
    if (failed) {
      throw new Error(`SQL had ${values.length} values instead of ${expected}:\n ${text} \n Values: [${values.join(',')}]`)
    }
  }

  const goodSql = /* sql */ `
    select
      :foo ::INT[],
      :foo ::INT[]
  `

  const badSql = /* sql */ `
    select
      :foo ::INT[],
      -- :foo ::INT[],
      :foo ::INT[]
  `

  checkValues(goodSql, { foo: [1, 2, 3] }, 2)

  // we gonna die here
  checkValues(badSql, { foo: [1, 2, 3] }, 2)
})

Crude, I know, but it shows the issue.

Proposed solution

Ignoring comments altogether seems like a pretty reasonable approach. I was looking at the PG-specific code, and adding something to cut each line off at `--` probably wouldn't be bad (and would be pretty safe).

Multi-line comments are a whole other ball of wax... but I want to try to exclude those as well — and if it's not an easy win, just put a disclaimer on them :)
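For reference, a minimal sketch of the single-line idea (my own illustration, not yesql's code — and it assumes `--` never appears inside a quoted literal, which is exactly the edge case discussed later in this thread):

```javascript
// Hedged sketch of the "cut lines at --" idea: drop everything from the
// first "--" to the end of each line. Known limitation: this also mangles
// string literals that legitimately contain "--".
function stripLineComments (sql) {
  return sql
    .split('\n')
    .map(line => {
      const idx = line.indexOf('--')
      return idx >= 0 ? line.slice(0, idx) : line
    })
    .join('\n')
}
```

With this in place, the `badSql` examples above would reduce to the same text as `goodSql` before the quote and named-parameter regexes ever see the comments.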

Final Thoughts

Happy to contribute a solution, however you think it best fixed. Thanks!

pihvi commented 3 years ago

Thanks for such a great and well formulated improvement!

This is now available in version 5.0.0. Is it working for you as expected?

scola84 commented 9 months ago

The regexp responsible for removing comments also removes parts of the query that should be kept in place:

WHERE "domain" = '10YNL----------L'

It's not so easy, if at all possible, to write a regexp that covers the cases where double dashes should not be replaced, so would it be possible to make the replacement optional?
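To illustrate the problem (my own minimal example — not necessarily the exact regexp yesql uses), a naive `--`-to-end-of-line replacement truncates precisely this kind of literal:

```javascript
// Illustration only: a naive single-line comment regexp eats the double
// dashes inside the string literal, truncating the query.
const sql = `WHERE "domain" = '10YNL----------L'`
const stripped = sql.replace(/--.*$/gm, '')
// stripped is now: WHERE "domain" = '10YNL
```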

scola84 commented 8 months ago

@dwelch2344 not sure whether this came to your attention, since it's a closed issue.