launchql / pgsql-parser

PostgreSQL Query Parser for Node.js
MIT License
162 stars 24 forks source link

deparser bug: Incorrect string escaping #124

Open rwe opened 5 months ago

rwe commented 5 months ago

Currently, deparse represents most string literals escaped only by single-quote doubling. If special characters like backslashes occur, they're output literally, possibly resulting in invalid data. (How and whether those backslashes are interpreted is configuration-dependent).

Comment statements (COMMENT ON TABLE "foo" IS …) are, for some reason, the only string literals that are rendered with an attempt to use C-style "escape" strings (E'foo'), which consistently interpret backslash-escaping.

However, the rendering logic for those is completely wrong. It doesn't do the requisite quote doubling (or, alternatively, quote-escaping), and instead of doubling backslashes, it replaces them with the same number of them.

Effectively comments are rendered worse than unescaped, since they both still open for single-quote mangling, and additionally opened up for backslash interpolation.

That means that a validly written comment like:

COMMENT ON COLUMN "foo"."whatever" IS $$
Something blah, this data may have chars like '\n' and '\r' in it.
$$;

Gets rewritten as:

COMMENT ON COLUMN "foo"."whatever" IS E'
Something blah, this column may have chars like '\n' and '\r' in it.
';

which is completely invalid syntax since the single quotes close on the single quote before the [\, n] chars.

If you change that text to '\n' to "\n" (e.g. to pretend that the single quotes had been correctly escaped), then it outputs as:

COMMENT ON COLUMN "foo"."whatever" IS E'
Something blah, this column may have chars like "\n" and "\r" in it.
';

This parses, but not as the original comment. When re-parsed, becomes:

COMMENT ON COLUMN "foo"."whatever" IS E'
Something blah, this column may have chars like "
" and "
" in it.
';

Instead, Deparser.escape should be written something like:

class Deparser {
  // ...
  escape(str: string): string {
    // If the string contains backslashes, render an escape string using the E'…' syntax.
    // Single quotes and backslashes are escaped with a backslash.
    if (str.includes('\\')) {
      return `E'${str.replaceAll(/[\\']/g, '\\$&')}'`;
    }
    // No backslashes: double any internal single quotes.
    return `'${str.replaceAll(/'/g, '$&$&')}'`;
  }

  escapeNull(str: string | null | undefined): string {
    return str == null ? 'NULL' : this.escape(str);
  }
}

and the relevant lines in CommentStmt (https://github.com/launchql/pgsql-parser/blob/f1df82ed4358e47c682e007bc5aa306b58f25514/packages/deparser/src/deparser.ts#L1163-L1177) should be:

-const escapeComment = (str) => {
-  return str.replace(/\\/g, '\\');
-};
-
-if (node.comment) {
-  if (/[^a-zA-Z0-9]/.test(node.comment)) {
-    // special chars we care about...
-    output.push(`E'${escapeComment(node.comment)}'`);
-  } else {
-    // find a double \\n or \\ something...
-    output.push(`'${node.comment}'`);
-  }
-} else {
-  output.push('NULL');
-}
+output.push(this.escapeNull(node.comment));