m-novikov / tree-sitter-sql

SQL syntax highlighting for tree-sitter
MIT License

External parser for dollar quoted strings #26

Closed m-novikov closed 2 years ago

m-novikov commented 2 years ago

PostgreSQL supports strings of the following format: $TAG$mycontent$TAG$ https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

These strings are often used to define a function body.
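To make the format concrete, here is a minimal sketch in Python (not the scanner from this repository) that recognizes a dollar-quoted string with a regular expression and a backreference. The function name `find_dollar_quoted` and the ASCII-only tag rule are assumptions made for illustration; PostgreSQL's actual tag rules follow its identifier rules and are somewhat broader.

```python
import re

# Simplified tag rule (an assumption for this sketch): ASCII letters, digits
# and underscores, not starting with a digit. The tag may also be empty ($$).
DOLLAR_QUOTED = re.compile(
    r"\$((?:[A-Za-z_][A-Za-z_0-9]*)?)\$"  # opening delimiter, captures the tag
    r"(.*?)"                              # content, taken verbatim
    r"\$\1\$",                            # closing delimiter with the same tag
    re.DOTALL,
)


def find_dollar_quoted(sql):
    """Return (tag, content) for the first dollar-quoted string, or None."""
    match = DOLLAR_QUOTED.search(sql)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The closing delimiter has to repeat the opening tag exactly. A plain tree-sitter grammar rule cannot express that kind of backreference, which is presumably why an external scanner is proposed here.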

m-novikov commented 2 years ago

@MichaHoffmann could you take a look? Are there any obvious issues that I should be aware of?

MichaHoffmann commented 2 years ago

something like

name: build

on: [pull_request]

jobs:
  compile:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        compiler: [gcc, clang++]

    name: compile
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2

      - if: matrix.os == 'windows-latest' && matrix.compiler == 'gcc'
        uses: egor-tensin/setup-mingw@v2

      - name: build
        run: ${{ matrix.compiler }} -o scanner.o -I./src -c src/scanner.cc -Werror

might make sense, to make sure that your parser compiles in downstream applications

pplam commented 2 years ago

PostgreSQL supports strings of the following format: $TAG$mycontent$TAG$ https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

These strings are often used to define a function body.

As the docs say:

This form only works for LANGUAGE SQL, the string constant form works for all languages. This form is parsed at function definition time, the string constant form is parsed at execution time.

In PostgreSQL the string constant form is not parsed at definition time but at execution time. As far as I know, PostgreSQL supports several server-side programming languages, as presented in the docs: https://www.postgresql.org/docs/14/server-programming.html

Maybe we should leave the string-quoted scripts to external parsers, such as a PL/pgSQL, Tcl, Perl, or Python parser.

For now we can focus on the basic syntax of SQL. Once that is complete, we can consider building a PL/pgSQL parser.

@m-novikov What do you think about the above idea? ^^

m-novikov commented 2 years ago

Hey @pplam, this PR is not about parsing the function body (whichever language it is in); it's only about correctly interpreting $BODY$$BODY$ as a string. The reason is that some PG extensions used as a corpus to validate this parser contain the following statement.

CREATE OR REPLACE FUNCTION public."Function3_$%{}[]()&*^!@""'`\/#"(
    )
    RETURNS character varying
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE PARALLEL UNSAFE
    SET application_name='appname'
    SET enable_sort='true'
AS $BODY$
begin
select '2';
end
$BODY$;

To be able to parse this statement, we don't need to parse the function body as plpgsql, but we do need to interpret it as a string.
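As a rough illustration of treating the body as an opaque string, the sketch below walks the input the way a hand-written scanner might: read the opening delimiter, then skip ahead to the next occurrence of the same delimiter without looking at what is in between. It is written in Python for brevity and is not the code from this PR; `scan_dollar_quoted` is a hypothetical name, and the tag check is simplified compared with PostgreSQL's identifier rules.

```python
def scan_dollar_quoted(text, start=0):
    """Return the index just past a dollar-quoted string beginning at `start`,
    or None if no complete dollar-quoted string begins there."""
    if start >= len(text) or text[start] != "$":
        return None
    # Read the (possibly empty) tag up to the second '$' of the opening delimiter.
    pos = start + 1
    while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
        pos += 1
    if pos >= len(text) or text[pos] != "$":
        return None
    tag = text[start + 1:pos]
    if tag[:1].isdigit():
        return None  # simplified rule: a tag must not start with a digit
    delimiter = text[start:pos + 1]  # e.g. "$BODY$" or "$$"
    # Everything up to the next identical delimiter is opaque content.
    end = text.find(delimiter, pos + 1)
    if end == -1:
        return None  # unterminated string
    return end + len(delimiter)


# Tail of the statement above: the ';' inside the body must not end the statement.
statement = "AS $BODY$\nbegin\nselect '2';\nend\n$BODY$;"
start = statement.index("$")
end = scan_dollar_quoted(statement, start)
print(repr(statement[start:end]))  # the whole dollar-quoted string as one token
print(repr(statement[end:]))       # what is left for the SQL grammar
```

With this approach the semicolon after `select '2'` stays inside the string token, and only the final `;` is left for the SQL grammar to treat as the statement terminator.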

pplam commented 2 years ago

Understood. It would be great to support this form of dollar-quoted string ^^