Closed bagl closed 3 years ago
Thank you for reporting and the analysis. I agree with your analysis. I cannot find the way to accept both dialects in this case.
The names of predefined inquiry directives are given: PLSQL. We can use it as a heuristic for deciding how to handle $$
.
However, it seems that a user can define inquiry directives
.
I wonder how to define inquiry directives
.
I found an example in https://www.morganslibrary.org/reference/plsql/inquiry_directives.html.
ALTER SESSION SET PLSQL_CCFlags = 'UW_Flag:1, Some_Flag:2, PLSQL_CCFlags:42';
set serveroutput on
BEGIN
dbms_output.put_line($$UW_Flag);
dbms_output.put_line($$Some_Flag);
dbms_output.put_line($$PLSQL_CCFlags);
END;
/
In the example, $$UW_Flag
and $$Some_Flag
. For them, the heuristic doesn't work.
My question is about the line:
ALTER SESSION SET PLSQL_CCFlags = 'UW_Flag:1, Some_Flag:2, PLSQL_CCFlags:42';
It seems that UW_Flag
and Some_Flag
are defined here.
I got some questions:
ALTER SESSION SET PLSQL_CCFlags = 'UW_Flag:1, Some_Flag:2, PLSQL_CCFlags:42';
? If the answer is yes, what "kind" should we use for tagging?I'm thinking about making a context-dependent table recording the names of "inquiries directives". Known predefined ones are stored when building the table.
I found hints in "2.9.1.4.2 Assigning Values to Inquiry Directives" of https://docs.oracle.com/en/database/oracle/oracle-database/18/lnpls/plsql-language-fundamentals.html#GUID-623713EB-4FD1-4C8B-955C-4135342F5F68.
@masatake thank you for your interest and swift reaction.
I like your idea to special case the predefined directives, as the names are rather specific and I cannot image using them frequently in postgresql quoted strings.
1) The alter session statements would most likely not reside in your source code repository, so I don't think these tags must be extracted.
2) I haven't found any other way, but it's possible there will be some standard DBMS_xyz
package that would do the same assignment as the alter session above (that's usually the case, but I've not found such package)
@bagl, thank you.
It seems that I found another bug. Could you help me to confirm the bug?
create or replace package body demo_pkg2 is
test_var2 varchar2(64) := $xyz$ABC$xyz$;
function test_func2 return varchar2
as
begin
return test_var2;
end;
end demo_pkg2;
Do you think this is valid input for postgresql?
I think this is valid. In that case ctags should capture test_func2
. However, it doesn't because the SQL parser doesn't handle ;
after $$
.
diff --git a/parsers/sql.c b/parsers/sql.c
index 8e93f2881..c1794f604 100644
--- a/parsers/sql.c
+++ b/parsers/sql.c
@@ -710,13 +710,13 @@ static tokenType parseDollarQuote (vString *const string, const int delimiter)
end_p++;
}
+ if (c != EOF)
+ ungetcToInputFile (c);
+
if (! *end_p) /* full tag match */
break;
else
- {
- ungetcToInputFile (c);
vStringNCatS (string, tag, (size_t) (end_p - tag));
- }
}
}
The way to fix is simple but I cannot write a test case. So I need your help.
I found postgresql doesn't have the concept 'package'. So the above code is invalid. I'm looking for a pattern like:
...
$$QUOTED STRING$$;something input including a language object to be tagged
The SQL parser doesn't handle $$;
well. So the language object may not be tagged.
Preprocessor tokens can collide with the doller quotes, too.
I update the pull request. The update code can handle "$$USER_DEFINED_CCFLAG" as far as it is defined in a same source code.
To make it a bit worse, it looks like in PostgreSQL one defines functions using a string literal... Below are 2 valid PostgreSQL functions containing $$;
character sequence:
# input1.sql
CREATE OR REPLACE FUNCTION test_dollar_quoted_string_vars() RETURNS VARCHAR AS '
DECLARE
test_var_no_tag VARCHAR(64) := $$ABC$$;
test_var_tagged VARCHAR(64) := $xyz$XYZ$xyz$;
test_var INTEGER := 1;
BEGIN
RETURN TO_CHAR(test_var, ''000'') || test_var_no_tag || test_var_tagged;
END;
' LANGUAGE plpgsql;
# input2.sql
CREATE OR REPLACE FUNCTION test_dollar_quoted_string_vars2() RETURNS VARCHAR LANGUAGE plpgsql AS $$
DECLARE
test_var_no_tag VARCHAR(64) := 'ABC2';
test_var_tagged VARCHAR(64) := 'XYZ2';
test_var INTEGER := 2;
BEGIN
RETURN TO_CHAR(test_var, '000') || test_var_no_tag || test_var_tagged;
END;
$$;
The first function body is defined using a string quoted with '
and uses dollar quoted literals for the test_var_*
.
The second function body is defined using dollar quoted literal but it should not generated any tags by itself (this is tested here and here but without any tags inside the function body).
I expect the following ctags to be generated
expected12.tags
test_dollar_quoted_string_vars input.sql /^CREATE OR REPLACE FUNCTION test_dollar_quoted_string_vars() RETURNS VARCHAR AS '$/;" f
test_var_no_tag input.sql /^test_var_no_tag VARCHAR2(64) := $$ABC$$;$/;" v function:test_dollar_quoted_string_vars
test_var_tagged input.sql /^test_var_tagged VARCHAR2(64) := $xyz$XYZ$xyz;$/;" v function:test_dollar_quoted_string_vars
test_var input.sql /^test_var INTEGER := 1;$/;" v function:test_dollar_quoted_string_vars
test_dollar_quoted_string_vars2 input.sql /^CREATE OR REPLACE FUNCTION test_dollar_quoted_string_vars() RETURNS VARCHAR LANGUAGE plpgsql AS \$\$$/;" f
test_var_no_tag input.sql /^test_var_no_tag VARCHAR2(64) := 'ABC2;$/;" v function:test_dollar_quoted_string_vars2
test_var_tagged input.sql /^test_var_tagged VARCHAR2(64) := 'XYZ2;$/;" v function:test_dollar_quoted_string_vars2
test_var input.sql /^test_var INTEGER := 2;$/;" v function:test_dollar_quoted_string_vars2
Good point regarding the Oracle preprocessor tokens. They have the form $identifier
and they state in the docs
The character $ can also appear inside plsql_identifier, but it has no special meaning there.
However, I've not found a way to create user defined preprocessor tokens (and cannot imagine what they would actually do). The predefined ones are:
hence all contain just 1 dollar sign and do not collide with Oracle directives or PostgreSQL dollar quoted strings.
They do, however, create other problems I think...
#input3.sql
CREATE FUNCTION test RETURN NUMBER
AS
a $IF DBMS_DB_VERSION.VERSION < 10 $THEN NUMBER; $ELSE BINARY_DOUBLE; $END := 1;
BEGIN
RETURN a;
END;
should generate following tags
expected3.tags
test input.sql /^CREATE FUNCTION test RETURN NUMBER$/;" f
a input.sql /^a $IF DBMS_DB_VERSION.VERSION < 10 $THEN NUMBER; $ELSE BINARY_DOUBLE; $END := 1;$/;" v function:test
(the file names are added by @masatake.)
The PostgreSQL quoted strings can even be nested...
CREATE OR REPLACE FUNCTION test_dollar_quoted_string_vars3() RETURNS VARCHAR LANGUAGE plpgsql AS $xx$
DECLARE
test_var_no_tag VARCHAR(64) := $aa$ABC3$aa$;
test_var_tagged VARCHAR(64) := $$XYZ3$$;
test_var INTEGER := 3;
BEGIN
RETURN TO_CHAR(test_var, '000') || test_var_no_tag || test_var_tagged;
END;
$xx$;
Here $xx$
is outer string containing $aa$
and $$
inner strings...
Thank you for the valuable input.
Do you know u-ctags has an ability to run a parser from another parser? See https://docs.ctags.io/en/latest/internal.html?highlight=guest#guest-parser-promise-api.
The question is how to know a string literal (or a quote string) is a SQL code or not.
LANGUAGE plpgsql
is an important hint.
In input1.sql, LANGUAGE plpgsql
comes after the string literal.
In input2.sql, LANGUAGE plpgsql
comes before the quoted string.
Could you give me more hints where LANGUAGE plpgsql
is appeared on the code.
[jet@living]~/var/ctags-github% cat /tmp/input2.sql
CREATE OR REPLACE FUNCTION test_dollar_quoted_string_vars2() RETURNS VARCHAR LANGUAGE plpgsql AS $$
DECLARE
test_var_no_tag VARCHAR(64) := 'ABC2';
test_var_tagged VARCHAR(64) := 'XYZ2';
test_var INTEGER := 2;
BEGIN
RETURN TO_CHAR(test_var, '000') || test_var_no_tag || test_var_tagged;
END;
$$;
[jet@living]~/var/ctags-github% ./ctags --quiet --options=NONE --sort=no -o - --kinds-SQL='*' --extras=+'{guest}' /tmp/input2.sql
test_dollar_quoted_string_vars2 /tmp/input2.sql /^CREATE OR REPLACE FUNCTION test_dollar_quoted_string_vars2() RETURNS VARCHAR LANGUAGE plpgsql AS/;" f
test_var_no_tag /tmp/input2.sql /^ test_var_no_tag VARCHAR(64) := 'ABC2';$/;" v
test_var_tagged /tmp/input2.sql /^ test_var_tagged VARCHAR(64) := 'XYZ2';$/;" v
test_var /tmp/input2.sql /^ test_var INTEGER := 2;$/;" v
[jet@living]~/var/ctags-github% git diff --stat
parsers/sql.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
It works. We can improve the accuracy of parsing if we know more about LANGUAGE plpgsql
.
About input3.sql, ctags works as you expected. However, the kind of a
is `l'.
[jet@living]~/var/ctags-github% ./ctags --kinds-SQL='*' -o - /tmp/bar.sql
a /tmp/bar.sql /^ a $IF DBMS_DB_VERSION.VERSION < 10 $THEN NUMBER; $ELSE BINARY_DOUBLE; $END := 1;$/;" l function:test
test /tmp/bar.sql /^CREATE FUNCTION test RETURN NUMBER$/;" f
Is l
(local variable) acceptable?
I had no idea you can "nest" parsers, that's useful :)
Based on the official PostgreSQL docs one defines functions as
CREATE [ OR REPLACE ] FUNCTION
name ( [ [ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
[ RETURNS rettype
| RETURNS TABLE ( column_name column_type [, ...] ) ]
{ LANGUAGE lang_name
| TRANSFORM { FOR TYPE type_name } [, ... ]
| WINDOW
| IMMUTABLE | STABLE | VOLATILE | [ NOT ] LEAKPROOF
| CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT
| [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER
| PARALLEL { UNSAFE | RESTRICTED | SAFE }
| COST execution_cost
| ROWS result_rows
| SUPPORT support_function
| SET configuration_parameter { TO value | = value | FROM CURRENT }
| AS 'definition'
| AS 'obj_file', 'link_symbol'
} ...
Procedures work the same way
CREATE [ OR REPLACE ] PROCEDURE
name ( [ [ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
{ LANGUAGE lang_name
| TRANSFORM { FOR TYPE type_name } [, ... ]
| [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER
| SET configuration_parameter { TO value | = value | FROM CURRENT }
| AS 'definition'
| AS 'obj_file', 'link_symbol'
} ...
The conventions used read:
brackets ([ and ]) indicate optional parts Braces ({ and }) and vertical lines (|) indicate that you must choose one alternative.
Hence, for functions, we have funciton name, optional arguments and optional specification of return value followed by options such as LANGUAGE lang_name
and AS 'definition'
. These can come up in any order.
The built in languages are:
The pl
prefix stands for Procedural Language and optional u
suffix for unsafe (of no importance with respect to tagging). These could be used to select the proper parser for the body of the stored procedure/function in PostgreSQL.
New languages can be added and some of them are mentioned even in the docs:
I've missed that about input3.sql, local variable is perfectly fine, sorry.
Thank you very much. I think the original issue you reported is mostly fixed in #3012. I would like to merge #3012 first. I will open a new issue for parsing nested code.
Perfect. If you ever need help with SQL, I'll be here :)
Thank you for your effort!
SQL parser produces no tags after encounter with PL/SQL directive ($$name)
Here's the failing test case
Test case result showing the missing tags after directive
The PL/SQL directive is a token of two dollar signs followed by alpha and possibly more alphanum or #$ characters (regex: `\$\$[A-Za-z][A-Za-z0-9$#]*
). This definition will probably collide with PostgreSQL [dollar quoted strings](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING) such as
$$Dianne's horse$$or
$SomeTag$Dianne's horse$SomeTag$` implemented here.The version of ctags:
How do you get ctags binary: Building it locally