universal-ctags / ctags

A maintained ctags implementation
https://ctags.io
GNU General Public License v2.0

Understanding scope tracking in a regex parser #1536

Closed stromsvag closed 7 years ago

stromsvag commented 7 years ago

I am trying to implement a regex parser for a new language (PL/B). However, I am having trouble understanding how scope tracking works. More specifically I am wondering if my use of the scope long flags {scope=push}{placeholder} is correct.

Below is my regex file (plb.ctags), my test code (test.pls) and the generated tags (ctags --options=plb test.pls). I was expecting that the tag "someVariable" would get the scope "function:myFunc.entry".

plb.ctags

--langdef=plb
--map-plb=+.pls

--kinddef-plb=d,data,Data Definition Label
--kinddef-plb=f,function,Function Label

--regex-plb=/^([[:alnum:]$]+)[[:blank:]]+DIM.*/\1/d/{scope=ref}{exclusive}{icase}
--regex-plb=/^([[:alnum:]$]+)[[:blank:]]+FUNCTION.*/\1/f/{scope=set}{exclusive}{icase}
--regex-plb=/^[[:blank:]]+ENTRY.*/entry/{scope=push}{placeholder}{exclusive}{icase}
--regex-plb=/^[[:blank:]]+FUNCTIONEND.*/entry/{scope=clear}{placeholder}{exclusive}{icase}

test.pls

myFunc function
myArg dim 1
entry
someVariable dim 2
functionend

ctags --options=plb test.pls

!_TAG_FILE_FORMAT 2 /extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED 1 /0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_MODE u-ctags /u-ctags or e-ctags/
!_TAG_PROGRAM_AUTHOR Universal Ctags Team //
!_TAG_PROGRAM_NAME Universal Ctags /Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL https://ctags.io/ /official site/
!_TAG_PROGRAM_VERSION 0.0.0 /1c8b98d/
myArg test.pls /^myArg dim 1$/;" d function:myFunc
myFunc test.pls /^myFunc function$/;" f
someVariable test.pls /^someVariable dim 2$/;" d function:myFunc

masatake commented 7 years ago

Usage of scope= is correct.

placeholder is not meant for this purpose. A language object is either printed or not printed; you cannot have both. For entry, you don't want to print it as a tag, but you do want it to appear as a scope element. If placeholder is set, the name never appears, either as a tag name or as a scope element.

I introduced placeholder to help developers manage scope depth. The FUNCTIONEND rule in your .ctags file is a good place to use it. (However, exclusive implies placeholder.)

If you use placeholder, you don't have to specify a name.

So the /entry/ name in

 --regex-plb=/^[[:blank:]]+FUNCTIONEND.*/entry/{scope=clear}{placeholder}{exclusive}{icase}

doesn't make sense.

[jet@localhost]/tmp% cat plb.ctags.new 
--langdef=plb
--map-plb=+.pls

--kinddef-plb=d,data,Data Definition Label
--kinddef-plb=f,function,Function Label
--kinddef-plb=E,entryblock,entries placeholder

--regex-plb=/^([[:alnum:]$]+)[[:blank:]]+DIM.*/\1/d/{scope=ref}{exclusive}{icase}
--regex-plb=/^([[:alnum:]$]+)[[:blank:]]+FUNCTION/\1/f/{scope=set}{exclusive}{icase}
--regex-plb=/^[[:blank:]]*ENTRY/entry/E/{scope=push}{exclusive}{icase}
--regex-plb=/^[[:blank:]]*FUNCTIONEND.*//{scope=clear}{placeholder}{exclusive}{icase}
[jet@localhost]/tmp% u-ctags --sort=no --options=./plb.ctags.new -o - input.pls
myFunc  input.pls   /^myFunc function$/;"   f
myArg   input.pls   /^myArg dim 1$/;"   d   function:myFunc
entry   input.pls   /^entry$/;" E   function:myFunc
someVariable    input.pls   /^someVariable dim 2$/;"    d   entryblock:myFunc.entry
myFunc2 input.pls   /^myFunc2 function$/;"  f
myArg2  input.pls   /^myArg2 dim 1$/;"  d   function:myFunc2
[jet@localhost]/tmp% cat input.pls 
myFunc function
myArg dim 1
entry
someVariable dim 2
functionend

myFunc2 function
myArg2 dim 1
functionend

Note that --kinds-plb=-E does not help here. The option removes the "entry" tag from the tags file, but it also breaks the scope.
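The scope handling described in this comment can be modelled with a small stack. The Python sketch below is not ctags code. It is a simplified, line-oriented model that assumes scope=set means "clear, then push", that scope=ref reads the top of the stack, and that a placeholder entry stays on the stack but is skipped when the scope string is composed. The names Entry, scope_of, run and SAMPLE are hypothetical, and the POSIX classes are replaced with rough Python equivalents.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    kind: str                  # long kind name, e.g. "function"
    placeholder: bool = False  # assumed: hidden from tags and from scope strings


def scope_of(stack):
    """Compose 'kind:outer.inner' from the non-placeholder stack entries."""
    visible = [e for e in stack if not e.placeholder]
    if not visible:
        return ""
    return visible[-1].kind + ":" + ".".join(e.name for e in visible)


NAME = r"^([A-Za-z0-9$]+)[ \t]+"  # stands in for ^([[:alnum:]$]+)[[:blank:]]+


def run(lines, entry_is_placeholder):
    stack, tags = [], []
    for line in lines:
        if m := re.match(NAME + "DIM", line, re.I):
            # scope=ref: use the current scope, leave the stack untouched
            tags.append((m.group(1), "data", scope_of(stack)))
        elif m := re.match(NAME + "FUNCTION", line, re.I):
            # scope=set: clear the stack, then push the new tag
            stack.clear()
            tags.append((m.group(1), "function", scope_of(stack)))
            stack.append(Entry(m.group(1), "function"))
        elif re.match(r"^[ \t]*ENTRY", line, re.I):
            # scope=push: the tag gets the current scope, then goes on the stack
            entry = Entry("entry", "entryblock", entry_is_placeholder)
            if not entry.placeholder:
                tags.append((entry.name, entry.kind, scope_of(stack)))
            stack.append(entry)
        elif re.match(r"^[ \t]*FUNCTIONEND", line, re.I):
            # scope=clear: drop every scope
            stack.clear()
    return tags


SAMPLE = """myFunc function
myArg dim 1
entry
someVariable dim 2
functionend

myFunc2 function
myArg2 dim 1
functionend""".splitlines()

for name, kind, scope in run(SAMPLE, entry_is_placeholder=False):
    print(name, kind, scope)
```

Calling run(SAMPLE, entry_is_placeholder=True) instead models the original plb.ctags: under the assumptions above, the "entry" tag disappears and someVariable falls back to the function:myFunc scope.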

masatake commented 7 years ago

What is "entry"? I guess "entry" is a marker representing the start of the function body.

If yes, I think it is better to assign the "argument" kind to myArg and the "variable" kind to someVariable. myArg and someVariable should have the same scope.

Am I wrong?

stromsvag commented 7 years ago

> What is "entry"? I guess "entry" is a marker representing the start of the function body.
>
> If yes, I think it is better to assign the "argument" kind to myArg and the "variable" kind to someVariable. myArg and someVariable should have the same scope.
>
> Am I wrong?

"entry" marks the start of the function body, and whatever comes between "function" and "entry" are arguments. You make a good point about using the argument kind for one and the variable kind for the other. However, how can I distinguish between a tag of the "argument" kind and one of the "variable" kind? They match the same regular expression. The only difference is the scope, and I do not know of a way to identify which scope is currently active.

> placeholder is not meant for this purpose. A language object is either printed or not printed; you cannot have both. For entry, you don't want to print it as a tag, but you do want it to appear as a scope element. If placeholder is set, the name never appears, either as a tag name or as a scope element.
>
> I introduced placeholder to help developers manage scope depth. The FUNCTIONEND rule in your .ctags file is a good place to use it. (However, exclusive implies placeholder.)

If I understand you correctly, the use of {placeholder} is to "pop" or "clear" scope when not using {exclusive}?

masatake commented 7 years ago

I'm working on stateful multi-table parsers. They will be able to distinguish arguments from local variables. Please wait a while.

About placeholder, I should update the document at docs.ctags.io. I have to explain it more clearly.

stromsvag commented 7 years ago

Thank you. I will continue to work on my regex parser for PL/B, and raise any issues here.

masatake commented 7 years ago

Because no one stopped me, I merged the crazy multi-state, byte-oriented regex parser. If you are interested, run "git pull" to get the latest code.

http://docs.ctags.io/en/latest/optlib.html#byte-oriented-pattern-matching-with-multiple-regex-tables

Let's see what we can do:

input.pls:

myFunc function
myArg dim 1
entry
someVariable dim 2
functionend

myFunc2 function
myArg2 dim 1
entry
someVariable2 dim 2
functionend

plb.ctags:

# An experimental mtable regex parser for PL/B

--langdef=plb
--map-plb=+.pls

--kinddef-plb=d,data,Data Definition Label
--kinddef-plb=f,function,Function Label
--kinddef-plb=D,localData,Function local data definition label

--_tabledef-plb=main
--_tabledef-plb=funcheader
--_tabledef-plb=funcbody

# NOTE
# $ cannot be used to match the end of a line. In mtable-regex, $ means the end of the file.
#
--_mtable-regex-plb=main/([[:alnum:]$]+)[[:blank:]]+FUNCTION[^\n]*/\1/f/{tenter=funcheader}{scope=push}{icase}
--_mtable-regex-plb=main/.//

--_mtable-regex-plb=funcheader/[[:blank:]]*ENTRY//{tjump=funcbody}{icase}
--_mtable-regex-plb=funcheader/([[:alnum:]$]+)[[:blank:]]+DIM[^\n]*/\1/d/{scope=ref}{icase}
--_mtable-regex-plb=funcheader/.//

--_mtable-regex-plb=funcbody/[[:blank:]]*FUNCTIONEND[^\n]*//{tleave}{icase}{scope=pop}
--_mtable-regex-plb=funcbody/([[:alnum:]$]+)[[:blank:]]+DIM[^\n]*/\1/D/{scope=ref}{icase}
--_mtable-regex-plb=funcbody/.//

Cmdline:

u-ctags --fields=+Kne --sort=no --options=/tmp/plb.ctags -o - input.pls 

Tags output:

myFunc  input.pls   /^myFunc function$/;"   function    line:1  end:5
myArg   input.pls   /^myArg dim 1$/;"   data    line:2  function:myFunc
someVariable    input.pls   /^someVariable dim 2$/;"    localData   line:4  function:myFunc
myFunc2 input.pls   /^myFunc2 function$/;"  function    line:7  end:11
myArg2  input.pls   /^myArg2 dim 1$/;"  data    line:8  function:myFunc2
someVariable2   input.pls   /^someVariable2 dim 2$/;"   localData   line:10 function:myFunc2

Looks fine to me.
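For readers who want to follow the table switching step by step, here is a rough Python simulation of the three tables above. It is only a sketch under assumptions, not how ctags is implemented: it assumes that the rules of the current table are tried in order at the current position, that the first match wins and the position advances past it, that tenter remembers the current table, tjump switches without remembering, and tleave returns to the remembered table. TABLES, parse and INPUT are hypothetical names, and the "enter", "jump" and "leave" action strings are shorthand for the flag combinations used in the option file.

```python
import re

NAME = r"([A-Za-z0-9$]+)"  # stands in for ([[:alnum:]$]+)
BLANK = r"[ \t]"           # stands in for [[:blank:]]
REST = r"[^\n]*"           # rest of the line; $ would mean end of input here

# table name -> ordered rules: (pattern, kind or None, action or None)
TABLES = {
    "main": [
        (NAME + BLANK + "+FUNCTION" + REST, "function", "enter"),
        (".", None, None),  # fallback: skip one character
    ],
    "funcheader": [
        (BLANK + "*ENTRY", None, "jump"),
        (NAME + BLANK + "+DIM" + REST, "data", None),
        (".", None, None),
    ],
    "funcbody": [
        (BLANK + "*FUNCTIONEND" + REST, None, "leave"),
        (NAME + BLANK + "+DIM" + REST, "localData", None),
        (".", None, None),
    ],
}


def parse(text):
    table_stack, current = [], "main"
    scope, tags = [], []
    pos = 0
    while pos < len(text):
        for pattern, kind, action in TABLES[current]:
            m = re.compile(pattern, re.I | re.S).match(text, pos)
            if m is None:
                continue
            line = text.count("\n", 0, m.start()) + 1
            if kind is not None:
                tag = {"name": m.group(1), "kind": kind, "line": line}
                if scope:  # scope=ref / scope=push both read the stack top
                    tag["scope"] = "function:" + scope[-1]["name"]
                tags.append(tag)
            if action == "enter":    # {tenter=funcheader}{scope=push}
                scope.append(tag)
                table_stack.append(current)
                current = "funcheader"
            elif action == "jump":   # {tjump=funcbody}
                current = "funcbody"
            elif action == "leave":  # {tleave}{scope=pop}
                scope.pop()["end"] = line
                current = table_stack.pop()
            pos = m.end()
            break
        else:
            break  # no rule matched at this position; stop
    return tags


INPUT = (
    "myFunc function\nmyArg dim 1\nentry\nsomeVariable dim 2\nfunctionend\n"
    "\n"
    "myFunc2 function\nmyArg2 dim 1\nentry\nsomeVariable2 dim 2\nfunctionend\n"
)

for tag in parse(INPUT):
    print(tag)
```

Because the whole file is treated as one buffer in this model, the DIM and FUNCTION patterns end with [^\n]* rather than $, which mirrors the note in the option file. The same DIM pattern yields the data kind in funcheader and the localData kind in funcbody, so the table that is active at the match is what separates arguments from local variables.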

masatake commented 7 years ago

If you have trouble using the mtable-regex parser, please reopen this issue.