universal-ctags / ctags

A maintained ctags implementation
https://ctags.io
GNU General Public License v2.0

Add dual-pass capability #80

Open vhda opened 9 years ago

vhda commented 9 years ago

Looking at some omni-completion functions in vim, I realized that in order to identify the class of a certain object, those functions need to parse the file until the object declaration is found. Since ctags already does this type of work, why not implement this parsing in it?

This is how I see it:

  1. Parse all files as usual.
  2. From the list of tags, identify the class-like [1] tags and populate the keyword hash table with them.
  3. Restart parsing of all files.
  4. The parser skips the initialization phase and starts parsing the files directly.
  5. Each new entry would need a special string identifying its class (similar to signature).

There are more details that need to be looked into, but by using something similar to this we should end up with a complete tag list of the files, including object declarations. This should be especially useful when we have a libctags!

[1] Verilog, for example, has multiple tag types that can be used similarly to object declarations.
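Steps 1 to 5 of the proposal above can be sketched roughly as follows. This is only an illustration in Python, not ctags code: the toy regular expressions, the function names `first_pass` and `second_pass`, and the tuple layout of a tag are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for a real parser: recognizes "class Name"
# definitions and "Type name;" declarations in a toy C++-like language.
CLASS_RE = re.compile(r"^\s*class\s+(\w+)")
DECL_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*;")


def first_pass(files):
    """Steps 1-2: parse every file and collect the class-like tag names."""
    class_names = set()
    for text in files.values():
        for line in text.splitlines():
            match = CLASS_RE.match(line)
            if match:
                class_names.add(match.group(1))
    return class_names


def second_pass(files, class_names):
    """Steps 3-5: re-parse; tag declarations whose type is a known class."""
    tags = []
    for path, text in files.items():
        for line in text.splitlines():
            match = CLASS_RE.match(line)
            if match:
                tags.append((match.group(1), path, "class", None))
                continue
            match = DECL_RE.match(line)
            if match and match.group(1) in class_names:
                # The last element plays the role of the "special string
                # identifying its class" from step 5.
                tags.append((match.group(2), path, "object", match.group(1)))
    return tags


# The class is defined in one file and used in another, so a single
# pass over main.cpp alone could not know that Foo is a class.
files = {
    "foo.h": "class Foo {\n};\n",
    "main.cpp": "Foo bar;\nint count;\n",
}
classes = first_pass(files)
tags = second_pass(files, classes)
```

Here `bar` is reported as an object of class `Foo`, while `count` is ignored because `int` was not collected in the first pass.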

masatake commented 9 years ago

You have entered yet another interesting area.

Do you mean running ctags twice or more on a set of files? Does a tags file generated by the 1st run hold enough information for the 2nd stage?

If the target is a single input file, ctags has facilities to run multi-pass parsing. (They were introduced for the objc parser.)

vhda commented 9 years ago

I was thinking more of implementing this functionality in the core, because it would be something commonly used by all supporting parsers. Nevertheless, let me take a look at what the objc parser is doing.

masatake commented 9 years ago

Again, are you thinking of running ctags twice on the same input file? If yes, look at createTagsWithFallback() in parse.c.

vhda commented 9 years ago

The idea is to run ctags twice on a set of files. Typically there is one class definition per file, and we have to know all classes before being able to identify object declarations of those classes.

masatake commented 9 years ago

So in the 2nd pass, the cross-reference data generated in the 1st pass can be used, which means the facility for multi-pass parsing of a single file is not enough. I think I understand your intent.

It looks like a big challenge to me.

The biggest question is how the cross-reference data generated by the 1st pass is handed to the 2nd pass. If you have ideas, could you show me a pseudo command line?

Something like this?

(1st pass)$ ./ctags -o tags-1st-pass input-files....
(2nd pass)$ ./ctags -i tags-1st-pass -o final-tags input-files....
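As a rough sketch of what the 2nd pass could do with the file from the 1st pass, the snippet below reads tags-file lines and collects the names of class-like tags. It is an illustration only: the `-i` option above is part of the pseudo command line, and `load_class_names` and its `class_kinds` parameter are hypothetical names. It relies on the usual tags format, where extension fields follow `;"` and a field without a colon is taken to be the kind.

```python
def load_class_names(lines, class_kinds=("c", "class")):
    """Return the names of tags whose kind is in class_kinds."""
    names = set()
    for line in lines:
        if line.startswith("!_TAG_"):
            continue  # pseudo-tags carry metadata, not definitions
        head, sep, ext = line.rstrip("\n").partition(';"\t')
        if not sep:
            continue  # no extension fields, so the kind is unknown
        name = head.split("\t", 1)[0]
        for field in ext.split("\t"):
            key, colon, value = field.partition(":")
            if not colon:
                kind = field  # bare field: the kind, usually one letter
            elif key == "kind":
                kind = value  # explicit "kind:..." field
            else:
                kind = None   # some other field, e.g. typeref
            if kind in class_kinds:
                names.add(name)
    return names


first_pass_lines = [
    '!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted/\n',
    'Foo\tinput.cpp\t/^class Foo {$/;"\tc\n',
    'Bar\tinput.cpp\t/^class Bar {$/;"\tkind:class\n',
    'baz\tinput.cpp\t/^int baz;$/;"\tv\ttyperef:typename:int\n',
]
class_names = load_class_names(first_pass_lines)
```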

vhda commented 9 years ago

To be honest, up until now I've been focused mainly on the Verilog parser, so I really do not have any ideas on how to implement this. As such, I was looking for some feedback from the community here :)

I was looking more for something like:

$ ./ctags -R --enable-object-detection input-files

The option name is a bit too long, but it is just for demonstration purposes.

Internally I was thinking about having the parser register a list of kinds that can be used to declare variables. In most languages it should be something like "class", "typedef", etc. After the first parse, ctags replaces its keyword hash table with the class-like tags and runs a new parse using the new table.

From a parser point of view, it would only be necessary to add a new list of kinds in the parserDefinition, such that any parser that does not have that definition would not support the 2nd pass and exit cleanly. This way each parser maintainer could gradually implement the support of this feature for any corner-case situations and add relevant test cases.
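The opt-in scheme described above might look like the following sketch. It is written in Python for brevity and does not mirror the real parserDefinition structure in ctags; the `declaring_kinds` field, the helper names, and the "ToyLang" parser are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ParserDefinition:
    name: str
    kinds: tuple
    # Hypothetical new field: kinds that can be used to declare variables.
    # An empty tuple means the parser does not support the 2nd pass.
    declaring_kinds: tuple = ()


def parsers_for_second_pass(parsers):
    """Only parsers that registered declaring kinds take part in pass 2."""
    return [p for p in parsers if p.declaring_kinds]


def build_keyword_table(parser, first_pass_tags):
    """Build the replacement keyword table from (name, kind) pairs."""
    return {name for name, kind in first_pass_tags
            if kind in parser.declaring_kinds}


toylang = ParserDefinition("ToyLang", ("class", "typedef", "function"),
                           declaring_kinds=("class", "typedef"))
legacy = ParserDefinition("Legacy", ("function",))

second_pass_parsers = parsers_for_second_pass([toylang, legacy])
keywords = build_keyword_table(
    toylang, [("Foo", "class"), ("size_type", "typedef"), ("main", "function")])
```

A parser such as `legacy`, which leaves the list empty, is simply skipped, which matches the "exit cleanly" behavior suggested above.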

Update: the parser would need to define to which kind the tag would be used. For example, it could be something like:

or even merge everything in the "variable" kind like:

masatake commented 9 years ago

I would like to use the wiki to research this area with you. Please wait for the preload feature to be implemented first.

vhda commented 9 years ago

Don't worry. I'm using my free time to improve the Verilog parser in order to later include SystemVerilog support. I'm also working on an omni completion script in vim. It will be several weeks before I can look at this issue in detail.

masatake commented 9 years ago

I compiled all the documents I wrote into a hacking guide. I will write about the internals of ctags next. That will be the base for this discussion.

masatake commented 8 years ago

... I would like to hear your idea more with an example.

input:

class Foo {
};
Foo bar;

In the first pass Foo is captured as a tag of class kind. If I dump the state as a tags file it will be:

Foo input /^class Foo {$/;" kind:class

In the second pass bar is captured as a tag of ...what? Do you mean the kind of the tag is "Foo"?

Foo input /^class Foo {$/;" kind:class
bar input /^Foo bar;$/;" kind:class:Foo

I'm sorry, but I need an example. Pairs of input and expected tags are very helpful for me.

If we introduced a reference field, the tags file for the input would be...

Foo input /^class Foo {$/;" kind:class
Foo input /^Foo bar;$/;" ref:???
bar input /^Foo bar;$/;" kind:class:Foo

Multiple passes over multiple files are very powerful, like the linker for the C language. But this needs too much work and may change the definition of the ctags program itself. How about multiple passes over a single file? Even with a single file, it is still very interesting.

vhda commented 8 years ago

Let me try to explain my ideas.

  1. The first pass only identifies a subset of the supported kinds. In an object-oriented language this subset would typically be "class" and/or "typedef".
  2. In the second pass we identify all kinds, including the kinds identified in the first pass.
    • Each parser will define a conversion table for the special subset. E.g.: class->object; typedef->variable, which would require the existence of "object" and "variable" kinds.
    • The conversion can be done to an existing kind.

So, referring to your example, I would expect something like:

Foo input /^class Foo {$/;" kind:class
bar input /^Foo bar;$/;" kind:object type:class:Foo

Where "type" would be a special extended attribute. Don't know if we can reuse any existing attribute for that purpose.
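A minimal sketch of the conversion table idea, assuming the class->object and typedef->variable mapping given above. The `type` field is the hypothetical attribute from this comment, not an existing ctags field, and `declaration_tag` is a made-up helper name.

```python
# Per-parser conversion table: kind found in the first pass -> kind
# assigned to a declaration that uses it in the second pass.
CONVERSION = {"class": "object", "typedef": "variable"}


def declaration_tag(name, path, pattern, declared_type, first_pass_kinds):
    """Format a tags line for a declaration, or return None.

    first_pass_kinds maps each name captured in pass 1 to its kind.
    """
    source_kind = first_pass_kinds.get(declared_type)
    if source_kind is None:
        return None  # the type was not captured in the first pass
    target_kind = CONVERSION.get(source_kind)
    if target_kind is None:
        return None  # no conversion defined for this kind
    return (f'{name}\t{path}\t{pattern};"'
            f'\tkind:{target_kind}\ttype:{source_kind}:{declared_type}')


line = declaration_tag("bar", "input", "/^Foo bar;$/", "Foo", {"Foo": "class"})
```

For the example input this yields the `bar` line shown above, with `kind:object` and `type:class:Foo`.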

Typically classes are defined in different files, so I'm not sure this is really useful in a single file. But we should definitely support that possibility, because many (most?) languages do not enforce the requirement of having a single class defined per file.

masatake commented 8 years ago

I think we can introduce the "reference tag" and the multiple-input-file multi-pass (mm) parser separately. As @shigio shows, the concept of a "reference tag" can be introduced without introducing an mm parser. An mm parser is useful for improving the quality of captured reference tags. However, it is also useful for ordinary definition tags. Actually, bar in your example is a definition tag.

We don't have enough knowledge about how to capture reference tags. However, we can start by extending the tags format: introducing a ref: field. A single-input-file multi-pass (sm) parser may also be useful for improving the quality of captured reference tags. Only a few parsers use the sm facility of ctags. As we expand the areas that use sm parsers, we will learn what kinds of features are needed in the cork. The mm parser will come next.

masatake commented 8 years ago

% cat /tmp/foo.c
struct foo bar;
% ./ctags --fields=+t -o - /tmp/foo.c
bar /tmp/foo.c  /^struct foo bar;$/;"   v   typeref:struct:foo

The typeref field is already available. An mm parser can be used as a facility for improving the quality of typeref fields across languages.
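For tools that consume this output, the typeref field can be pulled out of a tags line as in the sketch below. The helper name `parse_typeref` is made up for this example; the field layout `typeref:<kind>:<name>` is taken from the output above.

```python
def parse_typeref(tag_line):
    """Return (kind, name) from a typeref field, or None if absent."""
    head, sep, ext = tag_line.rstrip("\n").partition(';"\t')
    if not sep:
        return None  # the line has no extension fields
    for field in ext.split("\t"):
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == "typeref":
            return parts[1], parts[2]
    return None


# The line printed by the command above, with tabs written explicitly.
line = 'bar\t/tmp/foo.c\t/^struct foo bar;$/;"\tv\ttyperef:struct:foo'
typeref = parse_typeref(line)
```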

vhda commented 7 years ago

Replying to https://github.com/universal-ctags/ctags/issues/1488#issuecomment-311416851 :

This is what would be the ideal implementation, IMHO:

masatake commented 7 years ago

I found a good way to implement an infrastructure for a multiple-input-file multi-pass (mm) parser WITHOUT an intermediate file. The newly designed barrel API, inspired by the cork API, is a part of the mm API. Surprisingly, it is not difficult to implement.

ctags parses a.sv and b.sv, and adds the container types found as keywords to the language's keywordAssoc. This is pass0. ctags parses a.sv and b.sv, and emits the tags file. This is pass1.

I see. In pass0, I would like you to make tags for the container types found by the parser and mark them as "put into the barrel".

In pass1, you can access the tags in the barrel. The barrel of tags is shared among parsers. However, for SystemVerilog, only tags of container types (the class kind, for example) are in the barrel. Therefore you can build the keyword table at the beginning of pass1.

This will be quite a powerful API... There will be many applications. But I myself will only provide the API until 6.0.

I will not make you wait a long time.
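The thread does not show the barrel API itself, so the following is only a guess at its shape, sketched in Python: a store that is filled during pass0, shared among parsers, and queried at the beginning of pass1. The names `Tag`, `Barrel`, `pass0`, and `keyword_table_for_pass1` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str
    kind: str
    language: str


class Barrel:
    """Hypothetical in-memory store shared by all parsers across passes."""

    def __init__(self):
        self._tags = []

    def put(self, tag):
        self._tags.append(tag)

    def tags(self, language=None):
        return [t for t in self._tags
                if language is None or t.language == language]


def pass0(found_tags, barrel, container_kinds):
    """Put only tags of container kinds into the barrel; emit nothing."""
    for tag in found_tags:
        if tag.kind in container_kinds:
            barrel.put(tag)


def keyword_table_for_pass1(barrel, language):
    """At the beginning of pass1, build the keyword table from the barrel."""
    return {t.name for t in barrel.tags(language=language)}


barrel = Barrel()
pass0([Tag("packet", "class", "SystemVerilog"),
       Tag("send", "task", "SystemVerilog")], barrel, {"class"})
pass0([Tag("Widget", "class", "C++")], barrel, {"class"})
sv_keywords = keyword_table_for_pass1(barrel, "SystemVerilog")
```

Because nothing is written to disk between the passes, no intermediate tags file is needed, which is the point made above.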