universal-ctags / ctags

A maintained ctags implementation
https://ctags.io
GNU General Public License v2.0

Adding varType TagEntryInfo field from Geany #862

Open techee opened 8 years ago

techee commented 8 years ago

I'd like to start syncing Geany's ctags with universal-ctags and there are a few things we'd need in universal-ctags in order to have all the functionality we need. One of the things we need is a

const char *varType;

field in sTagEntryInfo. As the name suggests, it contains a string corresponding to a variable type or a return type of a function. For instance for

char *foo;

int bar() {
    return 0;
}

it contains "char *" for the variable and "int" for the function. We need this info to be able to perform scope-based autocompletion and also the return value of functions for tooltips. This value is non-NULL only for statically typed languages - the currently supported ones in Geany are the languages from c.c and then Pascal, Rust, Go (c.c implementation in Geany diverges quite a lot and I haven't checked how hard it would be to port it to universal-ctags - the other languages should be easy to port).

Is it OK to add a new field to ctags with this information? I'd prepare a patch doing this. I've also noticed https://github.com/universal-ctags/ctags/pull/857 - one possibility would be to use a custom extension field for this info (though the type info should be available for all statically typed languages, which is quite a big portion of the parsers, so one field defined for all of them might be better).

cc @b4n

masatake commented 8 years ago

I'm working hard to generalize field manipulation based on the idea proposed in #857 by @pragmaware. Could you wait for a while?

techee commented 8 years ago

@masatake Sure, no problem. From the Geany perspective it's just important that we have some kind of access to the field definitions to map them to Geany's internal types. I did this in the last commit of

https://github.com/geany/geany/pull/957

by accessing LanguageTable. If you are interested in what we need from ctags, you can have a look at tm_ctags_wrappers.h/c in the pull request.

masatake commented 8 years ago

@techee, did you see typeref:typename introduced by @pragmaware ?

[yamato@x201]~/var/ctags-github% cat /tmp/foo.c
char *foo;

int bar() {
    return 0;
}
[yamato@x201]~/var/ctags-github% ./ctags -o - /tmp/foo.c
bar /tmp/foo.c  /^int bar() {$/;"   f   typeref:typename:int
foo /tmp/foo.c  /^char *foo;$/;"    v   typeref:typename:char *

Is this not enough? @pragmaware solved two things.

  1. How to use the typeRef[2] field of extensionFields in tagEntryInfo. As @b4n specified before, the ctags main part expects both typeRef[0] and typeRef[1] to be filled, e.g. with "struct" and "foo". For char *foo, typeRef[0] cannot be filled. @pragmaware's hack is to use "typename" to fill typeRef[0]. See docs/cxx.rst.
  2. The field in the tags file. Surprisingly, he didn't introduce a new field in the tags file; he reused the typeref field. In other words, his change to the tags file format is very small.

However, @pragmaware dug further. In the end he wants a new field in both the data structure and the tags file.

@techee, could you look at #872? I made a prototype that allows a parser to define a new field. If you like the prototype, I will work on this area more.

techee commented 8 years ago

@pragmaware solved two things.

How to use the typeRef[2] field of extensionFields in tagEntryInfo. As @b4n specified before, the ctags main part expects both typeRef[0] and typeRef[1] to be filled, e.g. with "struct" and "foo". For char *foo, typeRef[0] cannot be filled. @pragmaware's hack is to use "typename" to fill typeRef[0]. See docs/cxx.rst.

The field in the tags file. Surprisingly, he didn't introduce a new field in the tags file; he reused the typeref field. In other words, his change to the tags file format is very small.

This is actually almost all we'd need - in Geany we could concatenate typeRef[0] and typeRef[1] to get what we now have in varType. It would just be nicer if typeRef[0] was allowed to be NULL and "typename" was generated when creating the tag in entry.c. The problem is that the cxx parser uses "typename" both for this case and when it's really a true typename, and we won't be able to distinguish the two in Geany. In addition, the PHP parser uses "unknown" for the same purpose, so in Geany we'd need some per-language table defining what word is used here.
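
A minimal sketch of that concatenation, assuming the NULL convention suggested here. The helper name build_var_type is hypothetical; it is not part of ctags or Geany.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a ctags or Geany function): builds a
 * varType-like string from the two typeref subfields.
 * Assumption: a NULL or empty ref0 means "no kind known", so only
 * ref1 is copied. Returns 0 on success, -1 on bad input or truncation. */
static int build_var_type(const char *ref0, const char *ref1,
                          char *out, size_t outlen)
{
    int n;

    if (ref1 == NULL || out == NULL || outlen == 0)
        return -1;

    if (ref0 == NULL || ref0[0] == '\0')
        n = snprintf(out, outlen, "%s", ref1);          /* e.g. "char *" */
    else
        n = snprintf(out, outlen, "%s %s", ref0, ref1); /* e.g. "struct foo" */

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With a placeholder word such as "typename" or "unknown" in ref0, the caller would first have to know, per language, that the word is a placeholder, which is the per-language table mentioned above.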

What I'd suggest is https://github.com/universal-ctags/ctags/pull/880 . That is, having typeRef[0] == NULL in this case and generating "unknown" for it when writing out the tag. I find the name "unknown" better here because it doesn't refer to any existing tag type, which would otherwise lead to ambiguity between e.g. a real "typename" and a "typename that's not really a typename".

@pragmaware @masatake What do you think?

techee commented 8 years ago

@masatake Regarding #872 - it looks nice but as typeRef already exists, it would be better to reuse this one and not to have another field doing almost the same. Moreover, it would add some complications to Geany because each parser could have a different set of the extended fields and we'd need to somehow know which of the fields corresponds to the varType field (and this could be different for every language).

masatake commented 5 years ago

@pragmaware, I have an idea about the "typeref:" field and I would like to get your approval. You introduced "typename" as the default value for the 1st subfield of the typeref field. The name feels a bit odd to me. At first, I thought the feeling might come from the fact that I didn't know C++. However, @techee also proposed "unknown".

These days I have been working on filling typeref fields in tags generated by the Go parser, and I have studied what the typeref field should look like.

(Terminology: typeref:<the 1st subfield>:<the 2nd subfield> )

If possible, I think the 1st subfield should be the name of a kind, like in a scope field.

There are two cases in which the 2nd subfield cannot be associated with a kind.

case1. The 2nd subfield specifies a compound type. Here is an example:

typedef struct A {
    int i;
} B;

B *foo ()
{
    static B a;
    return &a;
}

The return type of foo is B*, not B. We cannot associate B* with any kind. (B itself can be associated with the typedef kind.)

case2. The name in the 2nd subfield comes from a single token, which implies it is not a compound type, but the name is not defined in the current source file.

#include <Header_File_In_Which_B_is_Typedefed.h>
B foo ()
{
    static B a;
    return a;
}

In the example, B fills the 2nd subfield of typeref. B is a single token. So it will be a class or typedef. However, there is no way to know its kind. ctags must parse the included header file to solve it.

When a kind name is not suitable as the value of the 1st subfield, I think we should leave the subfield EMPTY instead of filling it with "typename". Leaving the subfield EMPTY is the proposal for which I would like your approval. I would like to make this rule the parser-independent default. If a parser defines "unknown" as a kind, the parser can use it for the 1st subfield in case2. For case1, we should always leave the 1st subfield EMPTY.
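
The rule above could be written down roughly as follows. This is only a sketch: first_subfield and is_single_token are hypothetical names, and "single token" is approximated here as a plain C identifier, so a qualified name such as std::string is treated as compound.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumption: a "single token" type name is a plain identifier
 * ([A-Za-z_][A-Za-z0-9_]*). Anything else counts as compound. */
static int is_single_token(const char *s)
{
    if (s == NULL || !(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    for (const char *p = s + 1; *p != '\0'; p++)
        if (!(isalnum((unsigned char)*p) || *p == '_'))
            return 0;
    return 1;
}

/* Hypothetical helper: picks the value of the 1st typeref subfield.
 * resolved_kind is the kind name found for type_name in the current
 * input, or NULL if nothing was found. has_unknown_kind is non-zero
 * if the parser defines an "unknown" kind. */
static const char *first_subfield(const char *type_name,
                                  const char *resolved_kind,
                                  int has_unknown_kind)
{
    if (!is_single_token(type_name))
        return "";                          /* case1: compound type */
    if (resolved_kind != NULL)
        return resolved_kind;               /* a kind can be associated */
    return has_unknown_kind ? "unknown" : ""; /* case2: unresolved name */
}
```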


Based on what I learned from hacking on the Go parser, I will extend corkAPI. Currently, the cork layer provides only the mapping from cork_index to tagEntry. If a parser requests it in the initialization phase, corkAPI will provide one more mapping: from "name" to cork_indexes (or from name to tagEntries). This gives us a facility for implementing a semi-2-pass parser without changing the parser code much.

I would like to show what we can do with an example.

input.cc

class A {
public:
  A(int i0) { i = i0; }
  int i;
};

A foo ()
{
  static A a = 0;
  return a;
}

The current ctags emits:

A   input.cc    /^  A(int i0) { i = i0; }$/;"   f   class:A file:
A   input.cc    /^class A {$/;" c   file:
foo input.cc    /^A foo ()$/;"  f   typeref:typename:A
i   input.cc    /^  int i;$/;"  m   class:A typeref:typename:int    file:

Look at the 1st subfield of typeref: for foo. "typename" is used here. The extended corkAPI allows the C++ parser to fill the subfield with "class", like this:

foo input.cc    /^A foo ()$/;"  f   typeref:class:A

The algorithm for filling the subfield is simple: when the type name (the value of the 2nd subfield of the typeref field) is a single token, use the extended corkAPI. Using the token's name as the key, search the "name to tag entries" hash table. The parser may get multiple entries if the name is already defined as a type in the source file. The parser chooses a suitable one among the candidates and fills the 1st subfield with the kind name of the chosen tagEntry.
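
The lookup step can be illustrated with a small stand-alone sketch. It does not use the real corkAPI: the struct tag_entry type and the resolve_kind function are hypothetical, and a linear scan over an array stands in for the "name to tag entries" hash table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a tag entry. */
struct tag_entry {
    const char *name;
    const char *kind;   /* kind name, e.g. "class", "function" */
};

/* Assumption: only these kinds are suitable as a type. */
static int is_type_kind(const char *kind)
{
    static const char *const type_kinds[] = {
        "class", "struct", "union", "enum", "typedef"
    };
    for (size_t i = 0; i < sizeof type_kinds / sizeof type_kinds[0]; i++)
        if (strcmp(kind, type_kinds[i]) == 0)
            return 1;
    return 0;
}

/* Returns the kind name of the first entry that is named type_name
 * and has a type-like kind, or "" if there is no such entry. */
static const char *resolve_kind(const struct tag_entry *entries, size_t n,
                                const char *type_name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(entries[i].name, type_name) == 0 &&
            is_type_kind(entries[i].kind))
            return entries[i].kind;
    return "";
}
```

For the input.cc example, the name A has two entries (the constructor and the class); the sketch skips the function entry and returns "class".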

My proposal, using a kind name for the 1st subfield and leaving the subfield empty when there is no kind association, is the basis for the algorithm.

Thank you for reading. I will show a running code demonstrating this idea.

pragmaware commented 5 years ago

"typename" is simply a placeholder. Empty is perfectly fine.. or even better as it wastes less space. I can't remember why we have chosen typename over empty at the time... Using a kind name is also fine though there might be slight problems with things like enum class in C++ which are not really a kind on their own.

To tell you the truth, I'm not even sure if the information of the first field of typeref is useful at all. My editor, for instance, discards it. The only scenario I can think of is that of a tool that writes C code by looking at a tag file that is incomplete: it has a symbol for the declaration of a variable, say enum X var; but has no symbol for X itself so it cannot infer that X is an enum. If the tool has to write C code that has the same type of var then it must know that X is really enum X. But that seems to be a very special case...

pragmaware commented 5 years ago

Now that I think deeper of it, in the previous case one could just write enum X in a single field instead of splitting it in two parts...

masatake commented 5 years ago

I think var ... typeref:enum:X ... is better than var ... typeref:enum X ... as long as the current tags file has an entry for X that has enum as its kind. (I would like to call this case "X is resolved in the tags file".)

Consider a simple client tool that wants to show information about var, including its type. If enum:X is given, the tool may search the tags file for entries having X as their name and enum as their kind. This search key (name) and condition (kind) can be extracted from the tags file without any knowledge of the target language.

If enum X is given, the tool must tokenize the value to get the search key and condition. The tool must know which token is the identifier to be resolved, enum or X.

I can generalize the above explanation. If the 1st subfield is not empty, a tool can expect the tags file to have an entry with the name given in the 2nd subfield. The tags file may have more than one entry for that name. The tool can choose a suitable one by looking at their kind fields: the suitable entry has the same kind as the value of the 1st subfield. If the 1st subfield is empty, the tool knows that it cannot learn anything more.
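
On the client side, extracting the search key and condition could look like the sketch below. The function name split_typeref is hypothetical. The sketch assumes the value is split at the first colon only, since the 2nd subfield may itself contain colons (for example a C++ name such as std::string).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical client-side helper: splits a typeref value such as
 * "enum:X" into its 1st subfield (kind) and 2nd subfield (name).
 * Returns 0 on success, -1 if there is no colon or a buffer is too
 * small. An empty kind means "nothing more can be looked up". */
static int split_typeref(const char *value,
                         char *kind, size_t kind_len,
                         char *name, size_t name_len)
{
    const char *colon = strchr(value, ':');
    size_t klen, nlen;

    if (colon == NULL)
        return -1;

    klen = (size_t)(colon - value);
    nlen = strlen(colon + 1);
    if (klen >= kind_len || nlen >= name_len)
        return -1;

    memcpy(kind, value, klen);
    kind[klen] = '\0';
    memcpy(name, colon + 1, nlen + 1);  /* copies the terminating NUL too */
    return 0;
}
```

No tokenizing of the type itself is needed: for "enum:X" the tool gets the key X and the condition enum directly, and for ":enum X" it gets an empty kind and stops.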

... I found that the C++ parser doesn't behave as I assumed.

input

enum X foo ()
{
  return 0;
}

output

foo /tmp/foo.h  /^enum X foo ()$/;" f   typeref:enum:X

Though X is not defined in the input, the C++ parser reports the kind of X as enum. It is cleverer than I assumed. In this case, the output I expected is:

foo /tmp/foo.h  /^enum X foo ()$/;" f   typeref::enum X

Anyway, I understand you don't oppose making the 1st subfield empty instead of filling it with "typename".

techee commented 5 years ago

To tell you the truth, I'm not even sure if the information of the first field of typeref is useful at all. My editor, for instance, discards it.

This is certainly the case with Geany too.

Now that I think deeper of it, in the previous case one could just write enum X in a single field instead of splitting it in two parts...

On the other hand, this would complicate things on the Geany side - for instance, in the symbol tree we show a list of symbols from the current file and for instance for functions we want to show function prototypes and we want to show

X foo()

and not

enum X foo()

and we'd have to somehow be able to separate the "type" part from the "type of type" part and discard this one. ctags parsers have better information about what is what and for us it would be useful if ctags could do it for us.

masatake commented 5 years ago

@pragmaware, look at this.

I will show a running code demonstrating this idea.

Experimentally, I introduced a symbol table to the cork queue. Please look at the 1st subfield of the typeref fields. Kinds are resolved well.

foo.cc:

typedef int MYINT;

MYINT func(void)
{
  return 0;
};

class s {
  int x;
};

s cppfunc(void)
{
  s s;
  return s;
}

output:

[yamato@slave]~/var/ctags-github% ./ctags -o - /tmp/foo.cc
MYINT   /tmp/foo.cc /^typedef int MYINT;$/;"    t   typeref:typename:int    file:
cppfunc /tmp/foo.cc /^s cppfunc(void)$/;"   f   typeref:class:s
func    /tmp/foo.cc /^MYINT func(void)$/;"  f   typeref:typedef:MYINT
s   /tmp/foo.cc /^class s {$/;" c   file:
x   /tmp/foo.cc /^  int x;$/;"  m   class:s typeref:typename:int    file:

The change needed on the parser side:

diff --git a/parsers/cxx/cxx_tag.c b/parsers/cxx/cxx_tag.c
index dcc5cd55..d184057f 100644
--- a/parsers/cxx/cxx_tag.c
+++ b/parsers/cxx/cxx_tag.c
@@ -417,6 +417,25 @@ static bool cxxTagCheckTypeField(
    return true;
 }

+static bool lookupType (unsigned int corkIndex, tagEntryInfo *entry, void *data)
+{
+   int *kindIndex = data;
+
+   switch (entry->kindIndex)
+   {
+
+   case CXXTagKindENUM:
+   case CXXTagKindSTRUCT:
+   case CXXTagKindTYPEDEF:
+   case CXXTagKindUNION:
+   case CXXTagCPPKindCLASS:
+       *kindIndex = entry->kindIndex;
+       return false;
+   default:
+       return true;
+   }
+}
+
 CXXToken * cxxTagCheckAndSetTypeField(
        CXXToken * pTypeStart,
        CXXToken * pTypeEnd
@@ -487,6 +506,16 @@ CXXToken * cxxTagCheckAndSetTypeField(
        return NULL;
    }

+   if (szTypeRef0 == szTypename)
+   {
+       int iKindIndex = -1;
+       symbolTableForeach (vStringValue(pTypeName->pszWord),
+                           lookupType,
+                           &iKindIndex);
+       if (iKindIndex != -1)
+           szTypeRef0 = getLanguageKindName(g_oCXXTag.langType, iKindIndex);
+   }
+
    CXX_DEBUG_PRINT("Type name is '%s'",vStringValue(pTypeName->pszWord));

    g_oCXXTag.extensionFields.typeRef[0] = szTypeRef0; 

This symbol resolver is too simple. I have to consider scopes to resolve kinds correctly.

A skilled parser developer like you will imagine more interesting applications of the symbol table.

Remember, your cxx parser captures (and knows) the names of included files. We can extend the symbol table to multiple input files in the future. A --search-path-<LANG>=+path option will be needed :-) We have a parser for Makefile... The -I option value for CPPFLAGS can be known....

masatake commented 4 years ago

@techee, I'm very sorry for taking so long on this issue. I discussed this topic with @pragmaware at #2395, and I have a question for you.

You wrote:


... we want to show

X foo()

and not

enum X foo()

I found an unusual example input:

struct file* (make_open_file_fn(void))(const char*)
{
  return 0;
}

This function returns a function and is defined without using a typedef. @pragmaware's C parser works well:

make_open_file_fn   /tmp/bar.c  /^struct file* (make_open_file_fn(void))(const char*)$/;"   f   typeref:struct:file * ()(const char *)

The question is about the typeref field. The first subfield of the typeref is struct. However, the function (make_open_file_fn) doesn't return a struct. It returns a function.

Here the typeref value cannot take the form you wanted. What output would Geany want in the typeref field in this case?

As far as I can see from mini-geany.c, Geany uses only the second subfield. In this case Geany would refer only to file * ()(const char *). Is that good enough?