universal-ctags / citre

A superior code reading & auto-completion tool with pluggable backends.
GNU General Public License v3.0

C: add language support for C #48

Closed AmaiKinono closed 3 years ago

AmaiKinono commented 3 years ago

I am thinking of cross-language auto-completion.

My original thought was that, even in a multi-language project, to use something from language B in language A you need at least a wrapper for it written in language A.

For finding definitions, since we want to directly see the definition in language B, we don't filter on the language field. But for auto-completion, because the wrapper must exist, all symbols you need can be found in A. So it's safe to only keep tags in language A.

In this PR, what I did is require the tag to be in the current file, or in one of the headers it includes. This implicitly restricts the language.

I don't know hybrid programming well, so I learned how to call a Nim function from C as an example. Then I found a big problem.

The Nim compiler generates source (and header) files in a cache directory, not the current directory, so even if you include fib.h in a C file, that header can't be found in the project, and auto-completion stops working.

What's worse, the generated files look like both a mess and black magic to me. I tried tagging them using ctags, but the function fib is not tagged.

I think it's really a nice idea to require the tag to be in the current file or in imported files (not only for this PR, but also for supporting other languages), but the whole hybrid programming thing (which is exactly the strength of ctags) may imply it's not good to do that.

masatake commented 3 years ago

Can I see the fib.h? I would like to know what happens. I guess there are many C preprocessor directives.

AmaiKinono commented 3 years ago

@masatake Thanks for joining in!

> Can I see the fib.h?

fib.h

```c
/* Generated by Nim Compiler v1.4.2 */
/* (c) 2017 Andreas Rumpf */
/* The generated code is subject to the original license. */
#ifndef __fib__
#define __fib__
#define NIM_INTBITS 64

/* section: NIM_merge_HEADERS */

#include "nimbase.h"
#undef LANGUAGE_C
#undef MIPSEB
#undef MIPSEL
#undef PPC
#undef R3000
#undef R4000
#undef i386
#undef linux
#undef mips
#undef near
#undef far
#undef powerpc
#undef unix

/* section: NIM_merge_FRAME_DEFINES */

/* section: NIM_merge_FORWARD_TYPES */

/* section: NIM_merge_TYPES */

/* section: NIM_merge_SEQ_TYPES */

/* section: NIM_merge_FIELD_INFO */

/* section: NIM_merge_TYPE_INFO */

/* section: NIM_merge_PROC_HEADERS */
N_LIB_PRIVATE N_NOCONV(void, signalHandler)(int sign);
N_LIB_PRIVATE N_NIMCALL(NI, getRefcount)(void* p);
N_LIB_PRIVATE N_NIMCALL(int, fib)(int a);

/* section: NIM_merge_DATA */

/* section: NIM_merge_VARS */

/* section: NIM_merge_PROCS */
N_CDECL(void, NimMain)(void);
#endif /* __fib__ */
```

nimbase.h included by fib.h is here.

Now the main problem is a concept I call "file reachability analysis", see the commentary section in citre-lang-c.el. I really like the idea, but

  1. It doesn't work when macros are used in #include directives, so it didn't offer effective filtering when I tested it on the Linux kernel.
  2. I've seen some C code that uses symbols from a header but doesn't include it. For example, regexec.c uses types from regex.h but doesn't include it. I don't know if this is valid, but the scheme fails in such situations.

Since C has a really loose "module" system, in the end we may give up on file reachability analysis for it. I may explore it using languages like Python.

masatake commented 3 years ago

@AmaiKinono, thank you. I read fib.h and I see what happens. You pointed me to nimbase.h because you knew what I would do with that information :-)

With my multi-pass-for-c branch (https://github.com/masatake/ctags/tree/multi-pass-for-c), ctags can record fib in the fib.h.

$ cat ~/bin/u-ctags-hint 
#!/bin/sh
u-ctags '--fields=+{language}{signature}' '--fields-C++=+{macrodef}' -o hint.tags "$@"

$ ~/bin/u-ctags-hint --kinds-C++=+p nimbase.h fib.h 

$ cat ~/bin/u-ctags-2nd
#!/bin/sh
u-ctags --fields='+{line}{signature}' --param-CPreProcessor:_expand=1 --_hint-file=hint.tags "$@"

$ ls -l hint.tags 
-rw-r--r--. 1 jet jet 18539 Feb  4 04:33 hint.tags

$ ~/bin/u-ctags-2nd --kinds-C++=+p nimbase.h fib.h 

$ readtags -e fib
fib fib.h   /^N_LIB_PRIVATE N_NIMCALL(int, fib)(int a);$/;" kind:p  typeref:typename:int __attribute ((__fastcall)) signature:(int a)

You listed two items, 1. and 2. As I demonstrated above, 1. can be solved conceptually. About 2., I didn't know. I have to study the background of it.

AmaiKinono commented 3 years ago

Let me explain the concept.

A main problem with tags-based tools is that they basically match only by tag name. If you look up the definition of some_func, every some_func in the whole project is listed in the result. Typically many of them are irrelevant.

File reachability analysis is a means of narrowing down the result. For many programming languages, we have:

  1. A file can only use symbols from modules imported in that file.
  2. Module names are tied to file names.

Based on this information, for each tag we can look at its input field and decide whether it's "reachable" or not. "Reachable" means tags from that file can be used in the current file.

By narrowing down the result to "tags from reachable files", we exclude many irrelevant tags.
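To make the idea concrete, here is a minimal sketch in Python. It is not Citre's implementation (that is written in elisp); the `Tag` type, the `filter_reachable` helper, and all file paths are hypothetical and exist only for illustration. It assumes the set of reachable files has already been computed somehow.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    name: str        # tag name, e.g. a function name
    input_file: str  # file the tag was found in (its "input" field)


def filter_reachable(tags, name, reachable_files):
    """Keep tags called `name` whose input file is in `reachable_files`."""
    return [t for t in tags
            if t.name == name and t.input_file in reachable_files]


# Hypothetical project in which three files each define `some_func`.
tags = [
    Tag("some_func", "src/net/util.h"),
    Tag("some_func", "src/gui/util.h"),
    Tag("some_func", "tests/mock.h"),
    Tag("other_func", "src/net/util.h"),
]

# Name-only matching returns every `some_func` in the project.
by_name = [t for t in tags if t.name == "some_func"]

# Assume the current file is src/net/client.c and it imports only
# src/net/util.h, so these two files form the reachable set.
reachable = {"src/net/client.c", "src/net/util.h"}
narrowed = filter_reachable(tags, "some_func", reachable)

print(len(by_name), "matches by name;", len(narrowed), "after filtering")
```

The filter itself is trivial; all the difficulty lies in computing `reachable` correctly.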

Now let me explain the problem of my implementation of this in C.

> As I demonstrated above, 1. can be solved conceptually.

Practically it can't be. My implementation is:

  1. Parse current file, find all included headers in it.

    This is done in elisp. Alternatively, we can look into the tags file to find "all headers included by a source file". They are reference tags, though, which means we can't make use of readtags' binary search, so for any large enough project it becomes very slow.

  2. For each header, find out its path using tags file.

    This is done using file/F tags. Binary search can be done.

  3. Open that path, and do 1 again (parse it using elisp).

Because step 1 is done in elisp, we can't make use of the macro expansion results in the tags file.
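The three steps can be sketched roughly as follows. This is an illustrative Python version, not the elisp code in citre-lang-c.el: the in-memory `sources` dict stands in for reading files from disk, the `file_tags` dict stands in for looking up file/F tags with readtags, and every name and path here is an assumption. It also shows the macro limitation: a header included through a macro is never seen by the textual scan.

```python
import re

# Step 1 helper: matches only literal includes, #include "x.h" or <x.h>.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"]+)[>"]',
                        re.MULTILINE)


def included_headers(source):
    """Step 1: find the headers named in literal #include directives."""
    return INCLUDE_RE.findall(source)


def resolve(header, file_tags):
    """Step 2: map a header name to project paths via a file-tag lookup."""
    basename = header.rsplit("/", 1)[-1]
    return file_tags.get(basename, [])


def reachable_files(start, sources, file_tags):
    """Step 3: repeat steps 1 and 2 on every header that is found."""
    seen = {start}
    todo = [start]
    while todo:
        path = todo.pop()
        for header in included_headers(sources.get(path, "")):
            for target in resolve(header, file_tags):
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    return seen


# Hypothetical project. cfg.h is included through a macro.
sources = {
    "src/main.c": ('#include "a.h"\n'
                   '#define CFG "cfg.h"\n'
                   '#include CFG\n'
                   'int main(void) { return 0; }\n'),
    "include/a.h": '#include "b.h"\n',
    "include/b.h": "",
    "include/cfg.h": "",
}
file_tags = {
    "a.h": ["include/a.h"],
    "b.h": ["include/b.h"],
    "cfg.h": ["include/cfg.h"],
}

result = reachable_files("src/main.c", sources, file_tags)
print(sorted(result))
```

Here `include/cfg.h` never enters the result, because `#include CFG` can only be resolved after macro expansion, which a plain text scan does not perform.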

> About 2., I didn't know. I have to study the background of it.

Ah, I thought it was a simple question... Anyway, let me state the question clearly.

From what I've learned many years ago, in C, if I want to use a symbol from another file, I have to:

When I test using regexec.c, I found it uses type definitions from regex.h, but doesn't include it. Is it valid? If it is, my analysis scheme may be basically wrong... Do you have any idea how it should be done?

AmaiKinono commented 3 years ago

> When I test using regexec.c, I found it uses type definitions from regex.h, but doesn't include it. Is it valid?

I think I figured it out.

regexec.c is not compiled to an object file.

It's included in regex.c. regex.c includes regex.h first, then regexec.c. This order ensures that the types in regex.h are defined before the content of regexec.c.

Though I don't think relying on the order of #include directives is robust, the fact that this can be done means file reachability analysis without compile-time information is not realistic.
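A small Python sketch of why the per-file analysis misses this case. The three file bodies below are hypothetical, heavily simplified stand-ins, not the real sources, and `expand` only imitates the textual pasting a preprocessor does for quoted includes.

```python
import re

INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*"([^"]+)"', re.MULTILINE)

# Hypothetical stand-ins for the three files discussed above.
files = {
    "regex.h": "typedef struct re_pattern regex_t;\n",
    "regexec.c": "int regexec(const regex_t *preg, const char *s);\n",
    "regex.c": '#include "regex.h"\n#include "regexec.c"\n',
}


def expand(name):
    """Paste quoted includes in place, as the preprocessor would."""
    return INCLUDE_RE.sub(lambda m: expand(m.group(1)), files[name])


def reachable_from(name):
    """Files reachable by following #include directives from `name`."""
    seen, todo = {name}, [name]
    while todo:
        for inc in INCLUDE_RE.findall(files[todo.pop()]):
            if inc in files and inc not in seen:
                seen.add(inc)
                todo.append(inc)
    return seen


# Starting from regexec.c, regex.h is not reachable...
print(sorted(reachable_from("regexec.c")))

# ...yet in the unit that is actually compiled, the typedef from
# regex.h appears before the code from regexec.c that uses it.
unit = expand("regex.c")
print(unit)
```

Whether regex.h is visible from regexec.c depends on which file includes regexec.c and in what order, and that information is not present in regexec.c itself.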

So I'd like to deprecate this feature for C.