universal-ctags / ctags

A maintained ctags implementation
https://ctags.io
GNU General Public License v2.0

JavaScript: Ignoring certain tag patterns for JS #1680

Open kristijanhusak opened 6 years ago

kristijanhusak commented 6 years ago

I assume there is already a way to do this; I just wasn't able to figure it out. I'm using ctags on a Node.js project that uses the Node module structure with require()/module.exports.

I would like to skip generating tags for constants that are assigned from require(). For example, this is a line in the tags file that I would like not to be generated:

CustomerValidator   lib/domain/user/signup_validator.js /^const CustomerValidator = require('..\/customer\/customer_validator');$/;"    C

I know I could skip generating the C kind entirely, but I would still like to keep it for other things.

Thanks!

masatake commented 6 years ago

There is no way to do so in ctags.

grep may help you to strip unwanted items like:

ctags -o - the_node_file.js | grep -v require > ./tags
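Note that grep -v require drops every tag whose line mentions "require" anywhere, not only the const X = require(...) aliases. A narrower filter can be sketched as follows. This is a minimal sketch, not something ctags provides: the function names isRequireAlias and dropRequireAliases are hypothetical, and it assumes the default tab-separated tags format (name, file, pattern, then extension fields).

```javascript
// Decide whether one tags-file line describes a `const X = require(...)` alias.
// Assumption: fields are tab-separated as name, file, /^pattern$/;", kind, ...
function isRequireAlias(line) {
  if (line.startsWith("!_TAG_")) return false; // keep pseudo-tags (file header)
  const fields = line.split("\t");
  if (fields.length < 3) return false;
  const [name, , pattern] = fields;
  // Escape regex metacharacters in the tag name ("$" is legal in JS identifiers).
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The pattern field starts with "/^" followed by the source line.
  const alias = new RegExp(
    "^/\\^\\s*(?:const|let|var)\\s+" + escaped + "\\s*=\\s*require\\("
  );
  return alias.test(pattern);
}

// Remove the alias lines from the full text of a tags file.
function dropRequireAliases(tagsText) {
  return tagsText
    .split("\n")
    .filter((line) => !isRequireAlias(line))
    .join("\n");
}
```

A tag such as createSchema, whose source line merely calls a required module, is kept; only lines where the tagged name itself is bound to a require() call are dropped.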

I don't know Node.js well, so I wonder why you want to skip the require lines.

kristijanhusak commented 6 years ago

I want to skip them because when I use go-to-definition in Vim, it jumps to the file where I imported that tag instead of going to the tag's definition directly.

codebrainz commented 6 years ago

@kristijanhusak not a direct solution, but Node supports the standard JS import mechanism; maybe ctags handles that better?

masatake commented 6 years ago

I understand what you wrote as follows:

At the JavaScript language level, CustomerValidator is defined twice: once as a const in signup_validator.js, and once as something in customer_validator.js. When you do a "go to definition" operation in Vim, you expect it to take you to the latter. However, it shows both and asks you to choose.

Am I correct?

If your answer is yes, I would like to know about the customer_validator.js side. I wonder how CustomerValidator is defined in customer_validator.js.

At the JavaScript level, there is no solution for your trouble. The code const foo = ... defines foo, and capturing definitions is what ctags should do. ctags should capture as many definitions as possible. You can apply a filter like grep to the ctags output, or you can use a smarter (or customizable) front end that chooses the proper one when a name is tagged twice or more. What I have written here is the fundamental design policy of u-ctags.

However, u-ctags supports sub-languages defined on top of a base language. (http://docs.ctags.io/en/latest/running-multi-parsers.html#tagging-definitions-of-higher-upper-level-language-sub-base)

I wonder whether I can do something interesting for Node.js input. Knowing how CustomerValidator is defined in customer_validator.js is the first step toward implementing a nodejs subparser. However, even if u-ctags handles Node.js in a special way, u-ctags just emits more tags for CustomerValidator. The result will not be what you want.

kristijanhusak commented 6 years ago

@codebrainz if you are talking about ESM (import module from 'file'), that is not an option, since this is an existing project, and a big one. But yes, ctags works much better with those types of imports, which I noticed on a frontend React project.

@masatake It's more than once; it's everywhere I required it.

CustomerValidator   lib/domain/appointment/validation/schema.js /^const CustomerValidator = require('..\/..\/customer\/customer_validator');$/;"    C
CustomerValidator   lib/domain/user/signup_validator.js /^const CustomerValidator = require('..\/customer\/customer_validator');$/;"    C
CustomerValidator   test/generators/appointment_generator.js    /^const CustomerValidator = require('..\/..\/lib\/domain\/customer\/customer_validator');$/;"   C
CustomerValidator   test/unit/lib/domain/customer/create_customer_use_case_spec.js  /^const CustomerValidator = require('..\/..\/..\/..\/..\/lib\/domain\/customer\/customer_validator/;"   C
CustomerValidator   test/unit/lib/domain/customer/customer_validator_spec.js    /^const CustomerValidator = require('..\/..\/..\/..\/..\/lib\/domain\/customer\/customer_validator/;"   C

The main problem is that customer_validator.js itself doesn't have a proper definition from which a tag can be generated. These are the contents of customer_validator.js:

const Joi = require('../validation/joi');
const Validator = require('../validation/basic_validator');

const createSchema = Joi.object().keys({
  phone: Joi.string().phone().optional().allow(''),
  email: Joi.string().email().optional().allow(''),
});

module.exports = {
  validateCreate: data => Validator.validate(data, createSchema),
};

When I do something like this in customer_validator.js, the tag line for it gets generated:

// ..

const CustomerValidator = {
  validateCreate: data => Validator.validate(data, createSchema),
};
module.exports = CustomerValidator;

In the tags file I get:

CustomerValidator   lib/domain/customer/customer_validator.js   /^const CustomerValidator = {$/;"   c

I understand why the tag cannot be generated for this particular file, but I would like to avoid having bad tags even if I don't have the right one.

masatake commented 6 years ago

I wrote a long reply in English, but I mistakenly closed the page before submitting it. So I wrote it in C instead.

[yamato@master]~/var/ctags-github% cat customer_validator.js
const Joi = require('../validation/joi');
const Validator = require('../validation/basic_validator');

const createSchema = Joi.object().keys({
  phone: Joi.string().phone().optional().allow(''),
  email: Joi.string().email().optional().allow(''),
});

module.exports = {
  validateCreate: data => Validator.validate(data, createSchema),
};
[yamato@master]~/var/ctags-github% git diff 
diff --git a/main/lregex.c b/main/lregex.c
index 6bdbe0a8..1014b57d 100644
--- a/main/lregex.c
+++ b/main/lregex.c
@@ -967,6 +967,35 @@ static void parseKinds (
 *   Regex pattern matching
 */

+static char *
+translate(const char *input, const char* engine)
+{
+   int i;
+   bool seen_underscore = false;
+   vString *buf = vStringNew();
+
+   for (i = 0; i < strlen(input); i++)
+   {
+       if (i == 0)
+           seen_underscore = true;
+
+       if (seen_underscore && isalnum(input[i]))
+       {
+           vStringPut(buf,
+                      (islower (input[i]))
+                      ? input[i] - ('a' - 'A')
+                      : input[i]);
+           seen_underscore = false;
+       }
+       else if (isalnum(input[i]))
+           vStringPut (buf, input[i]);
+       else if (input[i] == '-' || input[i] == '_')
+           seen_underscore = true;
+       else if (input[i] == '.')
+           break;
+   }
+   return vStringDeleteUnwrap (buf);
+}

 static vString* substitute (
        const char* const in, const char* out,
@@ -976,14 +1005,27 @@ static vString* substitute (
    const char* p;
    for (p = out  ;  *p != '\0'  ;  p++)
    {
-       if (*p == '\\'  &&  isdigit ((int) *++p))
+       if (*p == '\\')
        {
-           const int dig = *p - '0';
-           if (0 < dig  &&  dig < nmatch  &&  pmatch [dig].rm_so != -1)
+           p++;
+           if (isdigit ((int) *p))
            {
-               const int diglen = pmatch [dig].rm_eo - pmatch [dig].rm_so;
-               vStringNCatS (result, in + pmatch [dig].rm_so, diglen);
+               const int dig = *p - '0';
+               if (0 < dig  &&  dig < nmatch  &&  pmatch [dig].rm_so != -1)
+               {
+                   const int diglen = pmatch [dig].rm_eo - pmatch [dig].rm_so;
+                   vStringNCatS (result, in + pmatch [dig].rm_so, diglen);
+               }
            }
+           else if (((int)(*p)) == 'F')
+           {
+               const char *f0 = getInputFileName ();
+               char *f1 = translate (baseFilename(f0), "CamelCase");
+               vStringCatS (result, f1);
+               eFree (f1);
+           }
+           else
+               /* ???*/;
        }
        else if (*p != '\n'  &&  *p != '\r')
            vStringPut (result, *p);
[yamato@master]~/var/ctags-github% cat nodejs.ctags
--langdef=nodejs{base=JavaScript}
--kinddef-nodejs=m,module,modules
--extradef-nodejs=implicitDefinedModule,implicitly defined module that can be passed to require
--regex-nodejs=/^module.exports *= *\{/\F/m/{_extra=implicitDefinedModule}{translator=Basename,CameCase}

[yamato@master]~/var/ctags-github% ./ctags --options=./nodejs.ctags  --fields=+lK  --extras-nodejs=+'{implicitDefinedModule}' -o - customer_validator.js
CustomerValidator   customer_validator.js   /^module.exports = {$/;"    module  language:nodejs
Joi customer_validator.js   /^const Joi = require('..\/validation\/joi');$/;"   constant    language:JavaScript
Validator   customer_validator.js   /^const Validator = require('..\/validation\/basic_validator');$/;" constant    language:JavaScript
createSchema    customer_validator.js   /^const createSchema = Joi.object().keys({$/;"  constant    language:JavaScript
exports customer_validator.js   /^module.exports = {$/;"    class   language:JavaScript class:module
validateCreate  customer_validator.js   /^  validateCreate: data => Validator.validate(data, createSchema),$/;" property    language:JavaScript class:module.exports
[yamato@master]~/var/ctags-github% 

ctags captures CustomerValidator from customer_validator.js as an extra tag. (The translators are stubs. They are not implemented.)

If you tell a smart enough editor or file viewer that nodejs tags have higher priority than JavaScript tags, you can go directly to customer_validator.js. It is up to your tool :-P.
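What such a tool would do with the language field can be sketched as follows. This is a minimal sketch under assumptions: the helper names parseTagLine and pickDefinition are hypothetical, the tags file is assumed to be tab-separated and generated with --fields=+lK as in the session above, and no particular editor is claimed to work this way.

```javascript
// Parse one tags-file line into { name, file, pattern, fields }.
// Assumption: extension fields are "key:value", and a bare field is the kind.
function parseTagLine(line) {
  const [name, file, pattern, ...rest] = line.split("\t");
  const fields = {};
  for (const item of rest) {
    const colon = item.indexOf(":");
    if (colon > 0) fields[item.slice(0, colon)] = item.slice(colon + 1);
    else fields.kind = item;
  }
  return { name, file, pattern, fields };
}

// Pick the preferred candidate for `name`; languages listed earlier win,
// and languages missing from the list rank last.
function pickDefinition(lines, name, languagePriority) {
  const rank = (tag) => {
    const index = languagePriority.indexOf(tag.fields.language);
    return index === -1 ? languagePriority.length : index;
  };
  const candidates = lines.map(parseTagLine).filter((tag) => tag.name === name);
  candidates.sort((a, b) => rank(a) - rank(b));
  return candidates.length > 0 ? candidates[0] : null;
}
```

Called with the priority list ["nodejs", "JavaScript"], the module tag from customer_validator.js would be chosen ahead of the const aliases in the files that require it.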

masatake commented 6 years ago

translator or translate is a bad name. I think I should call it transformer. You can define your own transformers in C. You can apply them to a string picked up by a regex pattern. \F can be used like \1. It represents the name of the current input file.

If I find enough supporters of this idea, I will finish the patch.

masatake commented 6 years ago

I should use \{input} instead of \F. A cryptic short name is difficult to find via web search.

kristijanhusak commented 6 years ago

That would be awesome! I have a lot of files like that, for which I don't have a proper tag definition. From what I can read in your code (I'm not so good with C), you do this for all files that have an underscore in their name. Is that right?

If it is, we should maybe limit creating these to situations where we don't have a proper tag definition in a file, like this module.exports = {} thing.

If this gets added, I believe it should be optional, since some people probably don't want this to happen by default.

masatake commented 6 years ago

It does NOT generate the filename-based tags for all input files. It generates tags only for files that match the specified pattern.

The change consists of two parts. One is written in C and the other is written in a ctags option file. The C part replaces \F in the --regex-... option with the input file name converted to camel case. Currently the converter (transformer?) is hard-coded, but it should not be in the future.

What I would like you to look at is the ctags option part. It does many things. The nodejs parser is defined on top of the JavaScript parser, as a subparser. It is activated only when the JavaScript parser runs. The nodejs parser has its own kind, module. It has its own extra, implicitDefinedModule. What the subparser does is very simple: it searches for the pattern module.exports *= *\{. If the pattern matches and --extras-nodejs=+'{implicitDefinedModule}' is given, ctags records the current input file name, after the transformation, as a tag.

The C part I wrote must be added to ctags itself, of course. However, the .ctags option file part should be supplied by the user, which is what you meant by "optional".

I will wait for more comments from other people. However, I think this is a good hack.
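For readers more comfortable with JavaScript than C, the behaviour described here can be sketched as follows. This is only an illustration, not how ctags is implemented: translate is a port of the C helper from the patch above, implicitModuleTag is a hypothetical name, and the pattern field is built without the escaping that ctags applies.

```javascript
// Port of translate() from the patch: uppercase the first alphanumeric
// character and each one that follows "-" or "_", drop the separators,
// ignore other characters, and stop at the first ".".
function translate(basename) {
  let result = "";
  let upperNext = true;
  for (const ch of basename) {
    if (ch === ".") break;
    if (/[A-Za-z0-9]/.test(ch)) {
      result += upperNext ? ch.toUpperCase() : ch;
      upperNext = false;
    } else if (ch === "-" || ch === "_") {
      upperNext = true;
    }
  }
  return result;
}

// Emit a module tag only when some line starts with `module.exports = {`,
// mirroring the regex in the option file; otherwise emit nothing.
function implicitModuleTag(path, source) {
  const line = source
    .split("\n")
    .find((text) => /^module\.exports *= *\{/.test(text));
  if (line === undefined) return null;
  const basename = path.split("/").pop();
  return {
    name: translate(basename),
    file: path,
    pattern: "/^" + line + "$/", // simplified: no escaping of "/" or "\"
    kind: "module",
  };
}
```

A file that assigns a named object and then exports it, like the const CustomerValidator = {...} variant shown earlier, does not match the pattern, so no filename-based tag is produced for it.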

kristijanhusak commented 6 years ago

Ok, thanks. Looking forward to it.

masatake commented 6 years ago

https://stackoverflow.com/questions/48613460/using-universal-ctags-how-do-i-tag-a-variable-that-references-an-entire-file

The question on that page is closely related to what we discussed here. Making tags "defined by their filename" is the feature I added.

codebrainz commented 6 years ago

It might be better to name the option after CommonJS or RequireJS or whichever library provides this pattern/convention rather than making it Node-specific. Later similar hacks could also be added for the competing module systems. This might make more sense than naming the option after Node which is just bog standard JS.

masatake commented 6 years ago

@codebrainz, sure, I should not use "nodejs" as the language name. I think the pattern and the language should be defined by users. My proposal is the notation {input} in the --regex- option.

masatake commented 6 years ago

I will make a pull request.

masatake commented 6 years ago

A new prototype works. I wrote a very small virtual machine for transforming names.

In the prototype, I introduce the following notation in --regex-<LANG>:

  \{data|transformer0|transformer1|...}

With the new notation, what you want can be written as:

\{input|basename|deleteExntension|PascalCase}

https://en.wikipedia.org/wiki/Camel_case
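How a data|transformer0|transformer1|... chain could be evaluated is sketched below. This is a guess at the semantics for illustration, not the prototype's code: the transformers table and evalPipeline are hypothetical, the transformer bodies are assumptions about what each name means, and deleteExtension is spelled out in full here.

```javascript
// Hypothetical transformer table; each entry maps a string to a string.
const transformers = {
  basename: (s) => s.split("/").pop(),
  deleteExtension: (s) => s.replace(/\.[^.]*$/, ""),
  PascalCase: (s) =>
    s
      .split(/[^A-Za-z0-9]+/)
      .filter(Boolean)
      .map((word) => word[0].toUpperCase() + word.slice(1))
      .join(""),
};

// Evaluate "data|t0|t1|...": look up `data` in `env`, then apply each
// transformer to the result of the previous one, left to right.
function evalPipeline(spec, env) {
  const [data, ...names] = spec.split("|").map((part) => part.trim());
  if (!Object.prototype.hasOwnProperty.call(env, data)) {
    throw new Error("unknown data source: " + data);
  }
  return names.reduce((value, name) => {
    const fn = transformers[name];
    if (typeof fn !== "function") throw new Error("unknown transformer: " + name);
    return fn(value);
  }, env[data]);
}

// Example: the input file name is supplied by the caller as env.input.
console.log(
  evalPipeline("input|basename|deleteExtension|PascalCase", {
    input: "/tmp/signup_validator.js",
  })
);
```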

$ cat /tmp/signup_validator.js 
const Joi = require('../validation/joi');
const Validator = require('../validation/basic_validator');

const createSchema = Joi.object().keys({
  phone: Joi.string().phone().optional().allow(''),
  email: Joi.string().email().optional().allow(''),
});

module.exports = {
  validateCreate: data => Validator.validate(data, createSchema),
};
[yamato@master]~/var/ctags-github% cat mynodejs.ctags
--langdef=mynodejs{base=JavaScript}
--kinddef-mynodejs=m,module,modules
--extradef-mynodejs=implicitDefinedModule,implicitly defined module that can be passed to require
--regex-mynodejs=/^module.exports *= *\{/\{input|basename|deleteExntension|PascalCase}/m/{_extra=implicitDefinedModule}
[yamato@master]~/var/ctags-github% ./ctags --fields=+Kl --options=mynodejs.ctags --extras-mynodejs=+'{implicitDefinedModule}' -o - /tmp/signup_validator.js 
Joi /tmp/signup_validator.js    /^const Joi = require('..\/validation\/joi');$/;"   constant    language:JavaScript
SignupValidator /tmp/signup_validator.js    /^module.exports = {$/;"    module  language:mynodejs
Validator   /tmp/signup_validator.js    /^const Validator = require('..\/validation\/basic_validator');$/;" constant    language:JavaScript
createSchema    /tmp/signup_validator.js    /^const createSchema = Joi.object().keys({$/;"  constant    language:JavaScript
exports /tmp/signup_validator.js    /^module.exports = {$/;"    class   language:JavaScript class:module
validateCreate  /tmp/signup_validator.js    /^  validateCreate: data => Validator.validate(data, createSchema),$/;" property    language:JavaScript class:module.exports

masatake commented 6 years ago

As I wrote in the last comment, I planned to introduce a shell-like language that uses | to represent a chain of function calls. However, after trying to solve #1577, I think a PostScript-like language is better for the purpose.

shell style:

\{input|basename|deleteExntension|PascalCase}

ps style:

\{ input basename deleteExtension PascalCase}

They are not so different at a glance. However, with the PostScript style we can implement conditional jumps and loops. It may even be possible to define a procedure if you want.
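The difference becomes visible once an operator needs more than one operand, which a stack handles naturally. The following is a minimal sketch of a stack evaluator for the ps style, assuming whitespace-separated tokens. The operators table, the concat operator, the (text) literal syntax and runProgram are all assumptions made for illustration; conditionals, loops and procedures are not shown.

```javascript
// Shared helper (assumed semantics): split on non-alphanumerics, capitalize.
function toPascalCase(s) {
  return s
    .split(/[^A-Za-z0-9]+/)
    .filter(Boolean)
    .map((word) => word[0].toUpperCase() + word.slice(1))
    .join("");
}

// Each operator pops its operands from the stack and pushes its result.
const operators = new Map([
  ["basename", (stack) => stack.push(stack.pop().split("/").pop())],
  ["deleteExtension", (stack) => stack.push(stack.pop().replace(/\.[^.]*$/, ""))],
  ["PascalCase", (stack) => stack.push(toPascalCase(stack.pop()))],
  ["concat", (stack) => {
    const right = stack.pop();
    const left = stack.pop();
    stack.push(left + right);
  }],
]);

// Run a program: operators are applied, "(text)" pushes a literal string,
// and any other token must name a value in `env`, which is pushed.
function runProgram(program, env) {
  const stack = [];
  for (const token of program.trim().split(/\s+/)) {
    if (operators.has(token)) {
      operators.get(token)(stack);
    } else if (/^\(.*\)$/.test(token)) {
      stack.push(token.slice(1, -1));
    } else if (Object.prototype.hasOwnProperty.call(env, token)) {
      stack.push(env[token]);
    } else {
      throw new Error("unknown token: " + token);
    }
  }
  return stack.pop();
}
```

With this sketch, a program such as input basename deleteExtension PascalCase (Module) concat appends a suffix to the generated name, something the one-argument shell-style chain cannot express without extra syntax.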

gp42 commented 4 years ago

Any updates on this issue? I have another use case with the Groovy language, where the class name is determined by the file name.

laxman20 commented 3 years ago

This sounds like exactly what I'm looking for, and I hope we can get it merged. I'm working on an AngularJS project where the convention is that all classes are defined as anonymous class exports.

Given a file named hello-world.controller.js

export default class {
...
}

The tag for this would be HelloWorldController
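Note that the translate() helper in the earlier patch stops at the first ".", so it would yield HelloWorld for this file name. A variant that removes only the final extension and treats the remaining dots as word breaks can be sketched like this. It is a sketch under assumptions: tagNameFromFile and anonymousDefaultClassTag are hypothetical names, and the regex is a rough approximation of an anonymous default class export, not a JavaScript parser.

```javascript
// Derive a class-style name from a path: drop directories and the final
// extension, then treat every run of non-alphanumerics (".", "-", "_")
// as a word break and capitalize each word.
function tagNameFromFile(path) {
  const base = path.split("/").pop().replace(/\.[^.]+$/, "");
  return base
    .split(/[^A-Za-z0-9]+/)
    .filter(Boolean)
    .map((word) => word[0].toUpperCase() + word.slice(1))
    .join("");
}

// Return a filename-based tag name only when the source contains an
// anonymous `export default class {` (optionally with `extends Base`).
function anonymousDefaultClassTag(path, source) {
  const anonymous =
    /^export\s+default\s+class\s*(?:extends\s+[\w$.]+\s*)?\{/m;
  return anonymous.test(source) ? tagNameFromFile(path) : null;
}

console.log(
  anonymousDefaultClassTag(
    "app/hello-world.controller.js",
    "export default class {\n}\n"
  )
);
```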

ctulocal1 commented 2 years ago

Along these same lines, I don’t think ES6 modules are supported. The syntax for them is export function {function name} (). That is, it’s essentially the same as a normal named function definition, but with the keyword export prepended. These definitions are found in the module (usually with a .mjs extension) and are then brought into another module or JavaScript source file using import {function name} from '{filename}', where * can be used to import all exported function names. Is there some way for me to add this with --regexp or a similar flag or feature?
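The shape of a pattern for such lines can be sketched in JavaScript as follows. This only illustrates what would have to be matched and says nothing about what the ctags JavaScript parser already handles. The name exportedFunctionNames is hypothetical, and because this pattern uses JavaScript regex syntax such as \s and (?:...), it would likely need rewriting before being passed to a ctags --regex-<LANG> option.

```javascript
// Match `export function name`, with optional `default`, `async`, and
// generator `*`; group 1 is the function name. Anonymous default exports
// (`export default function () {}`) have no name and are not matched.
const exportedFunction =
  /^export\s+(?:default\s+)?(?:async\s+)?function\s*\*?\s*([A-Za-z_$][\w$]*)/gm;

// Collect the names of all exported function declarations in a source text.
function exportedFunctionNames(source) {
  return Array.from(source.matchAll(exportedFunction), (match) => match[1]);
}
```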