Oh, I didn't notice the -n option before. It should happen when parsing the excmd. I'll look into it tomorrow.
Should be fixed by https://github.com/AmaiKinono/citre/commit/38b0c054580f282e883a964ec1bc7aae60ff9393.
I also want to add an ext-line field that could get the line number when the line: field doesn't exist but the tags file is generated using -n. In this way, users could further reduce the size of their tags files in large projects.
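A minimal sketch of that fallback, written in Python for brevity rather than in Citre's Emacs Lisp. The name `ext_line` is hypothetical, and the parsing assumes a tab-separated tags line whose excmd is either a bare number (as written with -n) or a pattern:

```python
import re

def ext_line(tag_line):
    """Return the line number for one tags-file line, or None.

    Hypothetical helper: prefers an explicit `line:` field and falls
    back to a purely numeric excmd.
    """
    # name <TAB> path <TAB> excmd;" <TAB> extension fields...
    _name, _path, rest = tag_line.rstrip("\n").split("\t", 2)
    # Look for a tab-separated `line:N` field. A pattern that itself
    # contains a tab followed by "line:" would fool this simple check.
    m = re.search(r'\tline:(\d+)(?:\t|$)', rest)
    if m:
        return int(m.group(1))
    # Fall back to an excmd that is just a number.
    m = re.match(r'(\d+)(?:;"|$)', rest)
    if m:
        return int(m.group(1))
    return None
```

With this, `ext_line('f\t/tmp/bar.c\t1;"\tf')` gives 1 even though no line: field is present.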
I'm also thinking about fully supporting the pattern field when jumping. I was thinking "why not just use a line number for the pattern field" before. Now from ctags(1), I know that the regex pattern is used to cope with source file updates to some extent, which is very good. The benefits are:
It shouldn't be hard to implement, since it's just regex matching, and vi emulators like viper or evil in Emacs already implement it.
I'm implementing the search pattern right now. I'm assuming for forward searching patterns (/pattern/;"), we search from the beginning of the file, and for backward searching patterns (?pattern?;"), we search from the end of the file.
However, I didn't see the manpage of ctags or any online resources talking about this. @masatake Is this right?
Another thing I'm not sure about is the combined patterns (like 130;"/pattern/;"): should we go to the beginning or the end of line 130 and then do the search?
Edit: Tried this in vim. :1 goes to the beginning of line 1, but :1/pattern/ seems to begin its search from the end of line 1.
Edit 2: The ex command implementation in evil does exactly what I said, so I should be correct.
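The search rules described above can be sketched as follows. This is a Python illustration, not Citre's actual Emacs Lisp; `locate` and its parameters are hypothetical names. Matching is a plain substring test, and the search does not wrap around, which is an assumption rather than documented vi behavior.

```python
def locate(lines, needle, backward=False, start=None):
    """Return the 1-based number of the first matching line, or None.

    lines:    the file content as a list of strings
    needle:   literal text to look for in a line
    backward: True for ?pattern?, False for /pattern/
    start:    1-based line given by a combined excmd, or None
    """
    n = len(lines)
    if backward:
        # No start line: begin at the end of the file.
        # With a start line: begin just before it.
        first = n if start is None else min(n, start - 1)
        candidates = range(first, 0, -1)
    else:
        # No start line: begin at the top of the file.
        # With a start line: begin just after it (the "end of line N").
        first = 1 if start is None else start + 1
        candidates = range(first, n + 1)
    for lnum in candidates:
        if needle in lines[lnum - 1]:
            return lnum
    return None
```

Because the forward search starts after the given line, a combined excmd has to name the line before the tag for the tag line itself to be found.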
I have to admit that I don't know the pattern field well. As you know very well, I don't use vi much, so I don't know how vim uses the patterns.
What I know are:
A. Other than truncation, we don't modify the code emitting patterns. This means the output of u-ctags may be the same as that of e-ctags. The truncation was introduced to handle overly long lines. See --pattern-length-limit in ctags(1) of u-ctags.
B. The pattern is not a regex pattern!!! Other than escaping [?/] with a backslash, the pattern is just a substring of the input source code. See the example below.
C. As far as I can remember, we have not gotten a bug report about the pattern output.
/usr/bin/ctags is Exuberant-ctags.
$ cat /tmp/bar.c
char f /* regex meta char: .*?^$ */ (char args[])
{
return args [0];
}
char g /* meta char of ctags pattern: /?;" */ (char args[])
{
return args [0];
}
$ /usr/bin/ctags -o - /tmp/bar.c
f /tmp/bar.c /^char f \/* regex meta char: .*?^$ *\/ (char args[])$/;" f
g /tmp/bar.c /^char g \/* meta char of ctags pattern: \/?;" *\/ (char args[])$/;" f
$ /usr/bin/ctags -B -o - /tmp/bar.c
f /tmp/bar.c ?^char f /* regex meta char: .*\?^$ */ (char args[])$?;" f
g /tmp/bar.c ?^char g /* meta char of ctags pattern: /\?;" */ (char args[])$?;" f
$ u-ctags -o - /tmp/bar.c
f /tmp/bar.c /^char f \/* regex meta char: .*?^$ *\/ (char args[])$/;" f language:C typeref:typename:char
g /tmp/bar.c /^char g \/* meta char of ctags pattern: \/?;" *\/ (char args[])$/;" f language:C typeref:typename:char
$ u-ctags -B -o - /tmp/bar.c
f /tmp/bar.c ?^char f /* regex meta char: .*\?^$ */ (char args[])$?;" f language:C typeref:typename:char
g /tmp/bar.c ?^char g /* meta char of ctags pattern: /\?;" */ (char args[])$?;" f language:C typeref:typename:char
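If the pattern really is a literal line with only a few characters escaped, a reader of the tags file can undo the escaping and compare strings instead of running a regex engine. The Python sketch below illustrates that idea under stated assumptions: `parse_pattern` and `line_matches` are hypothetical names, the input is the excmd without its trailing `;"`, and a backslash is taken to protect exactly the next character.

```python
def parse_pattern(excmd):
    """Split a pattern excmd such as /^text$/ into its parts.

    Returns (text, at_start, at_end, backward), where text is the
    unescaped literal and the flags record the ^ and $ anchors.
    """
    if len(excmd) < 2 or excmd[0] not in "/?" or excmd[-1] != excmd[0]:
        raise ValueError("not a search pattern: %r" % excmd)
    backward = excmd[0] == "?"
    body = excmd[1:-1]
    # Assumption: a leading ^ is always an anchor, never literal text.
    at_start = body.startswith("^")
    if at_start:
        body = body[1:]
    out, at_end, i = [], False, 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # drop the backslash, keep the char
            i += 2
        elif ch == "$" and i == len(body) - 1:
            at_end = True            # unescaped final $ is the anchor
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out), at_start, at_end, backward

def line_matches(line, text, at_start, at_end):
    """Literal comparison honoring whichever anchors are present."""
    if at_start and at_end:
        return line == text
    if at_start:
        return line.startswith(text)
    if at_end:
        return line.endswith(text)
    return text in line
```

Applied to the forward and backward patterns for f in the output above, both yield the same literal text, which equals the source line.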
I implemented --excmd=combine as requested in https://github.com/universal-ctags/ctags/issues/1125.
How the tag file is handled can be seen in https://github.com/vim/vim/blob/master/src/tag.c.
Anyway, this is about vim, and since I have not gotten a bug report, I haven't touched this area.
I'm using ctags for reading the source code of products released by Red Hat. So I can assume the source code doesn't change after making the tags file. For me, just line numbers are needed.
vim doesn't complain about the pattern fields, and I just use the line number fields. So the code about patterns has been kept as is since Exuberant-ctags.
~Vim users~ Vim hackers, do you have any comments?
Thanks!
> the pattern is not a regex pattern
Well it should be (but I think it's not implemented correctly). See the manpage of ex:
/pat/ ?pat?:
Scan forward and backward respectively for a line containing pat, a regular expression (as defined below).
and the regex defined in this manpage already has things like . and *.
My experiments with EX commands in both vi and vim also show that /pat/ and ?pat? are actually using regular expressions. And the manpage of ctags claims the pattern field is designed to be used like this in vi.
> Other than escaping [?/] with a backslash, the pattern is a just substring of the input source code.
The fact that there are ^ and $ already tells us they are regexps, but we can see in your example (and my own experiments) that most other components of regexps are not handled well.
My guess is, as we know that the regex thing itself emerges from ed, ex and vi, maybe the ancestor program of ctags is using an old and still incomplete version of regex, and ctags inherits that without changing.
I tried to read https://github.com/vim/vim/blob/master/src/tag.c but didn't find useful information. I'm not familiar with C and the vim project.
I want to read the ctags code emitting pattern fields. If all it does is add ^ and $ and escape / and ?, we could handle it very easily.
Here's a little note about what I found in ctags. I have to write this down while reading, since I have no experience reading C projects before. Seems it writes the pattern field by:
main/writer-ctags.c: writeCtagsEntry -> escapeFieldValue -> escapeFieldValueFull
-> main/field.c: renderField -> renderFieldCommon -> renderFieldPattern
-> main/entry.c: makePatternString -> makePatternStringCommon
makePatternStringCommon is where the field value is actually generated. I discovered something here:
- //. I have to test this somehow.
- ^ is not necessarily there, and I don't understand the criteria.
- $ is added only when the line is not empty and ends with a newline.
- It quotes \, terminal $, and ? or / based on the search direction.
This is the code that actually puts together the pattern field:
// `/` or `?`
length += putc_func(searchChar, output);
// I dug into this a little bit but didn't understand it. Anyway this means
// the pattern doesn't necessarily begin with a `^`.
if ((tag->boundaryInfo & BOUNDARY_START) == 0)
    length += putc_func('^', output);
// Writes the line from the source file. Defined in `main/entry.c`. It quotes
// `\`, terminal `$`, and `?` or `/` based on the search direction. It also
// truncates the line, which I think happens before the escaping thing.
length += appendInputLine (putc_func, line, Option.patternLengthLimit,
                           output, &omitted);
// Write a `$` or nothing. I think this predicate is not needed since
// `terminator` is already assigned "$" or "" based on the situation
length += puts_func (omitted? "": terminator, output);
// `/` or `?`
length += putc_func (searchChar, output);
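To check the notes above against the sample output earlier in the thread, here is a rough Python model of what the annotated code appears to do. It is a reading of the notes, not a port of the ctags source: `make_pattern` and its parameters are hypothetical, `limit` stands in for --pattern-length-limit (the default used here is an assumption), and truncating before escaping follows the guess in the comments.

```python
def make_pattern(line, backward=False, limit=96, boundary_start=False):
    """Build a ctags-style search pattern for one source line.

    line:           the source line, including its newline if it has one
    backward:       True to use ? as the search character, else /
    limit:          maximum number of source characters kept (0 = no limit)
    boundary_start: True when the tag does not start at a real line start
    """
    search_char = "?" if backward else "/"
    has_newline = line.endswith("\n")
    body = line[:-1] if has_newline else line
    # Truncate first; a truncated pattern gets no closing `$`.
    omitted = 0 < limit < len(body)
    if omitted:
        body = body[:limit]
    escaped = []
    for i, ch in enumerate(body):
        # Only a `$` sitting right before the newline needs quoting.
        terminal_dollar = (ch == "$" and i == len(body) - 1
                           and has_newline and not omitted)
        if ch == "\\" or ch == search_char or terminal_dollar:
            escaped.append("\\")
        escaped.append(ch)
    start = "" if boundary_start else "^"
    end = "$" if (has_newline and not omitted) else ""
    return search_char + start + "".join(escaped) + end + search_char
```

Feeding it the two lines of /tmp/bar.c from the example reproduces the pattern fields shown there for both search directions.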
> I implemented --excmd=combine as requested in https://github.com/universal-ctags/ctags/issues/1125.
@masatake Just saying. I think this is a really weird request. Why "go to a line before/after the tag and search from it" when you have --excmd=number, which gives the line number directly? This also defeats the purpose of using a search pattern, which is to tolerate certain (actually a wide range of) source file updates.
Maybe it's due to some limitation on vim's side so the search pattern has to appear, but an EX command <number> sends you to that line directly (try something like :20 in vi). Supporting this should be done by vim, not ctags.
Update: This in vim's tag.c also doesn't seem correct to me:
if (tagp.tagline > 0)
    // start search before line from "line:" field
    curwin->w_cursor.lnum = tagp.tagline - 1;
else
    // start search before first line
    curwin->w_cursor.lnum = 0;
Searching for a ?pat? pattern from tagp.tagline - 1 should fail. And let's assume a user reads ctags(1) about when to use which kind of pattern field, so he chooses --excmd=pattern to deal with code updates, but he also enables the line: field; vim will see it and do something the user doesn't expect.
But anyway, this is their thing, not ours.
BOUNDARY_START was introduced to implement "guest parsers". See https://docs.ctags.io/en/latest/running-multi-parsers.html#running-multiple-parsers-on-an-input-file .
To run a parser from a parser for an area, you can use a promise.
<html>
<script>var f = function () { ... }</script>
</html>
$ u-ctags --extras=+g -o - /tmp/x.html
f /tmp/x.html /var f = function () { ... }/;" f
Neither ^ nor $ is used.
The html parser runs the javascript parser.
The javascript parser receives var f = function () { ... } as input.
For the javascript parser, the line starts from var ... So when making a tag for the line, the javascript parser may want to emit "^var ..." as the pattern.
However, from the viewpoint of the html parser, "^" is not needed.
BOUNDARY_START is for handling this situation.
> BOUNDARY_START is introduced to implemented "guest parser".
Ah, I remember seeing it in the documentation ;) I think I now understand when ^ will appear.
Btw, I found this in the unit tests of u-ctags: https://github.com/universal-ctags/ctags/blob/master/Tmain/nested-subparsers.d/stdout-expected.txt. It defines a subparser to handle the event kind using regex. I think this is really the beauty of ctags and one of the reasons some people prefer it over, say, language servers. But seeing it now reminds me that it would be hard to offer language-specific support in Citre that is compatible with user-defined subparsers like this.
But this is a future topic. I might shut up for now and work on handling search patterns.
I got an error:
I generated the tags file with
If I removed -n from the command line and regenerated the tags file, the error was gone.
I think automatically rejecting a tags file generated with the -n option is a simple solution.