Filter duplicate and non-existing entries from tag files

It would be very useful if duplicate or invalid entries from tag files could 
be skipped automatically (with `:ts`, `:tj`, etc.).

I often have a "tags" file in a subdirectory and another one above it, e.g. 
from running "ctags -R" at more than just the top level.
Additionally, I use global tags files, which are meant to be shared across 
multiple projects.

Therefore the same tags might be defined multiple times.

When this happens, the tag is usually listed only twice, but that is still 
annoying, because it triggers the code path that prompts for a selection - although 
both entries result in the same jump.

I am (usually) using `g<C-]>` to jump to tags (mapped to `<f2>`).

My default setting for `tags` is:

    tags=./tags;/,~/src/tags.global,~/.vimtags

Apart from consolidating duplicate entries (where only the path differs), it 
would also be useful to skip invalid entries, where the path does not 
exist (anymore).

The relevant code is/starts here:
https://github.com/vim-jp/vim/blob/master/src/tag.c#L597-621

It might be rather trivial to check for this at the beginning of the loop, by 
keeping a list of already displayed tags (with the absolute 
path), and then skipping entries that are already in it.
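To illustrate the idea outside of Vim's code base, here is a minimal sketch in plain C. The `struct tag_entry` type and the `dedup_tags` function are hypothetical and not part of Vim. The sketch assumes each path has already been resolved to an absolute form (for example with something like POSIX `realpath`), so that two entries pointing at the same file compare equal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a parsed tag match (not Vim's type). */
struct tag_entry {
    const char *name;  /* tag name */
    const char *path;  /* file path, assumed to be absolute already */
    const char *cmd;   /* search pattern or line number used for the jump */
};

/* Two entries are duplicates if they would result in the same jump. */
static int same_target(const struct tag_entry *a, const struct tag_entry *b)
{
    return strcmp(a->name, b->name) == 0
        && strcmp(a->path, b->path) == 0
        && strcmp(a->cmd, b->cmd) == 0;
}

/*
 * Remove later duplicates in place, keeping the first occurrence and the
 * original order.  Returns the new number of entries.
 */
size_t dedup_tags(struct tag_entry *entries, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        /* Compare against the entries kept so far (the "seen" list). */
        for (size_t j = 0; j < kept; j++) {
            if (same_target(&entries[j], &entries[i])) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            entries[kept++] = entries[i];
    }
    return kept;
}
```

The quadratic scan is probably acceptable here, since the number of matches for a single tag is usually small; a hash table would be an alternative for very large match lists.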

Here is a first shot at a patch to skip tags in non-existing files:

    diff --git i/src/tag.c w/src/tag.c
    index ba42f15..5fa9862 100644
    --- i/src/tag.c
    +++ w/src/tag.c
    @@ -618,6 +618,18 @@ do_tag(tag, type, count, forceit, verbose)
                    for (i = 0; i < num_matches && !got_int; ++i)
                    {
                        parse_match(matches[i], &tagp);
    +
    +                   /* Skip non-existing entries. */
    +                   struct stat st;
    +                   p = tag_full_fname(&tagp);
    +                   if (mch_stat((char *)p, &st) < 0) {
    +                       /* Remove entry i; i-- makes the next pass re-check this slot. */
    +                       for (k = i; k < num_matches - 1; k++)
    +                           matches[k] = matches[k + 1];
    +                       num_matches--; i--;
    +                       continue;
    +                   }
    +
                        if (!new_tag && (
     #if defined(FEAT_WINDOWS) && defined(FEAT_QUICKFIX)
                                    (g_do_tagpreview != 0
    @@ -770,7 +782,8 @@ do_tag(tag, type, count, forceit, verbose)
                    }
                    if (got_int)
                        got_int = FALSE;    /* only stop the listing */
    -               ask_for_selection = TRUE;
    +
    +               ask_for_selection = (type == DT_SELECT) || (num_matches > 1);
                }
     #if defined(FEAT_QUICKFIX) && defined(FEAT_EVAL)
                else if (type == DT_LTAG)

This is just a proof of concept: `tag_full_fname` is now called twice for 
existing entries.
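As a standalone illustration of the filtering step, the following sketch removes non-existing paths from an array in place using a separate write index, which avoids shifting the tail on every removal and cannot skip the element that follows a removed one. The function name `drop_missing` is hypothetical; it uses plain POSIX `stat` rather than Vim's `mch_stat`, and it does not free anything, so ownership of the strings is left to the caller.

```c
#include <stddef.h>
#include <sys/stat.h>

/*
 * Keep only the paths that stat() can resolve, preserving their order.
 * Returns the new number of entries; slots past that count are stale.
 */
size_t drop_missing(const char **paths, size_t n)
{
    size_t kept = 0;
    struct stat st;

    for (size_t i = 0; i < n; i++) {
        /* stat() returns 0 on success, -1 if the path cannot be accessed. */
        if (stat(paths[i], &st) == 0)
            paths[kept++] = paths[i];
    }
    return kept;
}
```

If the entries were heap-allocated, as they would be in a real match list, the dropped ones would have to be freed at the point where they are skipped.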

Handling duplicate entries is more involved, and I would appreciate any 
pointers on handling this with Vim's internal data structures.

Original issue reported on code.google.com by dhahler@gmail.com on 25 Jan 2015 at 8:07