Better default filtering/sorting behavior

AmaiKinono commented 3 years ago

It's meaningful to have good default behavior on filtering/sorting, especially before we invest enough time on language-specific support.

[x] Smarter auto case fold. Underscores in the symbol should make it case sensitive.
[x] Exclude anonymous/qualified tags (see extras: field). Note: I decided to keep qualified tags. They can be useful.
[x] Exclude tags that have file scope and are not in this file.
[x] Exclude reference tags for auto-completion.
[x] Exclude file tags.
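The "smarter auto case fold" item could be sketched as below. This is a minimal Python illustration, not Citre's actual code: it assumes the base rule is the usual "smart case" convention (an uppercase letter makes matching case sensitive) and adds the underscore rule from the list above. The function names are hypothetical.

```python
def is_case_sensitive(symbol: str) -> bool:
    """Assumed rule: an uppercase letter or an underscore forces case-sensitive matching."""
    return "_" in symbol or any(ch.isupper() for ch in symbol)


def prefix_matches(symbol: str, tag_name: str) -> bool:
    """Prefix match that folds case only when the symbol allows it."""
    if is_case_sensitive(symbol):
        return tag_name.startswith(symbol)
    return tag_name.lower().startswith(symbol.lower())
```

With this rule, typing foo would still match FooBar, while foo_b would match foo_bar but not Foo_Bar.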

We also want to provide a second-stage filtering mechanism for the user, so they can say "oh I only want tags in this file / in some language". Currently I don't have any idea on the design, and I consider it a task after v0.1.

Better default sorting behavior:

[x] For finding definitions, put reference tags below definition tags.
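One way to picture this rule is a stable sort keyed on whether a tag is a reference. The sketch below is Python for illustration only; it assumes each tag is a dict whose "extras" value is the comma-separated extras field, which is a hypothetical representation rather than Citre's data structure.

```python
def definitions_first(tags):
    """Stable sort: definition tags keep their relative order and come before reference tags."""
    def is_reference(tag):
        return "reference" in tag.get("extras", "").split(",")

    # False sorts before True, and sorted() is stable, so the order within
    # each group is preserved.
    return sorted(tags, key=is_reference)
```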

I thought about this:

For auto-completion, only keep tags that are in the same language.

But I decided not to do it.

One reason is that there are some common multi-language programming patterns, like C/C++ (C headers are treated as C++ files) and HTML/CSS/JavaScript. If we only keep tags that are in the same language, we'll have to offer language-specific support for them immediately.

Another reason is that users may have their own subparsers, and the tags those generate should be used (see https://github.com/universal-ctags/ctags/issues/2557#issuecomment-634789712, though that's not a very good example). If we only keep tags that are in the same language, we'll have to offer user options for these use cases.

I also thought of sorting same-language tags above the others, but that's also not a good idea. When the user wants a symbol in another language, they have to go through the candidates from a to z, only to find it below that "z".

I'd like to wait until we implement the (manual) second-stage filtering; then picking a language in the results will be easy and the user will have full control.

Here's a study of $ ctags --list-* outputs. Language-common things that we could make use of are:

extras:
- [x] anonymous: Non-named objects (exclude).
- [x] fileScope: Tags of file scope (exclude if not in current file).
- [x] inputFile: File tags (exclude).
- [ ] qualified: Not very useful, but maybe good to keep. I want to exclude qualified tags whose "namespace" is an anonymous tag, but it seems that can't be done.
- [x] reference: Exclude for auto-completion. For finding definitions, we can put them below others and give them a special annotation so people know they are reference tags.
fields:
- [x] extras: For the "extras" info above.
- [ ] access: Access (private/public) of class members. For now I don't have a good idea of how to use it.
- [x] file: Only file tags have this field. Exclude them.
kinds:
- [x] Kinds are language-specific, other than the file/F kind for file tags. Exclude file tags.
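
The checked items above amount to a small predicate over each tag. Here is a minimal Python sketch, assuming a hypothetical tag representation: a dict with "input" (the file the tag is in), "kind", and "extras" (the comma-separated extras field). It is an illustration of the rules, not Citre's implementation.

```python
def has_extra(tag, extra):
    """True if the tag's comma-separated extras field contains `extra`."""
    return extra in tag.get("extras", "").split(",")


def keep_tag(tag, current_file, for_completion):
    """Apply the language-common default filters listed above."""
    if has_extra(tag, "anonymous"):
        return False  # non-named objects
    if has_extra(tag, "inputFile") or tag.get("kind") in ("file", "F"):
        return False  # file tags
    if has_extra(tag, "fileScope") and tag.get("input") != current_file:
        return False  # file-scope tags from other files
    if for_completion and has_extra(tag, "reference"):
        return False  # reference tags are kept only when finding definitions
    return True
```

Qualified tags pass through unchanged, matching the decision to keep them.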

masatake commented 3 years ago

About sorting, I think this is very related to "sgatc" that we discussed in private mails. There is no progress in "sgatc". However, I would like to write about the current status of "sgatc" in my mind. If you don't remember "sgatc", I would like you to see https://github.com/universal-ctags/issues-we-will-not-fix-in-soon/issues/3 .

Think about the following C code (input-1.c):

1: {
2:       fu@nc();

If the point is at @, and the user presses \M-., we can use a sort rule (if (eq? $kind "function") -1 1) when running readtags.

Think about the following C code (input-2.c):

1:  int i = X->Y->MEM@BER;

If the point is at @, and the user presses \M-., we can use a sort rule (if (eq? $kind "member") -1 1) when running readtags.

These sorting rules can be made from the point (line and column) and language-specific knowledge.

Let's call the code for generating the sorting rule "rule generator".

The question is where the rule generator is implemented. In-Emacs is a natural choice for you.

I'm thinking about implementing it in readtags itself because that is client-tool neutral. (context input-file line-number column-in-bytes) would be an operator for invoking the rule generator in readtags.

$ readtags -S '(context "input-1.c" 2 4)' func

This will be converted to

$ readtags -S '(if (eq? $kind "function") -1 1)' func

internally. As usual, this is just an idea.
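
To make the "rule generator" idea concrete, here is a toy Python sketch. It is purely illustrative: the heuristics (a following "(" means a function, a preceding "->" or "." means a member) and the function names are assumptions for this example, and it does not reflect any existing readtags feature.

```python
import re
from typing import Optional

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def guess_kind(line: str, column: int) -> Optional[str]:
    """Guess the kind of the identifier at `column` (0-based) from nearby C syntax."""
    for match in _IDENT.finditer(line):
        if match.start() <= column <= match.end():
            before = line[:match.start()].rstrip()
            after = line[match.end():].lstrip()
            if before.endswith("->") or before.endswith("."):
                return "member"
            if after.startswith("("):
                return "function"
            return None
    return None


def generate_sort_rule(line: str, column: int) -> Optional[str]:
    """Return a readtags sorter expression preferring the guessed kind, or None."""
    kind = guess_kind(line, column)
    if kind is None:
        return None
    return '(if (eq? $kind "%s") -1 1)' % kind
```

For the two inputs above, this yields the "function" rule for the func() call and the "member" rule for X->Y->MEMBER. A real generator would need an actual parser and per-language knowledge.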

AmaiKinono commented 3 years ago

I remember "sgatc". I'd say I was kind of into the idea before, but now I feel the right way is not to try to be smart, and to let the user do the filtering.

The reason is that: 1. ctags is hackable, so there are user-defined kinds that the client tool doesn't know, and 2. ctags is a cross-language tool, and it's hard for a client tool to specifically support jumping from a language to another.

A detailed version of this argument is written here. I may also put it in the user manual later.

I can see that having language-specific sorting rules is not harmful. After all it's not filtering, so we still have access to all the tags. But I think its benefit is not significant compared to second-stage (manual) filtering.

AmaiKinono commented 3 years ago

> The question is where the rule generator is implemented. In-Emacs is a natural choice for you.
>
> I'm thinking about implementing it in readtags itself because it is client-tool neutral.

Let's see. If people like the idea, maybe we'll receive many language-specific sort rules for Citre. Then we can decide if we want to move those to readtags. The problem is hacking in Elisp is easier than C, and that may hold back people from contributing to readtags.

> $ readtags -S '(if (eq? $kind "function") -1 1)' func

Neat technique :) I have the idea of putting tags in the current file on top of the results. I was thinking of implementing it with an Elisp sorter, but this technique could do the job.

AmaiKinono commented 3 years ago

@masatake I have some questions on the technique you showed. With a sorter expression like:

(if (eq? $kind "function") -1 1)

We do get function tags on top, but when it compares two function tags, which one ends up above the other is undefined. Is this right? If so, we need to write a more complicated expression.

Suppose I want to put tags with file kind above others, then sort each group by the length of the name. I end up with a sorter like this:

(<or> (if (and (eq? $kind "file") (not (eq? &kind "file")))
          -1 (if (and (not (eq? $kind "file")) (eq? &kind "file"))
                 1 0))
      (<> (length $name) (length &name)))

Is there any way to simplify this?

Update: I think this is prettier:

(<or> (<> (if (eq? $kind "file") -1 1)
          (if (eq? &kind "file") -1 1))
      (<> (length $name) (length &name)))
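
For readers less familiar with the sorter syntax, the same two-level comparison can be modelled in Python. This sketch assumes, as I understand the operators, that <> is a three-way comparison returning -1, 0, or 1 and that <or> returns the first non-zero result; the tag dicts and function names are hypothetical.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Three-way comparison, modelling the <> operator: -1, 0, or 1."""
    return (a > b) - (a < b)


def kind_rank(tag):
    """File-kind tags rank first (-1); everything else ranks after (1)."""
    return -1 if tag["kind"] == "file" else 1


def compare_tags(x, y):
    # Modelling <or>: fall through to the name-length comparison only on a tie (0).
    return cmp(kind_rank(x), kind_rank(y)) or cmp(len(x["name"]), len(y["name"]))


def sort_tags(tags):
    return sorted(tags, key=cmp_to_key(compare_tags))
```

Mapping each tag to a rank first and then comparing the ranks is what makes the second form shorter: the two nested conditionals collapse into one three-way comparison.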

AmaiKinono commented 3 years ago

I'm gonna close this since most of the items are implemented, and I can't come up with a better scheme for now.

universal-ctags / citre

Better default filtering/sorting behavior #42