universal-ctags / ctags

ReStructuredText full qualified form #3268

Open Djedouas opened 2 years ago

Djedouas commented 2 years ago

Description of the problem

I need the tagbar plugin to handle rst structure in Vim.

This plugin is based on ctags, and I found that universal-ctags has a rst parser.

But I have the same problem as in this issue on their repo https://github.com/preservim/tagbar/issues/408 which is that the hierarchy is not supported:

[screenshot: tagbar outline of the rst file without the expected hierarchy]

Apparently the problem comes from universal-ctags, which does not use the fully qualified form.

Beginning of a solution?

After some research I came upon this comment which is exactly about what I am looking for:

https://github.com/universal-ctags/ctags/blob/80753d68d11926608f4d998b7f83964af7540562/parsers/rst.c#L415-L425

especially I am looking for the behavior described here

https://github.com/universal-ctags/ctags/blob/80753d68d11926608f4d998b7f83964af7540562/parsers/rst.c#L423-L424

Found a weird solution

As I began to dig into the code, I started debugging by commenting out the call to this function to see what happens (I was expecting something to disappear from the output tag file)

https://github.com/universal-ctags/ctags/blob/80753d68d11926608f4d998b7f83964af7540562/parsers/rst.c#L561

What a big surprise! Commenting out this very line 561 outputs exactly what I need: a fully qualified form as described here

https://github.com/universal-ctags/ctags/blob/80753d68d11926608f4d998b7f83964af7540562/parsers/rst.c#L423-L424

The hierarchy is now correct:

[screenshot: tagbar outline of the rst file with the correct hierarchy]

But maybe it is a special case...?

What to do now?

Maybe a command line option can be created for or linked to this behavior, to switch the fully qualified form on and off? Maybe this is already the case, and despite my searches I didn't find anything about it...? Maybe it can be the default behavior?

Thanks 🙂

masatake commented 2 years ago

It is related to https://github.com/universal-ctags/ctags/issues/2063 and https://github.com/universal-ctags/ctags/pull/3063

Whether the scope field should hold an FQ name or not is a complex topic.

I will summarise the status of this topic. There is no option for this purpose. Adding such an option is not hard, but I wonder whether the option should be specific to the ReStructuredText parser or not.

An ideal solution is assigning unique IDs to tags, so that we can use an ID to specify a scope. I'm working to realize the ideal solution; however, it will take a year. Furthermore, client tools like vi must support the IDs. So we have to think about a plan B that solves the issue NOW.
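
To make the ID idea concrete, here is a minimal sketch, not ctags code. It assumes a hypothetical `struct tag_entry` in which every tag has a unique `id` (starting at 1) and a `scope_id` pointing at its parent (0 meaning "no parent"). The names `tag_entry`, `find_tag`, and `format_scope` are invented for illustration. With such IDs, both the short and the FQ form of the scope can be derived on demand:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tag record: `id` is unique, `scope_id` is the parent's id
 * (0 = top level). This is an assumption, not the ctags data model. */
struct tag_entry {
    int id;
    int scope_id;
    const char *name;
};

/* Linear lookup by id; returns NULL when no entry carries that id. */
const struct tag_entry *find_tag(const struct tag_entry *tags, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (tags[i].id == id)
            return &tags[i];
    return NULL;
}

/* Write the scope of tag `id` into `out`: the parent's name only, or,
 * when `fq` is true, all ancestors joined by `sep`, outermost first.
 * Returns false for an unknown id, a too-small buffer, or a chain
 * deeper than 32 levels (which also guards against cycles). */
bool format_scope(const struct tag_entry *tags, size_t n, int id,
                  bool fq, const char *sep, char *out, size_t outsz)
{
    const struct tag_entry *chain[32];
    size_t depth = 0;
    size_t used = 0;
    const struct tag_entry *t;

    assert(tags != NULL && sep != NULL && out != NULL);
    t = find_tag(tags, n, id);
    if (t == NULL || outsz == 0)
        return false;
    out[0] = '\0';

    /* Collect ancestors, nearest parent first. */
    for (t = find_tag(tags, n, t->scope_id); t != NULL;
         t = find_tag(tags, n, t->scope_id)) {
        if (depth == 32)
            return false;
        chain[depth++] = t;
        if (!fq)
            break;          /* short form: the direct parent is enough */
    }

    /* Emit outermost ancestor first, separated by `sep`. */
    for (size_t i = depth; i-- > 0; ) {
        int w = snprintf(out + used, outsz - used, "%s%s",
                         chain[i]->name, i > 0 ? sep : "");
        if (w < 0 || (size_t)w >= outsz - used)
            return false;
        used += (size_t)w;
    }
    return true;
}
```

For three entries A (id 1), B (id 2, parent 1) and C (id 3, parent 2), the FQ form of C's scope with the separator `""` is `A""B` and the short form is `B`. The tags file itself would only need to store the parent ID.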

Djedouas commented 2 years ago

Thanks for the quick answer!

Are you saying that work on a possible plan B is in progress (or should be started)?

masatake commented 2 years ago

I'm not sure about the usage of the words "plan *", but the next one is "plan Z": the worst solution (but easy to implement).

$ cat input.rst 
A
=================

B
-----------------

C
.................
$ ./ctags -o - input.rst 
A   input.rst   /^A$/;" H
B   input.rst   /^B$/;" h   title:A
C   input.rst   /^C$/;" c   subtitle:B
$ ./ctags --param-ReStructuredText.FQscope=true -o - input.rst 
A   input.rst   /^A$/;" H
B   input.rst   /^B$/;" h   title:A
C   input.rst   /^C$/;" c   subtitle:A""B
$ git diff |cat
diff --git a/parsers/rst.c b/parsers/rst.c
index 11d81ecf6..9f87d8a36 100644
--- a/parsers/rst.c
+++ b/parsers/rst.c
@@ -30,6 +30,7 @@
 #include "field.h"
 #include "htable.h"
 #include "debug.h"
+#include "param.h"

 /*
 *   DATA DEFINITIONS
@@ -93,6 +94,8 @@ struct olineTracker
    size_t len;
 };

+static bool FQscope = false;
+
 /*
 *   FUNCTION DEFINITIONS
 */
@@ -558,9 +561,25 @@ static void findRstTags (void)
    nestingLevelsFree(nestingLevels);

    adjustSectionKinds(section_tracker);
-   inlineScopes();
+
+   if (!FQscope)
+       inlineScopes();
+}
+
+static void rstSetFQscope (const langType language CTAGS_ATTR_UNUSED,
+                          const char *name, const char *arg)
+{
+   FQscope = paramParserBool (arg, FQscope, name, "parameter");
 }

+static parameterHandlerTable RstParameterHandlerTable [] = {
+   {
+       .name = "FQscope",
+       .desc = "the way to fill the scope field ([false] or true)",
+       .handleParameter = rstSetFQscope,
+   },
+};
+
 extern parserDefinition* RstParser (void)
 {
    static const char *const extensions [] = { "rest", "reST", "rst", NULL };
@@ -581,5 +600,10 @@ extern parserDefinition* RstParser (void)

    def->useCork = CORK_QUEUE;

+   def->parameterHandlerTable = RstParameterHandlerTable;
+   def->parameterHandlerCount = ARRAY_SIZE(RstParameterHandlerTable);
+
+   def->defaultScopeSeparator = "\"\"";
+
    return def;
 }
$ 

I wondered how the parsers for other documentation formats behave:

$ cat input.md
# A

## B

### C
$ ./ctags -o - input.md
A   input.md    /^# A$/;"   c
B   input.md    /^## B$/;"  s   chapter:A
C   input.md    /^### C$/;" S   section:A""B
$ cat input.html

<html>
  <header>
  </header>
  <body>
    <h1>A</h1>
    <h2>B</h2>
    <h3>C</h3>    
  </body>
</html>
$ ./ctags -o - input.html 
A   input.html  /^    <h1>A<\/h1>$/;"   h
B   input.html  /^    <h2>B<\/h2>$/;"   i
C   input.html  /^    <h3>C<\/h3>    $/;"   j
$ cat input.tex
\begin{document}
\section{A}
\subsection{B}
\subsubsection{C}
\end{document}
$ ./ctags -o - input.tex 
A   input.tex   /^\\section{A}$/;"  s
B   input.tex   /^\\subsection{B}$/;"   u   section:A
C   input.tex   /^\\subsubsection{C}$/;"    b   subsection:A""B

The HTML parser doesn't fill the scope field. The Markdown and TeX parsers fill the scope fields with fully qualified (FQ) names.
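
As a rough illustration of why FQ names help a client tool, the sketch below shows how the nesting depth and the immediate parent could be recovered from a scope value such as `A""B`. The helper names `scope_depth` and `scope_parent` are hypothetical and belong neither to ctags nor to tagbar. The sketch also assumes that heading names never contain the separator itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of components in a scope value: "" -> 0, "A" -> 1, "A\"\"B" -> 2.
 * `sep` is the scope separator, e.g. the two-character string "\"\"". */
size_t scope_depth(const char *scope, const char *sep)
{
    size_t seplen;
    size_t depth = 1;

    assert(scope != NULL && sep != NULL && sep[0] != '\0');
    if (scope[0] == '\0')
        return 0;
    seplen = strlen(sep);
    for (const char *p = strstr(scope, sep); p != NULL;
         p = strstr(p + seplen, sep))
        depth++;
    return depth;
}

/* Pointer to the last component of the scope value, i.e. the name of
 * the immediate parent. The result points into `scope`; nothing is copied. */
const char *scope_parent(const char *scope, const char *sep)
{
    size_t seplen;
    const char *last = scope;

    assert(scope != NULL && sep != NULL && sep[0] != '\0');
    seplen = strlen(sep);
    for (const char *p = strstr(scope, sep); p != NULL;
         p = strstr(p + seplen, sep))
        last = p + seplen;
    return last;
}
```

With a short scope name such as `subtitle:B`, a client only learns the direct parent. If two sections named B exist under different titles, it cannot tell which one is meant. The FQ value carries the whole path, so the tree can be rebuilt without guessing.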

As far as I can tell from reading the parser for asciidoc, it fills scope fields but does not use FQ names. Quoting from asciidoc.c:

            /*
             * This doesn't use Cork, but in this case I think this is better,
             * because Cork would record the scopes of all parents in the chain
             * which is weird for text section identifiers, and also this is
             * what the rst.c reStructuredText parser does.
             */

Plan A is attaching unique IDs to the tag entries. The scope can be represented with the IDs.

Plan B.

  1. make the parsers use the cork API.
  2. introduce into the cork API a mechanism to control how the scope fields are filled (short or FQ).
  3. each parser tells the API its own default way of filling the field (so we can avoid breaking compatibility).
  4. introduce a parser-specific option to change the behavior of the parser. The extended API implemented in step 2 may help.
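
Steps 3 and 4 could look roughly like the following sketch. It is not the cork API. The type `parser_scope_pref` and the functions `set_scope_override` and `effective_scope_style` are made up for illustration, and the parser names in the usage note are only examples. Each parser registers its own default, and a user option may override it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum scope_style { SCOPE_SHORT, SCOPE_FQ };

/* Hypothetical per-parser preference record. */
struct parser_scope_pref {
    const char *parser;              /* parser name, e.g. "ReStructuredText" */
    enum scope_style default_style;  /* step 3: chosen by the parser itself */
    bool overridden;                 /* step 4: true once the user set an option */
    enum scope_style user_style;     /* only meaningful when `overridden` */
};

/* Record a user override for `parser`. Returns false when the parser
 * is not in the table, so the caller can report an unknown name. */
bool set_scope_override(struct parser_scope_pref *prefs, size_t n,
                        const char *parser, enum scope_style style)
{
    assert(prefs != NULL && parser != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(prefs[i].parser, parser) == 0) {
            prefs[i].overridden = true;
            prefs[i].user_style = style;
            return true;
        }
    }
    return false;
}

/* The style actually used when filling scope fields: the user's choice
 * if there is one, otherwise the parser's own default. */
enum scope_style effective_scope_style(const struct parser_scope_pref *pref)
{
    assert(pref != NULL);
    return pref->overridden ? pref->user_style : pref->default_style;
}
```

Based on the outputs shown earlier in this comment, a table for this scheme would give ReStructuredText and asciidoc the default `SCOPE_SHORT`, and Markdown and TeX the default `SCOPE_FQ`. Existing tags files would therefore stay unchanged unless the user asks for something else.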

Plan B will still take a long time. I can't estimate how long it will take to achieve step 1.

As I wrote, Plan Z is not good because it is too parser-specific. However, we can regard Plan Z as an early implementation of step 4 of Plan B. From this point of view, the patch I pasted in this comment is acceptable.

I will re-read the past discussions.

Djedouas commented 2 years ago

OK, thanks for your investigation and explanations 🙂 I will use the patch in your comment for now, which is, I agree, too parser-specific. So I leave the choice of closing this issue to you.

masatake commented 2 years ago

I will keep this open. I already have some commits implementing the parameters. However, I have not opened a pull request yet because the documentation (man pages) for the parameters is not ready. Please wait a while.

joerek-von-boerek commented 5 months ago

Having a big rst project (vim / tagbar), I only wanted to ask whether there is any progress on this? Thanks 🙂

westurner commented 3 months ago

FWIW vim-voom supports :Voom rest for ReStructuredText RST (and e.g. :Voom markdown, :Voom latex and a number of others) https://github.com/vim-scripts/VOoM

But, it doesn't move .. index: TOC entries or .. _explicit-references: that precede headings when modifying the doctree/outline; so it's great for skipping to nested headings in RST docs.