sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/
Other
6.45k stars 2.1k forks source link

C++domain: Idiomatic spacing of ptr-operator and ref-qualifier #7491

Open jbab opened 4 years ago

jbab commented 4 years ago

Problem

When declaring objects (e,g, types, members, functions) in the cpp domain, sphinx accepts arbitrary spacing between types, cv-qualifiers and ref-qualifiers. It then produces output in a consistent way (which is a plus) but which cannot be influenced by the user (which is a problem).

For example

cpp:function:: int& foo(const Type & value, Resource * resource)

leads to output

int &foo(const Type &value, Resource *resource)

The problem here is that the spacing chosen by sphinx is not the idiomatic way used by the C++ community (I am aware that this is a bold statement, but https://en.cppreference.com and https://github.com/cplusplus/draft are good support for this.)

An idiomatic spacing for the example above should look somewhat like

int& foo(const Type& value, Resource* resource)

This is important, since IMO documentation should look like the code we write and it should follow conventions which are common in the specific language.

Supporting this kind of spacing could lead more C++ developers to consider using sphinx for documentation.

Possible solutions

  1. All spacing added by the user in the sphinx declarations are transfered as is to the output.
  2. A new option for the C++ domain, say cpp_idiomatic_spacing is added which produces the desired output. The default leaves the output as is.
  3. Sphinx only generates the "idiomatic" output and nothing else.

Option 1

This probably requires huge changes to the output generating functions in cpp.py. It would make the output depend entirely on the user input. This can be seen as an advantage (full control for the user), but also as a disadvantage (potential for inconsistent output).

Option 2

Feasible, but would lead to lots of branching and duplication in the code and the tests. The question is whether someone would still want the default behavior once the option is available and if the effort maintaining two output variants is worth while.

Option 3

Implementable with relatively low effort.

Since the current choice of spacing by sphinx is somewhat arbitrary (or taken over from Python or C), a different spacing should be just as good. I am not sure, however, if this would still be considered a breaking change.

Suggested solution

I would appreciate feedback to the above suggestions. Personally I favor option 3 and would be willing to contribute a PR.

jakobandersen commented 4 years ago

The problem here is that the spacing chosen by sphinx is not the idiomatic way used by the C++ community (I am aware that this is a bold statement, but https://en.cppreference.com and https://github.com/cplusplus/draft are good support for this.)

It is indeed a bold statement which I (perhaps not surprisingly) do not agree with :-). This is perhaps a semi-religious issue like east-vs-west const, so I fully agree that it should be up to the user to decide.

  1. All spacing added by the user in the sphinx declarations are transfered as is to the output.

I consider this practically impossible to implement.

  1. A new option for the C++ domain, say cpp_idiomatic_spacing is added which produces the desired output. The default leaves the output as is.

Sounds good to me, though the option should have a less contentious name :-). From a technical perspective the current spacing is grammar-centric (the declarators (e.g., * and &) are grouped with the declarator-name), while the other style groups the declarators with the decalration-specifiers. How about the slightly long-winded cpp_grammar_centric_declarator_spacing = True?

  1. Sphinx only generates the "idiomatic" output and nothing else.

As I object to "idiomatic" I am definitely against this :-).

Since the current choice of spacing by sphinx is somewhat arbitrary (or taken over from Python or C), a different spacing should be just as good. I am not sure, however, if this would still be considered a breaking change.

As mentioned, the current style is based on how the C and C++ grammars are written, though in all fairness, biased towards my personal formatting opinion in more complicated cases involving parameter packs.

I don't mind implementing option 2 at some point, but what would very helpful would be a set of test cases, with the result in both styles. The more complicated issues comes with multiple qualified pointers, parameters packs, and differentiation of whether there is a declarator-name or not (e.g., function declarations without parameter names).

jbab commented 4 years ago

First of all, thank you very much for your comments and your appreciation of my request. I certainly do not want to rage any pseudo-religious wars. But I am also convinced that the issue is not of the same character as east-const vs. west-const. My issue is that in 20+ years of experience with C++ I cannot remember having encountered a serious book, article, talk, co-worker etc. which/who uses the grammar_centric style you are proposing.

I would be curious to learn about the cases involving parameter packs or other cases you are having in mind where this style may be of advantage.

I could also be interesting to collect more opinions about this, maybe a poll among users?

Of course I could live with option 2 (and I do not insist on a particular name). I am just afraid that all the effort of implementing and maintaining it will be futile since hardly anyone would be using one of the two styles 😉.

jbab commented 4 years ago

http://www.stroustrup.com/bs_faq2.html#whitespace

vector-of-bool commented 4 years ago

I decided to cut my teeth on Sphinx, and I like having this option, so I opened PR #7507 that allows customization of the placement of the ptr/ref token via cpp_ref_placement and cpp_ptr_placement. (Exact details can be bikeshed, of course.)

jakobandersen commented 4 years ago

@vector-of-bool, thanks for your work in #7504. Before going into the actual implementation I tried to get an overview of the situation. There are quite a few cases of declarator'ish things where the spacing style probably should be applied. For each case there are two situations: named and unnamed. The unnamed case appears for example in function qualifiers, trailing return types, template arguments, and function argument types without a parameter name.

A Single Declarator

Left

I personally don't prefer this style so there are a few cases where it's not clear to me what spacing people use. Do you agree with the table?

Declarator Named Unnamed
Ptr. A* a | A*
Ptr., qual. A*const a (?) | A*const (?)
L-val. ref. A& a | A&
R-val. ref. A&& a | A&&
Param. pack A... a | A...
Func. ptr. A (*a)(B arg) | A (*)(B arg)
Ptr. to mem. A C::* a | A C::*
Ptr. to mem., qual. A C::*const a (?) | A C::*const (?)
Ptr. to mem. func. A (C::* a)(B arg) | A (C::*)(B arg)
Ptr. to mem. func., qual. A (C::*const a)(B arg) (?) | A (C::*const)(B arg) (?)

Right (current Sphinx)

While this is my preferred style, it looks like some of the cases are not rendered as I usually write code.

Declarator Named Unnamed
Ptr. A *a | A*
Ptr., qual. A *const a | A*const
L-val. ref. A &a | A&
R-val. ref. A &&a | A&&
Param. pack A ...a (1) | A...
Func. ptr. A (*a)(B arg) | A (*)(B arg)
Ptr. to mem. A C::* a (2) | A C::*
Ptr. to mem., qual. A C::*const a | A C::*const
Ptr. to mem. func. A (C::* a)(B arg) (3) | A (C::*)(B arg)
Ptr. to mem. func., qual. A (C::*const a)(B arg) | A (C::*const)(B arg)

Combinations of Declarators

Note all of these seem to be trivial extensions of the single-declarator cases. I have a few example tables for the "right" style, but we should probably agree on the single-declarator cases first.

Other Wonderings

jbab commented 4 years ago

About your table Left

In all rows with (?) the const refers to the variable being declared, i.e. a is const. The const should therefore be separate from the type (pointer to A or to a member of C):

Declarator Named
Ptr., qual. A* const a
Ptr. to mem., qual. A C::* const a
Ptr. to mem. func., qual. A (C::* const a)(B arg)

In the unnamed case, just leave out the name and its preceding space.

There are more cases involving cv-qualification: The object pointed to can be const/volatile, the member function can be cv-qualified. All of these should not be affected by the question on how to place the *.

Another thing of interest for this discussion is the placement for member function ref qualifiers. For the left style I suggest:

A f() &;
A f() &&;
A f() const&;
A f() const&&;

About your table Right

Again, I would add a space between the const (which refers to the variable) and the type, only now the * is attached const a instead of the "type" (which is not quite correct, since in C++ the type is pointer to something) . Thus

A C:: *a
A C:: *const a
A (C:: *a)(B arg)
A (C:: *const a)(B arg)

This would be totally in line with A *a and A *const a. Unnamed would just mean to leave out the name. The results looks awkward, especially for the unnamed and the pointer to member cases. Well, this may be the reason why this is a rather unusal style in C++.

A function returning a reference is currently rendered by Sphinx as

A &f();

Thus I think that the unnamed reference cases in your table should be

A &
A &&

For ref-qualified member functions in the right style I suggest:

A f() &;
A f() &&;
A f() const &;
A f() const &&;

(1) I agree. What is the current behaviour in the template declaration?

template<typename... Args> ...

or

template<typename ...Args> ...

About your other wonderings