sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

Sphinx should add a way to use autoformatters #11401

Open cjdb opened 1 year ago

cjdb commented 1 year ago

Is your feature request related to a problem? Please describe.

Languages often have multiple formatting styles, and it would be good to have the code in documentation formatted in the same style as the rest of the project. Although I haven't used it, I think reStructuredText can load source files, so code blocks can probably be formatted by using those.

To my knowledge, there isn't a way to do the same for API directives, which can create both inconsistencies and unreadable code. For the latter, consider this C++ standard library function.

.. cpp:function:: template <class T>                                                 \
                  constexpr conditional_t<not is_nothrow_constructible_v<T, T&&> and \
                                              is_constructible_v<T, T const&>,       \
                                          T const&, T&&>                             \
                  std::move_if_noexcept(T &x);

It ends up being formatted as

template<class T>
constexpr conditional_t<not is_nothrow_constructible_v<T, T&&> and is_constructible_v<T, T const&>, T const&, T&&> std::move_if_noexcept(T &x);

Unfortunately, due to my theme's apparent 80ish column limit, it ends up looking like this in my browser:

template<class T>
constexpr conditional_t<not is_nothrow_constructible_v<T, T&&> and
is_constructible_v<T, T const&>, T const&, T&&> std::move_if_noexcept(T &x);

Neither of these formats is particularly readable, and I think potential readers would be well within their rights to complain about this. Ideally, it would be rendered as it was styled in the directive.

Describe the solution you'd like

It would be good if Sphinx could be pointed at a formatting tool (possibly with arguments) so that documentation authors can ensure that API directives are both consistent with project style and legible. Possibly something like this in conf.py:

autoformat = {'path': '/path/to/clang-format', 'args': ['-style=file']}

If autoformat is None or not set, then Sphinx's internal formatting becomes the default.

Describe alternatives you've considered

Formatting could alternatively follow what's prescribed in the directive verbatim, but the backslashes might prevent that from being a successful endeavour.

cjdb commented 1 year ago

Hi, would it be possible to get some input on where to start with this, please?

picnixz commented 1 year ago

It should be possible to do it by post-processing the signature via the autodoc-process-signature event or by adding a transformation to change the docutils nodes.

However, if you want to use the rst content itself, it might be challenging as mentioned here https://github.com/sphinx-doc/sphinx/pull/11011#issuecomment-1509186447. I suggested a similar functionality where you would use the RST content (or the Python code which is translated to RST by autodoc) but I think this feature is not that straightforward to implement.

My best shot is to dive into autodoc and to make it much more modularisable than it is currently.

An alternative is to implement a similar logic to what I've done for PEP 695 but for C++ so that templates are properly formatted.

cjdb commented 1 year ago

It should be possible to do it by post-processing the signature via the autodoc-process-signature event or by adding a transformation to change the docutils nodes.

My original approach (way back when I made this bug) was to create a post-processor that stripped the HTML tags, ran the source through the tool of choice, and then re-applied the HTML tags. I don't recall what went wrong here, but after a few days of playing around, I concluded that my approach was going to be painful. Is the autodoc-process-signature approach similar? The internal representation probably doesn't need to be touched, but the interesting thing to manipulate here are the spaces, which don't have HTML tags around them.

However, if you want to use the rst content itself, it might be challenging as mentioned here https://github.com/sphinx-doc/sphinx/pull/11011#issuecomment-1509186447. I suggested a similar functionality where you would use the RST content (or the Python code which is translated to RST by autodoc) but I think this feature is not that straightforward to implement.

My best shot is to dive into autodoc and to make it much more modularisable than it is currently.

I think this is the best approach in the long-term, since it potentially allows for more than just autoformatters. How difficult would this option be? It sounds like a months-long endeavour, especially for someone who isn't at all familiar with the project.

An alternative is to implement a similar logic to what I've done for PEP 695 but for C++ so that templates are properly formatted.

This problem isn't limited to templates: it's just where I've surfaced the problem. It would be best if Sphinx just deferred to dedicated tooling (when available) so that projects don't need to have multiple config files with conflicting opinions on how to "properly" format. Ideally, the declarations in the documentation would be formatted the same way as in source.

This is especially relevant for C, C++, and possibly Objective C---which unlike more modern languages---don't have a canonical formatting style. The clang-format style options page gives a taste of what kinds of things different people prefer to have in their respective codebases. Many C, C++, and Objective C programmers defer formatting to clang-format, though a colleague informed me that emacs and vi both have highly-customisable autoformatters as well. Even Python has various autoformatters, so being able to plug those in would be useful.

picnixz commented 1 year ago

My original approach (way back when I made this bug) was to create a post-processor that stripped the HTML tags, ran the source through the tool of choice, and then re-applied the HTML tags

Not the best approach in general. The internal HTML tree might be really messy and hard to parse because of how the docutils tree is generated.

Is the autodoc-process-signature approach similar?

Sorry, it's not the correct event. This one only changes the signature string. I think only the ObjectDescription.handle_signature method could be of use here (because it's the method responsible for transforming a string into docutils nodes). Note that the formatting is done by a Writer class (which is essentially a node visitor) and the latter is responsible for writing the correct HTML/LaTeX/... output. You can technically change the docutils nodes using a SphinxTransform. At that point you could maybe apply some formatter, but it's not guaranteed that the latter works. For example, in Python, the nodes don't contain a def and may contain illegal Python statements when converted into plain text (e.g., `foo(x = <object object>)`), so you would be applying the formatter to an invalid string, and it would not help at all.
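The concern about invalid plain text can be checked with the standard library alone. The sketch below is illustrative; the helper name `is_formattable_python` is made up and is not part of Sphinx. It re-adds the missing `def` and a body, then asks Python's parser whether the result is valid.

```python
import ast


def is_formattable_python(signature_text: str) -> bool:
    """Return True if a rendered signature parses as a Python function header."""
    # Rendered signatures lack the `def` keyword and a body, so add both
    # before asking the parser whether the text is valid Python.
    candidate = f"def {signature_text}: ..."
    try:
        ast.parse(candidate)
    except SyntaxError:
        return False
    return True


# An ordinary default value parses; a repr such as <object object> does not.
ok = is_formattable_python("foo(x=1)")
broken = is_formattable_python("foo(x = <object object>)")
```

A formatter that parses its input would likely reject the second signature for the same reason.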

How difficult would this option be?

Hard. Note that autodoc only works for Python code since we are importing objects. Since spaces and line breaks are important, we'll need, for instance, to come up with something that keeps them (the AST and tokenizer gobble them since they are not relevant outside of strings). I have some ideas but I am not sure they would work, so I would say it's a non-trivial task. Perhaps for an "already written" rst document it should be fine, but I'm not sure about this either, because we would need to extend the docutils parser so that it remembers the lines and columns of the corresponding object and uses them when creating the docutils nodes.
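As a small illustration of the whitespace problem (using Python's own `ast` module rather than anything in Sphinx), a round trip through the AST discards the author's line breaks and alignment:

```python
import ast


def roundtrip(source: str) -> str:
    """Parse source into an AST and turn it back into text (Python 3.9+)."""
    # The AST keeps the structure of the code but not its original layout.
    return ast.unparse(ast.parse(source))


# The second parameter is deliberately aligned on its own line.
source = "def f(a,\n      b): pass"
normalized = roundtrip(source)
```

After the round trip the hand-made alignment is gone, which is why any approach built on parsed objects would need a separate way to remember the original layout.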


Personally, I think that the formatter should be applied before anything else and not during the Sphinx process, because Sphinx uses a lot of internal stuff which cannot be "reformatted". More precisely, the formatter should be applied to an RST text to produce another RST text. For instance, with autodoc, you "generate" some RST content and then you parse it as normal RST content (as if it were the content of your document). So you would want to call your formatter whenever you write an RST function declaration. However, I don't have a clean way to do it yet. There are a lot of open questions, such as how to keep indentation and line breaks without breaking the RST parser.
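A rough sketch of the "RST in, RST out" idea for hand-written directives follows. Everything here is an assumption made for illustration: the function name, the restriction to `cpp:function`, and the choice to re-emit wrapped signatures with backslash continuations as in the example at the top of this issue. It does not cover content generated by autodoc.

```python
import re
from typing import Callable, List

# Matches the start of a cpp:function directive, including any indentation.
DIRECTIVE = re.compile(r"^\s*\.\. cpp:function:: ")


def reformat_cpp_directives(rst: str, formatter: Callable[[str], str]) -> str:
    """Rewrite cpp:function signatures in RST text using `formatter`."""
    lines = rst.split("\n")
    out: List[str] = []
    i = 0
    while i < len(lines):
        match = DIRECTIVE.match(lines[i])
        if match is None:
            out.append(lines[i])
            i += 1
            continue
        prefix = match.group(0)
        # Gather the signature, following backslash line continuations.
        parts = [lines[i][len(prefix):]]
        while parts[-1].rstrip().endswith("\\") and i + 1 < len(lines):
            i += 1
            parts.append(lines[i])
        i += 1
        signature = " ".join(p.rstrip().rstrip("\\").strip() for p in parts)
        formatted = formatter(signature).strip("\n").split("\n")
        # Re-emit the signature, aligning the continuation backslashes.
        width = max(len(line) for line in formatted)
        pad = " " * len(prefix)
        for n, line in enumerate(formatted):
            lead = prefix if n == 0 else pad
            if n < len(formatted) - 1:
                out.append(f"{lead}{line.ljust(width)} \\")
            else:
                out.append(f"{lead}{line}")
    return "\n".join(out)
```

The `formatter` argument is any callable from string to string, so an external tool could be wrapped and passed in. Whether the reformatted line breaks survive into the rendered output is a separate question, as discussed in this thread.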

electric-coder commented 10 months ago

@cjdb this is a non-trivial problem. For example, when I extract code to include in the docs as a literal example, I need to apply a third-party code formatting tool in conf.py, via a custom extension, to adjust indentation from the originating context to the docs. As you say, this is "language specific", and integrating it would require Sphinx core to interoperate with third-party tools, thereby creating a dependency of the Sphinx project on at least one styling library per language (for this reason alone any such tool should be an extension before it's integrated into Sphinx).

However, the problem you're describing is even worse because it requires adjusting how produced signature styles are rendered by Sphinx in HTML or other outputs (a 1 language to N output formats relation taken over M languages). Again, the requirement sounds simple, but style-formatting tools tend to be extremely complicated by themselves (think of python-black for example); now imagine the complexity of custom tailoring HTML to every one of the existing styles and output formats... I can't think of any tool out there (free or paid) that currently fulfills all those requirements.

cjdb commented 10 months ago

thereby creating a dependency of the Sphinx project on at least one styling library per language (for this reason alone any such tool should be an extension before it's integrated into Sphinx)

Sphinx shouldn't need to even be aware of the formatters, let alone depend on any. The only work it's doing is invoking an external tool (and passing on any output).

However, the problem you're describing is even worse because it requires adjusting how produced signature styles are rendered by Sphinx in HTML or other outputs (a 1 language to N output formats relation taken over M languages). Again, the requirement sounds simple, but style-formatting tools tend to be extremely complicated by themselves (think of python-black for example); now imagine the complexity of custom tailoring HTML to every one of the existing styles and output formats... I can't think of any tool out there (free or paid) that currently fulfills all those requirements.

I don't quite understand what you mean here. Formatting typically adjusts spacing between tokens, and Sphinx isn't doing any of the work: it's asking a third-party tool to do the work. Sphinx is already able to render pre-formatted code, so it's not clear to me why this would be more difficult than what Sphinx is already capable of doing.

electric-coder commented 10 months ago

Sphinx shouldn't need to even be aware of the formatters, let alone depend on any. The only work it's doing is invoking an external tool (and passing on any output).

OK, so how do you tell Sphinx which formatter to choose and where do you specify it? Answer that question and we'll see the dependency pop up.

(We're also lost in translation here, I don't use C++ but the example you gave seems like the python-black equivalent of wrapping a signature for length. )

Unfortunately, due to my theme's apparent 80ish column limit

The theme is yet another level of presentation, if a theme doesn't by itself offer adjustable line-length Sphinx has no way of superseding the theme's presentation choices. (My solution is carefully choosing a theme that adjusts itself to my needs.)

cjdb commented 10 months ago

OK, so how do you tell Sphinx which formatter to choose and where do you specify it? Answer that question and we'll see the dependency pop up.

I suggested that users provide a path to the formatter and any command-line arguments. That is a dependency that the user needs to manage, but Sphinx doesn't have any code dependencies (I'm not suggesting that any formatters be integrated into Sphinx directly). It should be possible to achieve this part using only subprocess.
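A minimal sketch of that subprocess step, assuming the `autoformat` dictionary from the original proposal and a tool that reads code on stdin and writes the result to stdout (clang-format does this when given no file arguments). The function name `run_formatter` is hypothetical.

```python
import subprocess
from typing import Any, Mapping, Optional


def run_formatter(code: str, autoformat: Optional[Mapping[str, Any]]) -> str:
    """Pipe `code` through the user-configured formatter, if there is one."""
    # None or an empty mapping means no external formatter was configured,
    # so the text is returned unchanged.
    if not autoformat:
        return code
    command = [str(autoformat["path"]), *map(str, autoformat.get("args", []))]
    # The tool is assumed to read from stdin and write to stdout.
    # check=True raises CalledProcessError if the tool exits with an error.
    result = subprocess.run(
        command, input=code, capture_output=True, text=True, check=True
    )
    return result.stdout
```

This only covers invoking the tool. Getting the signature text out of Sphinx beforehand, and putting the formatted result back into the output, is the part the rest of the thread debates.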

(We're also lost in translation here, I don't use C++ but the example you gave seems like the python-black equivalent of wrapping a signature for length. )

Right, C++ has very few restrictions on spaces. Typical lines range from 72-120 columns, depending on the project, but there isn't an agreed-upon limit.

The theme is yet another level of presentation, if a theme doesn't by itself offer adjustable line-length Sphinx has no way of superseding the theme's presentation choices. (My solution is carefully choosing a theme that adjusts itself to my needs.)

Agreed.

electric-coder commented 10 months ago

@cjdb

I suggested that (...)

I get a feeling you don't fathom how complicated this really is. Take a look at the "unstable" disclaimer on the Furo theme's customization-injecting code.

I think you fail to understand that what you're proposing (as picnixz explained) requires not only changes to Sphinx but also a theme that can accommodate those changes. I tried customizing some basic stuff with the Furo theme and I found that some CSS classes were simply missing, so with certain reST constructs after some depth the theme just didn't provide adequate CSS selectors that you could target...

What you're proposing is even more radical, because Sphinx doesn't know how to break signatures into several lines and each line would need to have its own CSS selectors carefully organized for every reST construct (let me stress that: every C++/reST construct), and afterwards there'd need to be a theme targeting all those selectors with adequate styling.

So just saying "hey, let's apply a code styling tool" doesn't even begin to address the dual issue of changing the Sphinx output structure and finding a theme whose devs are willing to accommodate those changes.

Sphinx shouldn't need to even be aware of the formatters, let alone depend on any.

That's the problem: when line breaking a signature in some step of the build process, you'll need to change the <div> and <p> structures and insert meaningful CSS selectors that are coherent as a whole (I don't see how you'd not change the HTML tag structure when line breaking so many different signatures). And afterwards a theme needs to style it. (Every argument you've made thus far feels like: "there's some text, let's put <br>s into it.")

electric-coder commented 10 months ago

@cjdb Manually change the HTML and CSS your Sphinx is currently outputting for this single signature and let us see the nature of the changes that are necessary.

.. cpp:function:: template <class T>                                                 \
                  constexpr conditional_t<not is_nothrow_constructible_v<T, T&&> and \
                                              is_constructible_v<T, T const&>,       \
                                          T const&, T&&>                             \
                  std::move_if_noexcept(T &x);

but the interesting thing to manipulate here are the spaces, which don't have HTML tags around them.

So you're saying just inserting <br>s without HTML nor CSS can line break complex signatures adequately for presentation.