[docs/extdev] explain how to programmatically create new documents

Background

As proposed by the GitHub issue selector, I asked a question on StackOverflow. However, it has not been answered and has received little attention from people with deep Sphinx knowledge. I also tried to research the Sphinx code, its builtin directives/domains, and other famous extensions like Carlos Jenkins' autoapi.

I'm currently writing an extension (https://github.com/pyTooling/sphinx-reports) to show various reports:

documentation coverage (from DocStrCoverage via API)
code coverage (from Coverage.py via JSON file)
unittest results (from pytest via XML file)
dependencies (planned)

In this question, I'm targeting code coverage from Coverage.py. Rather than relying on Coverage.py's own export as a standalone HTML page/directory, I would like to fully integrate the coverage report via docutils' intermediate document representation. Embedding the standalone HTML output into, e.g., a GitHub Pages documentation is already possible, but it is not a holistic solution.

The goal is to render the summary tables and colored code files via Sphinx/docutils, so all Sphinx builders can export code coverage as a chapter or appendix in the documentation (HTML, LaTeX/PDF, ...).

What I currently have is a configuration entry in conf.py:

report_codecov_packages = {
  "src": {
    "name":        "sphinx_reports",
    "json_report": "../report/coverage/coverage.json", 
    "fail_below":  80,
    "levels":      "default"
  }
}

... and this ReST code:

Code Coverage Report
####################

Code coverage report generated with `pytest <https://github.com/pytest-dev/pytest>`__ and `Coverage.py <https://github.com/nedbat/coveragepy/tree/master>`__.

.. report:code-coverage-legend::
   :packageid: src

.. report:code-coverage::
   :packageid: src

A user can define multiple packages and the associated coverage file (here in JSON format). Then in ReST, that package/dictionary entry is referenced by src. The result can be seen here: https://pytooling.github.io/sphinx-reports/coverage/index.html
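As a rough sketch of that lookup (not the actual sphinx-reports implementation), the following resolves a package ID against the `report_codecov_packages` dictionary and reads the per-file summaries from the JSON report. The helper name `load_file_summaries` is hypothetical, and the `files` / `summary` / `percent_covered` keys are assumed to follow Coverage.py's JSON report layout.

```python
import json
from pathlib import Path
from typing import Dict


def load_file_summaries(packages: Dict[str, dict], package_id: str, conf_dir: Path) -> Dict[str, float]:
    """Return {source file: percent covered} for one configured package."""
    try:
        settings = packages[package_id]
    except KeyError:
        raise ValueError(f"Unknown package ID '{package_id}' in report_codecov_packages.") from None

    # Relative report paths are resolved against the directory containing conf.py.
    report_path = (conf_dir / settings["json_report"]).resolve()
    with report_path.open(encoding="utf-8") as handle:
        report = json.load(handle)

    # Assumed layout: {"files": {"<path>": {"summary": {"percent_covered": ...}}}}
    return {
        filename: data["summary"]["percent_covered"]
        for filename, data in report.get("files", {}).items()
    }
```

A directive could call such a helper with the value of its `:packageid:` option and the configuration directory, then turn the returned mapping into table rows.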

As a next step, I would like to create one sub-document per source file and link them in the code coverage summary table.

Question

How can I create new docutils documents in memory and link them into the document tree (navigation bar) as well as to the module names in the table?

I found how to create a new document via docutils:

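# Imports assumed by this excerpt; the project-specific names (export, CodeCoverageBase,
# PackageCoverage, ModuleCoverage, Analyzer) come from sphinx-reports and are not shown.
from pathlib import Path
from typing import List

from docutils import nodes
from docutils.parsers.rst.directives import flag
from docutils.utils import new_document

@export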
class CodeCoverage(CodeCoverageBase):
  """
  This directive will be replaced by a table representing code coverage.
  """
  directiveName: str = "code-coverage"

  has_content = False
  required_arguments = 0
  optional_arguments = 2

  option_spec = CodeCoverageBase.option_spec | {
    "no-branch-coverage": flag
  }

  _noBranchCoverage: bool
  _packageName:      str
  _jsonReport:       Path
  _failBelow:        float
  _coverage:         PackageCoverage

  def _CreatePages(self) -> None:
    # Both helpers return True once a module was handled, which stops the traversal.
    def handlePackage(package: PackageCoverage) -> bool:
      for pack in package._packages.values():
        if handlePackage(pack):
          return True

      for module in package._modules.values():
        if handleModule(module):
          return True

      return False

    def handleModule(module: ModuleCoverage) -> bool:
      doc = new_document("dummy")

      rootSection = nodes.section(ids=["foo"])
      doc += rootSection

      title = nodes.title(text=f"{module.Name}")
      rootSection += title
      rootSection += nodes.paragraph(text="some text")

      docname = f"coverage/{module.Name}"
      self.env.titles[docname] = title
      self.env.longtitles[docname] = title

      return True

    handlePackage(self._coverage)

  def run(self) -> List[nodes.Node]:
    self._CheckOptions()

    # Assemble a list of Python source files
    analyzer = Analyzer(self._packageName, self._jsonReport)
    self._coverage = analyzer.Convert()

    self._CreatePages()

    container = nodes.container()
    container += self._GenerateCoverageTable()

    return [container]

It's not clear to me how to add the document to all the appropriate dictionaries. I also found no helper function to register a document at a given level of the document tree hierarchy.

Alternatives I have considered:

1. An additional directive like module-coverage is needed, so that the directive inserts the colored code coverage. It needs the module name and package ID as parameters:
   ```
   .. report:module-coverage::
     :packageid: src
     :module: sphinx_reports.CodeCoverage
   ```
   The drawback is manually adding directives into ReST code. On the other hand, it allows for more control of the document (headline, header text, ...).
2. Carlos Jenkins' autoapi runs at the config-inited event and creates multiple *.rst files in an output directory. Sphinx then discovers the ReST files as inputs and integrates them as normal.
   The created files are generated using Jinja. Similar to (1), an additional directive is needed. The document styling can be influenced by the Jinja template.
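
The file-generating alternative could be sketched roughly as follows. This is only an illustration under assumptions: it uses plain string formatting instead of Jinja, the helper name `write_module_stubs` is hypothetical, and the `report:module-coverage` directive it emits is the proposed one from above, not an existing directive.

```python
from pathlib import Path
from typing import Iterable, List

# Minimal stub: a title plus the (proposed) per-module directive.
STUB_TEMPLATE = """\
{title}
{underline}

.. report:module-coverage::
   :packageid: {package_id}
   :module: {module}
"""


def write_module_stubs(output_dir: Path, package_id: str, modules: Iterable[str]) -> List[Path]:
    """Write one .rst stub per module and return the stub paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    stubs = []
    for module in modules:
        content = STUB_TEMPLATE.format(
            title=module,
            underline="#" * len(module),
            package_id=package_id,
            module=module,
        )
        path = output_dir / f"{module}.rst"
        # Only touch files whose content changed, so unchanged stubs keep their timestamps.
        if not path.exists() or path.read_text(encoding="utf-8") != content:
            path.write_text(content, encoding="utf-8")
        stubs.append(path)
    return stubs
```

In an extension, a function like this would presumably be called from a `config-inited` handler, with the module names taken from the coverage report, so that the stubs exist before Sphinx collects its source files.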

Environment Information

Python: 3.9..3.12  
Sphinx: latest 7.2

Sphinx extensions

sphinx_reports

What you essentially want is to take whatever the coverage report gives you and integrate it into any of the other builders. From what I understand, you have the following:

1. You run the coverage builder normally.
2. You get a bunch of reports in JSON files. Those are the files you want to include in, say, the HTML build.
3. Now, you run the HTML build, saying "hey I want my reports here" with your custom directive.
4. Your HTML output has the included coverage report but only this.

Now what you want is a somewhat 'main table' which contains a summary of the reports you had and a link to the reports? (or something similar). For this, I'd suggest having a look at the todo extension tutorial which does the following:

A todo directive, containing some content that is marked with “TODO” and only shown in the output if a new config value is set. Todo entries should not be in the output by default.
A todolist directive that creates a list of all todo entries throughout the documentation.

In your case, the first point is the inclusion of a single report and the second point is what you want to achieve.

Since the issue is half a FR (I don't think our docs actually tell you how to achieve what you want to do exactly and maybe we should improve our docs for that one or say that it's not meant to be part of the public API), I'll keep it as a 'doc' issue (many advanced questions can actually be turned into a doc issue since this mainly reflects the lack of an explanation on our part).

For the mentioned steps:

1. Yes, I run it normally and it creates a SQLite database.
   Multiple runs can create multiple SQLite files, which can be merged.
   Then I let it emit JSON and XML (Cobertura) files for my postprocessing.
2. I get a single file containing a list of files with summary information as well as a list of line numbers which are covered or uncovered. This is usually the input to write custom coloring rules for source code.
   I have not yet raised a question about how to color the code with background colors, similar to pygments and literal_block and the highlighted lines feature. I need 3 colors :(
3. Yes, that's my current approach, so the user specifies where to add the table and/or colored code files.
4. I don't get this one.

I want to have handwritten documentation + auto-generated (autosummary, autoapi, etc) + code coverage in the appendix.

I think I'm way past that simple example. Please have a look at the linked sphinx-reports repository and the generated outputs at https://pytooling.github.io/sphinx-reports/coverage/index.html. I already have that table generated from JSON files. I also somehow got entries into the navigation (taken from the toctree directive).
BUT it requires manually created *.rst files for now. I would like to create these documents in memory.

The code I provided in my question also shows how a docutils document is created, but it's unclear where to link it into the Sphinx data structures, as there seems to be no helper function for that use case. In the StackOverflow question "Set document title with custom sphinx parser", lots of dictionaries are modified, but it's unclear why and how.

> I don't think our docs actually tell you how to achieve what you want to do exactly and maybe we should improve our docs for that one or say that it's not meant to be part of the public API.

But why is it not public API? Anything needed to write an extension is essential to extension writers like me. What extensions could we write if we don't have access to it?

Anyhow, I feel the problem is also that new documents sit at the boundary between docutils (single document) and Sphinx (multiple documents), right?

Ah sorry, but ignore my 4th point. I started a sentence and forgot to remove it!

> Please have a look at the linked

Sorry, I don't have time to look at that.

> But why is it not public API

I don't know whether or not we intended to make it public; I'm not saying that it's definitely not meant to be public.

> BUT it requires manually created *.rst files for now. I would like to create these documents in memory.

Do you want the user not to write anything at all? Like, they would only say "put the coverage in file XYZ"? If this is the case, you should just add a transform instead of a directive, where you would inject the generated nodes only if the document is file XYZ. The rest will be handled by Sphinx.

IIRC, there is no way to create standalone documents and inject them without writing them to disk. You can create 'partial' blocks (i.e., nodes) and inject those nodes into a larger document, but every document must be stored for incremental builds. So if you want to programmatically create your document, it's better to consider it as a 'partial tree' that will be attached to some real document (e.g., the appendix or any real RST page).

Now, there is an alternative which consists in changing the Writer class for whatever builder you are using. What you would do is essentially the same as what we do for the index and the search page for HTML builds. Those are not documents per se but are generated during the write phase on the fly.

> dictionaries are modified, but it's unclear why and how.

The dictionaries being modified in your linked post are the ones responsible for the "global" ToC, and they only contain user-defined information, not auto-generated information. By the way, you don't need to bother adding titles if you create title nodes because they are collected automatically by the TitleCollector whenever a document is being processed.

> Do you want the user not to write anything at all? like, they would only say "put the coverage in file XYZ" ?

I would like to limit the effort needed by the user to a single directive call per package they want summarized. This creates a summary table and source code documents with green/yellow/red background color for covered/partially covered/uncovered code lines.

With this approach, the user doesn't need to adjust the documentation when they add, remove, or rename modules in their project. So my idea is to create one docutils document per Python module, each containing a headline and at least one big literal_block for code background highlighting.

I want to use one docutils document per Python module, because this gives me a file/URL per module in the HTML builder output.
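
The per-line coloring decision itself is independent of docutils and could look roughly like this. It is a sketch under assumptions: the function name and state names are made up for illustration, and the inputs are assumed to correspond to the `executed_lines`, `missing_lines`, and (with branch coverage enabled) `missing_branches` lists of a file entry in Coverage.py's JSON report.

```python
from typing import Dict, Iterable, Tuple


def classify_lines(
    line_count: int,
    executed: Iterable[int],
    missing: Iterable[int],
    missing_branches: Iterable[Tuple[int, int]] = (),
) -> Dict[int, str]:
    """Map 1-based line numbers to 'covered', 'partial', 'uncovered' or 'neutral'."""
    executed_set = set(executed)
    missing_set = set(missing)
    # A line counts as partially covered if it ran but a branch leaving it never did.
    partial_set = {source for source, _ in missing_branches} & executed_set

    states = {}
    for number in range(1, line_count + 1):
        if number in partial_set:
            states[number] = "partial"
        elif number in executed_set:
            states[number] = "covered"
        elif number in missing_set:
            states[number] = "uncovered"
        else:
            states[number] = "neutral"  # blank lines, comments, excluded code
    return states
```

The three non-neutral states map onto the green/yellow/red scheme described above; how they become background colors in a literal_block is a separate, builder-specific question.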

> IIRC, there is no way to create standalone documents and inject them without writing them on the disk.

I don't think so. First, there is nodes.document to create a standalone document in memory. Second, when Sphinx reads rst and md files, it builds the whole documentation as an in-memory model made of Node instances. If Sphinx can do that by parsing rst files, it can be done from code too. The question is just how and where to register it.

> ...but every document must be stored for incremental builds.

Can't I disable incremental builds? For code coverage, at least in this scenario, the rst code might not change, but the coverage JSON content might be different, thus the code coloring rules change.
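
One possible alternative to disabling incremental builds, offered as a hedged sketch: Sphinx's build environment has `note_dependency()`, which marks a file as a dependency of the document currently being read, so that document is re-read when the file changes. The wrapper name `track_report_file` is hypothetical; inside a directive, `env` would be `self.env`.

```python
from pathlib import Path


def track_report_file(env, json_report: Path) -> None:
    """Register the coverage report as a dependency of the current document."""
    # An absolute path avoids ambiguity about which directory a relative path refers to.
    env.note_dependency(str(json_report.resolve()))
```

Called from the directive's `run()` with the resolved `json_report` path, this should cause the page containing the coverage table to be rebuilt whenever the JSON file is regenerated.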

(Sphinx has cache invalidation problems either way: when a navigation item is added or changed, pages are not rewritten, leading to broken navigation bars.)

> What you would do is essentially the same as what we do for the index and the search page for HTML builds. Those are not documents per se but are generated during the write phase on the fly.

I'll check that and compare pros and cons.

> By the way, you don't need to bother adding titles if you create title nodes because they are collected automatically by the TitleCollector whenever a document is being processed.

Thanks.
