sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/
Other
6.6k stars 2.12k forks source link

autosectionlabel can't link to human readable text (regression) #11644

Open mwoehlke-kitware opened 1 year ago

mwoehlke-kitware commented 1 year ago

Describe the bug

Formerly, if I wrote a reference to a section that used markup in its name, I could link to the section using just its text. Generating documentation that used to work with newwer Sphinx now fails to resolve these references.

This was introduced by #4027 and can be fixed by reverting that patch. (Which breaks for sections with quotes, so clearly a more clever fix is needed.)

How to Reproduce

Probably the easiest way to reproduce this is to clone https://github.com/mwoehlke/cps and change autosectionlabel in conf.py to sphinx.ext.autosectionlabel. (A patched version that does not have the problem is being used locally.)

For a simplified version, create some project having a section "foo bar" and a link to that section. Change the section from foo bar to e.g. :code:`foo` bar. The link breaks.

Environment Information

Platform:              linux; (Linux-6.4.10-200.fc38.x86_64-x86_64-with-glibc2.37)
Python version:        3.11.4 (main, Jun  7 2023, 00:00:00) [GCC 13.1.1 20230511 (Red Hat 13.1.1-2)])
Python implementation: CPython
Sphinx version:        5.3.0
Docutils version:      0.19
Jinja2 version:        3.0.3

Sphinx extensions

`['sphinx.ext.autosectionlabel']`

Additional context

No response

AA-Turner commented 1 year ago

The PR referenced was 5 years ago! Also note: Sphinx 5.3 is not suported, please check with 7.2.3.

A

mwoehlke-kitware commented 1 year ago

The PR referenced was 5 years ago!

Yes; it was broken a long time ago. Your point?

Sphinx 5.3 is not suported, please check with 7.2.3.

5.3 is the latest version available in a released Fedora, dating back a mere six months. Fedora is supposed to be supporting it for another six months at least.

Regardless, here is what has changed in autosectionlabel.py since 5.3.0:

diff --git a/sphinx/ext/autosectionlabel.py b/sphinx/ext/autosectionlabel.py
index 7dc9ddaec..d423fcc05 100644
--- a/sphinx/ext/autosectionlabel.py
+++ b/sphinx/ext/autosectionlabel.py
@@ -1,17 +1,22 @@
 """Allow reference sections by :ref: role using its title."""

-from typing import Any, Dict, cast
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, cast

 from docutils import nodes
-from docutils.nodes import Node

 import sphinx
-from sphinx.application import Sphinx
 from sphinx.domains.std import StandardDomain
 from sphinx.locale import __
 from sphinx.util import logging
 from sphinx.util.nodes import clean_astext

+if TYPE_CHECKING:
+    from docutils.nodes import Node
+
+    from sphinx.application import Sphinx
+
 logger = logging.getLogger(__name__)

@@ -52,7 +57,7 @@ def register_sections_as_label(app: Sphinx, document: Node) -> None:
         domain.labels[name] = docname, labelid, sectname

-def setup(app: Sphinx) -> Dict[str, Any]:
+def setup(app: Sphinx) -> dict[str, Any]:
     app.add_config_value('autosectionlabel_prefix_document', False, 'env')
     app.add_config_value('autosectionlabel_maxdepth', None, 'env')
     app.connect('doctree-read', register_sections_as_label)

Do you think there's any chance that fixed the bug?

Unlikely, but maybe something else changed to magically work around the problem? Let's find out:

/path/to/cps$ git diff
diff --git a/conf.py b/conf.py
index 19562a0..3554dd5 100644
--- a/conf.py
+++ b/conf.py
@@ -8,7 +8,7 @@ sys.path.insert(0, os.path.abspath('_ext'))

 # -- General configuration ------------------------------------------------
 needs_sphinx = '5.3'
-extensions = ['cps', 'autosectionlabel']
+extensions = ['cps', 'sphinx.ext.autosectionlabel']

 source_suffix = '.rst'
 exclude_patterns = []
/path/to/cps$ make
sphinx-build -b html  . site
Running Sphinx v7.2.3
building [mo]: targets for 0 po files that are out of date
writing output... 
building [html]: targets for 12 source files that are out of date
updating environment: [new config] 12 added, 0 changed, 0 removed
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
copying assets... copying static files... done
copying extra files... done
done
writing output... [100%] searching
/path/to/cps/configurations.rst:36: WARNING: undefined label: 'configurations (package)'
/path/to/cps/configurations.rst:116: WARNING: undefined label: 'name'
/path/to/cps/configurations.rst:116: WARNING: undefined label: 'configuration'
/path/to/cps/configurations.rst:116: WARNING: undefined label: 'components (package)'
/path/to/cps/recommendations.rst:64: WARNING: undefined label: 'requires (package)'
/path/to/cps/searching.rst:22: WARNING: undefined label: 'platform'
generating indices... genindex done
writing additional pages... search done
copying images... [100%] logo-128.png
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 6 warnings.

The HTML pages are in site.

Build finished. The HTML pages are in 'site'.

(That's using pip install sphinx in a freshly created venv. Note the version reported in the build.)

Reverting to CPS's internal fork (which uses astext() rather than rawsource, effectively reverting #4591) resolves the problem.

Note that, while debugging, I also verified that domain.labels with upstream winds up with rubbish like :attribute:`requires` :applies-to:`(package)`; easily verified by sticking a print just before domain.labels[name] = ... in autosectionlabel.py. (The correct label is requires (package), as seen above in the failure to find said label.)

picnixz commented 1 year ago

By not using rawsource, how would you differentiate between Chapter 1 and Chapter **1** since both have the same astext() value? (they are treated as different labels since by default (no extension) the first one has an id assigned to chapter-1 and the second to id1).

As such, I think the behaviour should be left as is, but we could possibly change the latter with a configuration value (in which case, the user must guarantee that there are no title looking the same but having distinct rawsource value, which is a stronger requirement than what :ref: requires, namely uniqueness of labels across the project).

Technically, you should write :ref:`:code:\`foo\` bar` since it's the real title.

mwoehlke-kitware commented 1 year ago

By not using rawsource, how would you differentiate between Chapter 1 and Chapter **1** since both have the same astext() value?

Manually assign a different label to one. Or don't do that. You already noted that the default (no extension) handling treats them as having the same implicit label, and very few humans are going to think of the second as "chapter-one-with-bold-one" either.

Technically, you should write :ref::code:\foo` bar` since it's the real title.

No one wants to write that. Almost no one is going to think of the label that way. IMNSHO this is a prime example of artificial stupidity, and AFAIK no one made a deliberate decision to force users to do this. It's a regression caused by #4591 which was intended to fix a completely different bug.

It's also inconsistent with intra-document links. Within the same document, I don't write `blah <:code:\`foo\` bar>`_, I write `blah <foo bar>`_, and the HTML link is to #foo-bar. Why should autosectionlabel be an oddball?

picnixz commented 1 year ago

Well, I agree that no one wants to write it like this, but this was just an illustration of what a title is (in contrast to an id) since technically speaking the docs says that you can refer the section by their title.

Now, a PR is welcomed. I don't have access to my computer for the next weeks but I can review it. Btw, I am surprised by the fact that no one complained about this behaviour since it's been like this since Sphinx 1.7 or so.

mwoehlke-kitware commented 1 year ago

Is there a way to make autosectionlabel run before transforms?

picnixz commented 1 year ago

Why do you want to run it before transformations?

Also it is run when the doctree is read so it's at rhe writing phase? like way after the transforms iirc.

mwoehlke-kitware commented 1 year ago

Why do you want to run it before transformations?

The original purpose of the change was to use the non-transformed text; specifically, because the transformed text was more difficult to write (due to smartquotes, specifically). TBH, I think using the non-transformed text is TRTTD at least in some cases, especially as transforms might do who-knows-what to the text. However, the point of this conversation is that I disagree that requiring the user to write out the raw source of the section is what we want.

What we want is just the text, but before transforms are applied. I don't know of an obvious way to get that at the point autosectionlabel is run. Any ideas?

picnixz commented 1 year ago

Mmh I don't remember if there is an event fired containing the raw text.

If you want the text at the level of a docstring, you can perhaps handle the autodoc-process-docstrings event though you'll only get lines to process.

If you want to process something just before autosectionlabel, you can use doctree-read with a lower priority value (the lower the prio number is, the earlier it will be done). You could then record the nodes that are to be transformed by autosectionlabel in some extra environment state or in the node itself as an extra attribute.