sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

`objects.inv` is not generated deterministically when there are duplicate references #12001

Open raboof opened 9 months ago

raboof commented 9 months ago

Describe the bug

When the same sections are found in multiple files (as in, e.g., https://gitlab.com/qemu-project/qemu/-/issues/2190), it is not deterministic which reference gets included in objects.inv.

How to Reproduce

Unfortunately I've not been able to trigger this problem in a minimal example yet. In theory it should be nondeterministic with an empty conf.py, and:

index.rst:

.. include:: toinclude.rst.inc

index.rst:

.. include:: toinclude.rst.inc

toinclude.rst.inc

foo
---

bar

... but I haven't seen it produce different objects.inv indexes with this minimal example, so there might be more going on.
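A rough way to check the example above repeatedly is to write the files from a script, build the project several times from scratch, and compare the hashes of the resulting objects.inv files. This is only a sketch: the second document name (other.rst) is a hypothetical stand-in, because both documents are labelled index.rst above, and the helper names are made up for illustration.

```python
import hashlib
import subprocess
import sys
from pathlib import Path

# File contents from the report. "other.rst" is a hypothetical name for the
# second document, since the report labels both documents "index.rst".
FILES = {
    "conf.py": "",
    "index.rst": ".. include:: toinclude.rst.inc\n",
    "other.rst": ".. include:: toinclude.rst.inc\n",
    "toinclude.rst.inc": "foo\n---\n\nbar\n",
}


def write_project(src: Path) -> None:
    """Create the source tree for the minimal example."""
    src.mkdir(parents=True, exist_ok=True)
    for name, text in FILES.items():
        (src / name).write_text(text, encoding="utf-8")


def sha256_of(path: Path) -> str:
    """Hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def inventory_digests(src: Path, out_root: Path, runs: int = 5) -> set[str]:
    """Build `runs` times into fresh output dirs and hash each objects.inv."""
    digests = set()
    for i in range(runs):
        out = out_root / f"run{i}"
        # -E ignores any saved environment, -q keeps the output quiet.
        subprocess.run(
            [sys.executable, "-m", "sphinx", "-q", "-E", "-b", "html",
             str(src), str(out)],
            check=True,
        )
        digests.add(sha256_of(out / "objects.inv"))
    return digests
```

If inventory_digests returns more than one digest, the builds disagreed. A single digest does not prove determinism; it only means these particular runs agreed.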

Environment Information

Platform:              linux; (Linux-6.7.4-x86_64-with-glibc2.38)
Python version:        3.11.7 (main, Dec  4 2023, 18:10:11) [GCC 13.2.0])
Python implementation: CPython
Sphinx version:        7.2.6
Docutils version:      0.20.1
Jinja2 version:        3.1.3
Pygments version:      2.17.2

Sphinx extensions

No response

Additional context

https://github.com/sphinx-doc/sphinx/blob/bc74a6223caa72c39b8ccad3f17202dcb098c918/sphinx/domains/python.py#L1546C23-L1553 might be relevant

picnixz commented 9 months ago

Quick comment: it's probably because of parallel read/merge and the fact that the files could possibly be discovered in a nondeterministic way. I can investigate this tomorrow.

picnixz commented 9 months ago

I can't reproduce this one, even with a somewhat smaller version of the QEMU docs. I think the issue might come from the fact that the QEMU docs have a lot of internal extensions, and maybe some of them mess things up. Also, it appears that only one label is being created in the intersphinx inventory for this driver title (you can inspect the inventory using python -m sphinx.ext.intersphinx FILE_OR_URL).
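For comparing the inventories of two builds entry by entry, a small standard-library parser can help. The sketch below assumes the version-2 inventory layout as I understand it: four plain-text header lines followed by a zlib-compressed body, with one "name domain:role priority uri dispname" entry per line. The function names are hypothetical and this is not Sphinx's own loader.

```python
import re
import zlib

HEADER = b"# Sphinx inventory version 2"
# Groups: name, domain:role, priority, uri, display name.
ENTRY = re.compile(r"(.+?)\s+(\S+)\s+(-?\d+)\s+?(\S*)\s+(.*)")


def parse_inventory(data: bytes) -> dict:
    """Return {(domain:role, name): uri} for a version-2 inventory payload."""
    # Four header lines, then everything else is the compressed body.
    parts = data.split(b"\n", 4)
    if len(parts) < 5 or parts[0].strip() != HEADER:
        raise ValueError("not a version-2 Sphinx inventory")
    body = zlib.decompress(parts[4]).decode("utf-8")
    entries = {}
    for line in body.splitlines():
        match = ENTRY.fullmatch(line.rstrip())
        if match is None:
            continue  # skip lines that do not look like entries
        name, objtype, _priority, uri, _dispname = match.groups()
        if uri.endswith("$"):  # "$" is shorthand for the object name
            uri = uri[:-1] + name
        entries[(objtype, name)] = uri
    return entries


def diff_inventories(a: dict, b: dict) -> list:
    """List keys whose target differs (or is missing) between two inventories."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Reading objects.inv from two build directories with Path.read_bytes() and passing the parsed results to diff_inventories shows which references point at a different document from one build to the next.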

project.zip

For now, I'll close the issue until you find an MWE (otherwise this issue will likely stay open for years).

jayaddison commented 9 months ago

A theory and a suggestion:

I think that a likely cause of this is variance in the order that source documentation files are read from the filesystem during Sphinx project build. That could explain why it's tricky to replicate on a single machine/filesystem -- because in isolation, that filesystem may return results in fairly-or-entirely deterministic order -- and also could mean that it's tricky to write a traditional unit test case for this, because uncovering the problem would be reliant on behaviour outside of the Sphinx codebase.

I'm wondering whether to commence work on a continuous integration test to attempt to smoke this out. If I did -- this is the suggestion part -- I'd probably begin by adding disorderfs to the GitHub Actions unit test workflows. disorderfs is a userspace filesystem that can return filesystem results in randomized order, and it is available as a Debian package.
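The read-order theory can be illustrated without Sphinx at all. The toy model below assumes that a duplicate label is resolved "last writer wins", which is an assumption for illustration and not confirmed Sphinx behaviour. The document names and the collect_labels helper are hypothetical.

```python
from itertools import permutations

# Hypothetical project: two documents that both define the label "foo".
DOCS = {"index": ["foo"], "other": ["foo"]}


def collect_labels(read_order):
    """Map each label to a document; a later document overwrites an earlier one."""
    labels = {}
    for docname in read_order:
        for label in DOCS[docname]:
            labels[label] = docname
    return labels


# Every order in which the filesystem might return the documents.
orders = list(permutations(DOCS))

# Winner of the duplicate label when documents are processed as listed.
as_listed = {collect_labels(order)["foo"] for order in orders}
# Winner when the listing is sorted before processing.
after_sorting = {collect_labels(sorted(order))["foo"] for order in orders}

print(sorted(as_listed))      # more than one possible winner
print(sorted(after_sorting))  # a single winner once the order is fixed
```

Under this model the outcome depends only on processing order, so sorting the discovered files (or otherwise fixing a tie-break rule for duplicates) would be enough to make the result stable.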

jayaddison commented 1 month ago

Based on recent build reproducibility test results, I believe this bug remains valid. I'll try to confirm that with a minimal example soon; from the linked QEMU bug report, it seems that inclusion/re-use of definitions within multiple pages may be a contributing factor. I think this is separate from the table-of-contents ordering ambiguity tracked under recent investigation in #6714.