[bug] `sphinx.config.is_serializable` is not safe against circular references

munozejm commented 11 months ago

Describe the bug

Sphinx Version: 7.1.2

We are running into a RecursionError that impedes Sphinx's ability to process documentation content and generate output. That error is encountered during execution of the is_serializable function defined in sphinx/config.py.

Given that our documentation suite is very large and that each of our documents can be hundreds of pages long, we require flexibility in defining our documents' contents. We employ a combination of .yml (key:value pairs) and .j2 (content templates) files to build the content that is fed to Sphinx for pickling and generating output.

Since we support several different system configurations, we must vary the number of and data for each of the Jinja contexts we define for building the content that is fed to Sphinx. In rudimentary terms, a main context is composed of its core block of data plus two lists: a) contexts_list which contains the data from sibling contexts, and b) additional_contexts_list which contains the data from children contexts. We came up with that scheme to allow all contexts to be visible at the same time and be able to collect data from all of them when building a document. We have been using this approach for the past 6 years while using older versions of Sphinx.

It is when encountering either our contexts_list or additional_contexts_list that the RecursionError is produced. Forcing Sphinx 7.1.2 to bypass the is_serializable function or running the exact same data with an older Sphhinx version that does not have that check/function allows Sphinx to successfully generate the .tex and .pdf files we are after.

Each sibling or child context block of data can be estimated at below 10MB and above 1MB.

Can is_serializable be modified to handle the approach from above? ALTERNATIVELY, does Sphinx provide a way to access data from another context when working within one context?

How to Reproduce

I cannot copy-paste the details or amount of technical data that makes up each environment within either the contexts_list or the additional_contexts_list in here. The proprietary nature and volume of the data tie my hands.

As questions come up, I can work with someone to answer them.

The high-level command we issue is "sphinx-build -b latex -d build/doctrees source build/latex".

Environment Information

We are running Gitlab with a runner built using the "sphinxdoc/sphinx-latexpdf" image from Docker Hub.  When the StopIteration issue is encountered and Sphinx dies, Gitlab automatically performs cleanup and removes the container (i.e., the runner).  Attempting to execute "sphinx-build --bug-report" does not pan out in that situation.

Below is an abbreviated stack trace of the error.

pickling environment... failed
[app] emitting event: 'build-finished'(RecursionError('maximum recursion depth exceeded in __instancecheck__'),)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/sphinx/cmd/build.py", line 290, in build_main
    app.build(args.force_all, args.filenames)
  File "/usr/local/lib/python3.11/site-packages/sphinx/application.py", line 351, in build
    self.builder.build_update()
  File "/usr/local/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 287, in build_update
    self.build(['__all__'], to_build)
  File "/usr/local/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 327, in build
    pickle.dump(self.env, f, pickle.HIGHEST_PROTOCOL)
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 323, in __getstate__
    if key.startswith('_') or not is_serializable(value):
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in is_serializable
    return all(is_serializable(i) for i in obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in <genexpr>
    return all(is_serializable(i) for i in obj)
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in is_serializable
    return all(is_serializable(i) for i in obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in <genexpr>
    return all(is_serializable(i) for i in obj)
               ^^^^^^^^^^^^^^^^^^

< ...line 47 and line 50 continue to be called a very large number of times... >

  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 50, in <genexpr>
    return all(is_serializable(i) for i in obj)
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
                                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 47, in is_serializable
    if not is_serializable(key) or not is_serializable(value):
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sphinx/config.py", line 43, in is_serializable
    if isinstance(obj, UNSERIALIZABLE_TYPES):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded in __instancecheck__
Recursion error:
maximum recursion depth exceeded in __instancecheck__
This can happen with very large or deeply nested source files. You can carefully increase the default Python recursion limit of 1000 in conf.py with e.g.:
    import sys; sys.setrecursionlimit(1500)

Sphinx extensions

We are running Gitlab with a runner built using the "sphinxdoc/sphinx-latexpdf" image from Docker Hub.  When the StopIteration issue is encountered and Sphinx dies, Gitlab automatically performs cleanup and removes the container (i.e., the runner).  Attempting to execute "sphinx-build --bug-report" does not pan out in that situation.

On the "sphinxdoc/sphinx-latexpdf" image's Debian OS, we perform an update, an upgrade, and add packages.

  - apt-get update
  - apt-get upgrade -y
  - apt-get install -y git
  - apt-get install -y unzip
  - apt-get install -y wget
  - apt-get install -y texlive
  - apt-get install -y texlive-bibtex-extra
  - apt-get install -y texlive-font-utils
  - apt-get install -y texlive-lang-english
  - kpsewhich -var-value=TEXMFLOCAL
  - kpsewhich -var-value=TEXMFDIST
  - unzip -qo acrotex.zip
  - cd acrotex
  - |+
    for acrofile in $(ls -1 *.ins | egrep -v 'exerquiz|acrotex')
    do
      latex $acrofile
    done
    mkdir -p /usr/share/texlive/texmf-dist/tex/latex/acrotex
    cp *.sty /usr/share/texlive/texmf-dist/tex/latex/acrotex
    cp *.cfg /usr/share/texlive/texmf-dist/tex/latex/acrotex
    cp *.def /usr/share/texlive/texmf-dist/tex/latex/acrotex
  - cd ..
  - mktexlsr /usr/share/texlive/texmf-dist
  - rm -rf acrotex*

We also need to pip install a number of Python packages that are required for our documentation to be generated.

  - python -m pip install sphinx-autobuild
  - python -m pip install sphinx-git
  - python -m pip install sphinxcontrib-actdiag
  - python -m pip install sphinxcontrib-ansibleautodoc
  - python -m pip install sphinxcontrib-autoprogram
  - python -m pip install sphinxcontrib-blockdiag
  - python -m pip install sphinxcontrib-confluencebuilder
  - python -m pip install sphinxcontrib-jsonschema
  - python -m pip install sphinxcontrib-jupyter
  - python -m pip install sphinxcontrib-nwdiag
  - python -m pip install sphinxcontrib-plantuml
  - python -m pip install sphinxcontrib-seqdiag
  - python -m pip install sphinxcontrib-websupport
  - python -m pip install ciscoconfparse
  - python -m pip install decorator
  - python -m pip install enum34
  - python -m pip install funcparserlib
  - python -m pip install gitdb
  - python -m pip install Jinja2==3.0.3
  - python -m pip install jupyter-core
  - python -m pip install netaddr
  - python -m pip install plantuml
  - python -m pip install python-dateutil
  - python -m pip install pyyaml
  - python -m pip install sets
  - python -m pip install tablib

Additional context

No response

picnixz commented 10 months ago

The issue is likely because of circular references since you are sharing everything with everyone. And you likely have references to things that should not.

I have two ideas:

verify your references and only include whatever needs to be included; don't share more what should be needed. I think your environment has too much circular references. So, before implementing a solution on our side, check that your objects do not have circular references, and if they do have, I would suggest coming up with an alternative.
we protect is_serializable against self-recursions by detecting whether we are comparing things that were already being compared. However, we cannot protect against this:
```
d = {'a': {'b': 2}}
d['a'] = d
```
With that structure, there is no way we can actually check 'b' (and actually you'll never be able to get the value of 'b' itself so it's not really an issue).

For now, I'd say it's better you stick with an older Sphinx version but you should carefully decide whether you need circular references or not (but since it worked for the past years you'll likely not change). I can come up with a fix for handling non-pathological cases because it's just using the same idea as reprlib.recursive_repr.

picnixz commented 7 months ago

While #12196 does not directly fix it, I'm closing the issue since I don't know whether the recursion error occurred inside the __instancecheck__ (in which case, I cannot do anything and it's mostly an issue with a custom type, and I need MWE for that) or not.

sphinx-doc / sphinx