We have enormous documents in which some individual files `.. include::` hundreds of external `.rst` files. This sometimes leads to individual `.doctree` files exceeding 5 MB. Under this scenario, the build procedure is particularly slow (over 5 hours).
After profiling the code, repeated calls to `pickle.loads()` targeting those 5 MB files were found: it appears that Sphinx calls `pickle.loads()` on the 5 MB file at each cross-reference. While `sphinx/environment/__init__.py` already caches the raw bytes of each pickled doctree, it would be more efficient to cache the result of `pickle.loads()` instead.
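To make the difference concrete, here is a minimal, self-contained sketch of the two strategies. This is not Sphinx's actual code; the cache classes and the on-disk layout are invented for illustration only.

```python
import pickle
from pathlib import Path

# Illustrative sketch only -- not Sphinx's implementation.

class RawBytesCache:
    """Caches the serialised bytes; every lookup still pays pickle.loads()."""

    def __init__(self, doctree_dir: Path) -> None:
        self.doctree_dir = doctree_dir
        self._bytes: dict[str, bytes] = {}

    def get(self, docname: str):
        if docname not in self._bytes:
            path = self.doctree_dir / f"{docname}.doctree"
            self._bytes[docname] = path.read_bytes()
        # Unpickling the ~5 MB payload happens on EVERY call -- this is
        # the hot spot when each cross-reference triggers a lookup.
        return pickle.loads(self._bytes[docname])


class LoadedObjectCache:
    """Caches the unpickled object; pickle.loads() runs once per document."""

    def __init__(self, doctree_dir: Path) -> None:
        self.doctree_dir = doctree_dir
        self._docs: dict[str, object] = {}

    def get(self, docname: str):
        if docname not in self._docs:
            raw = (self.doctree_dir / f"{docname}.doctree").read_bytes()
            self._docs[docname] = pickle.loads(raw)  # paid only once
        return self._docs[docname]
```

One design caveat: callers that mutate the returned document would now share state through the cache, so whether caching the live object is safe depends on how the cached doctrees are used downstream.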
Caching the unpickled `nodes.document` instead of the raw bytes sped the build up from over 5 hours to around 10 minutes (including the transformation to PDF with MiKTeX). I have not compared the memory overhead of the two caching methods, but I suspect it is worth the speedup.
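For a rough sense of why the repeated unpickling dominates, a microbenchmark like the following (the ~5 MB payload is a stand-in for a real doctree, not actual Sphinx data) shows the per-call cost that the raw-bytes cache pays on every cross-reference:

```python
import pickle
import timeit

# Build a ~5 MB pickled payload of distinct strings as a stand-in
# for a large doctree (distinct objects defeat pickle's memoisation).
blob = pickle.dumps([f"node-{i}" * 10 for i in range(50_000)])
print(f"payload: {len(blob) / 1e6:.1f} MB")

# Cost of one pickle.loads() call -- the raw-bytes cache pays this on
# every lookup, the object cache only once per document.
per_call = timeit.timeit(lambda: pickle.loads(blob), number=20) / 20
print(f"pickle.loads() per call: {per_call * 1e3:.1f} ms")
```

With thousands of cross-references into the same document, that per-call cost multiplies accordingly.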
I have opened a pull request with my workaround. Feel free to let me know your thoughts!