We have enormous documents in which some individual files `.. include::` hundreds of external `.rst` files. This sometimes leads to individual `.doctree` files exceeding 5 MB. Under this scenario, the build procedure is particularly slow (over 5 hours).
After profiling the code, repeated calls to `pickle.loads()` targeting those 5 MB files were found: it appears that Sphinx calls `pickle.loads()` on the 5 MB file at each cross-reference. While `sphinx/environment/__init__.py` already caches the raw bytes of each pickled doctree, it would be more efficient to cache the result of `pickle.loads()` instead.
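To make the difference concrete, here is a minimal, self-contained sketch of the two strategies. This is not Sphinx's actual code; the cache classes and the on-disk layout are invented for illustration only.

```python
import pickle
from pathlib import Path

# Illustrative sketch only -- not Sphinx's implementation.

class RawBytesCache:
    """Caches the serialised bytes; every lookup still pays pickle.loads()."""

    def __init__(self, doctree_dir: Path) -> None:
        self.doctree_dir = doctree_dir
        self._bytes: dict[str, bytes] = {}

    def get(self, docname: str):
        if docname not in self._bytes:
            path = self.doctree_dir / f"{docname}.doctree"
            self._bytes[docname] = path.read_bytes()
        # Unpickling the ~5 MB payload happens on EVERY call -- this is
        # the hot spot when each cross-reference triggers a lookup.
        return pickle.loads(self._bytes[docname])


class LoadedObjectCache:
    """Caches the unpickled object; pickle.loads() runs once per document."""

    def __init__(self, doctree_dir: Path) -> None:
        self.doctree_dir = doctree_dir
        self._docs: dict[str, object] = {}

    def get(self, docname: str):
        if docname not in self._docs:
            raw = (self.doctree_dir / f"{docname}.doctree").read_bytes()
            self._docs[docname] = pickle.loads(raw)  # paid only once
        return self._docs[docname]
```

One design caveat: callers that mutate the returned document would now share state through the cache, so whether caching the live object is safe depends on how the cached doctrees are used downstream.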
Caching the unpickled `nodes.document` instead of the raw bytes sped the build up from over 5 hours to around 10 minutes (including the transformation to PDF with MiKTeX). I have not compared the memory overhead of the two caching methods, but I suspect it is worth the speedup.
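For a rough sense of why the repeated unpickling dominates, a microbenchmark like the following (the ~5 MB payload is a stand-in for a real doctree, not actual Sphinx data) shows the per-call cost that the raw-bytes cache pays on every cross-reference:

```python
import pickle
import timeit

# Build a ~5 MB pickled payload of distinct strings as a stand-in
# for a large doctree (distinct objects defeat pickle's memoisation).
blob = pickle.dumps([f"node-{i}" * 10 for i in range(50_000)])
print(f"payload: {len(blob) / 1e6:.1f} MB")

# Cost of one pickle.loads() call -- the raw-bytes cache pays this on
# every lookup, the object cache only once per document.
per_call = timeit.timeit(lambda: pickle.loads(blob), number=20) / 20
print(f"pickle.loads() per call: {per_call * 1e3:.1f} ms")
```

With thousands of cross-references into the same document, that per-call cost multiplies accordingly.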
I have opened a pull request with my workaround. Feel free to let me know your thoughts!