sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/
Other
6.23k stars 2.04k forks source link

"env-merge-info" for the "write" phase #9480

Open xmo-odoo opened 3 years ago

xmo-odoo commented 3 years ago

Is your feature request related to a problem? Please describe.

There are currently no built-in facilities for the parallel-write-safety of extensions which need to collect data during writing and make use of that information later (during generation of output file or build-finished.

The specific issue here is related to the sphinx-sitemap extension, which basically does all its work during the writing phase: it collects data during html-page-context and writes out its production during built-finished. As a result it currently does not work at all in parallel_write mode, and it's not entirely clear how to cleanly make it parallel_write_safe. Which is a shame as that means sphinx-sitemap incurs a large build time wallclock regression when enabled (and also a bright red warning from Sphinx).

Describe the solution you'd like

An event similar to env-merge-info but taking place after the parallel writing phase. It's also not entirely clear whether "15. Generate output files" is parallelised or not, and if so how. If the output file generation is parallelised, there may need to be two different events (one to gather information after parallel post-transforms and doctree-resolution, and an other after output generation).

Describe alternatives you've considered

That Sphinx provides communication pipelines between the workers (and the controller) instead, creating those pipelines by hand is pretty error-prone and heavy especially as the specific parallelisation model is undefined (for good reasons), and will (hopefully) change in the future to better accomodate Windows and macOS.

tk0miya commented 3 years ago

From the point of view of sphinx-core, there is no reason to add such an event because the env object should not be updated during the writing phase. I wonder why you did not use other events to collect information. In my understanding, html-page-context is not an event for collecting data.

xmo-odoo commented 2 years ago

From the point of view of sphinx-core, there is no reason to add such an event because the env object should not be updated during the writing phase. I wonder why you did not use other events to collect information. In my understanding, html-page-context is not an event for collecting data.

@tk0miya that's certainly fair enough. I'm not the creator of the extension (merely a potential user). I figure the original author wanted to collect HTML pages specifically but also completely, as sitemap only makes sense in that context. So didn't want the generator to run for non-HTML builders, and also wanted to collect non-document HTML pages (pages like search.html seem to be collected by the extension, I don't know whether other collector events would see them).

tk0miya commented 2 years ago

I can agree with adding a new event for collect HTML pages before writing documents. I think it's a much better design.

xmo-odoo commented 2 years ago

@tk0miya fwiw this is in the context of jdillard/sphinx-sitemap#29, in case you have suggestions as to how to do that better. The extension is not enormous, and the way it currently works is more than a bit of a hack but I'm not sure there's a good way to achieve the same (a dedicated builder would have issues with non-document HTML pages, and wouldn't really make sense on its own as a sitemap is really part of an HTML site / production)