**squidfunk** opened this issue 2 weeks ago
Are there public examples of large repositories that take up to 30 minutes to build? I tried locally with 10K dummy files and ran out of memory before the site was built :sweat_smile: With 1K files, the template rendering seems to be the most costly.
Users have mentioned this on multiple occasions, but I'm having a hard time finding the reports due to GitHub's rather mediocre issue search. Here's what I could gather from a quick search:
The fact is that 30 minutes is a worst-case scenario. Even a repeated build that takes 1 minute is too slow to be useful, and `--dirtyreload` isn't a workable solution due to the problems stated, especially for plugin authors. Build time also doesn't depend solely on the number of pages, but on the plugins used. Thus, we should start a discussion on how plugins and the core could better work together to employ caching and reduce build time.
Running out of memory is another problem that should be fixed, as already discussed in https://github.com/mkdocs/mkdocs/issues/2669
This is a project with 3,400 files and a very limited set of plugins, i.e., `search`, `minify` and `social`:
https://github.com/openfabr/terraform-provider-cdk-docs
IMHO, that's not many plugins, and the `social` plugin, which I wrote, employs caching, which means repeated builds are much cheaper because cached images are reused. I built the project on my machine, an M2 MacBook Pro:
**First build:**

```
INFO - Cleaning site directory
INFO - Building documentation to directory: .../terraform-provider-cdk-docs/site
...
INFO - Documentation built in 537.92 seconds
```

**Repeated build (social plugin cached):**

```
INFO - Cleaning site directory
INFO - Building documentation to directory: .../terraform-provider-cdk-docs/site
...
INFO - Documentation built in 487.02 seconds
```
It's infeasible to make edits on this project without `--dirtyreload`, which as mentioned is incorrect, and the author has to wait for more than 8 minutes until the live-reload server becomes available. Add a few more plugins and a few hundred more pages and you're up to 20 minutes.
I tested the repository mentioned above on my Ryzen 3600 Windows 10 PC with `mkdocs==1.6.0` and `mkdocs-material==9.5.20`. First build:

```
$ mkdocs build
INFO - Cleaning site directory
INFO - Building documentation to directory: C:\MyFiles\_git\removable\performance-test\site
... (a lot of warnings about absolute paths etc., which could also impact performance due to printing to the terminal)
INFO - Documentation built in 1335.42 seconds
```
Repeated build: I do not dare to run it again 😅
I used my performance_debug hook. Debug YAML result: performance_debug_first.yml.txt. More info about the categories can be found in the gist's Python file; most should be self-explanatory, but the sheer number of files could have generated quite a bit of noise 🤔
```yaml
PLUGINS_PER_EVENTS:
  on_post_page|mkdocs_minify_plugin.plugin.MinifyPlugin: 958.97267 # the main culprit of the long build time
  on_page_context|material.plugins.search.plugin.SearchPlugin: 23.14142 # expected, given the number of files
  on_config|material.plugins.social.plugin.SocialPlugin: 10.61701 # on_config shouldn't be affected by the number of files; is it always this slow?
  on_page_markdown|material.plugins.social.plugin.SocialPlugin: 1.20333 # magic of concurrency
  ...
  on_post_build|material.plugins.social.plugin.SocialPlugin: 0.00389 # magic of concurrency
```
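For context, per-event numbers like the ones above come from wrapping plugin events with a timer. A minimal hypothetical sketch of the idea (this is *not* the actual `performance_debug` hook from the gist, just an illustration of the technique):

```python
import time

# Accumulated wall-clock time per event name.
timings = {}

def timed(name, fn):
    """Wrap a plugin event callback and record cumulative time spent in it."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            timings[name] = timings.get(name, 0.0) + time.perf_counter() - start
    return wrapper
```

A hook could wrap each registered plugin event this way and dump `timings` as YAML in `on_post_build`.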
Currently, `mkdocs serve` invokes the same build as `mkdocs build`, so the benchmark results apply there too.
The main issue is with the `minify` plugin. A much cheaper (performance-wise) minification of sorts could be achieved using Jinja2 `Environment` settings, which I mentioned here; another approach would be proper enforcement of whitespace management inside the template files via the `{%- ... -%}` tags. Perhaps another minify plugin needs to be released that uses C/Rust libraries to handle the minification process 🤔
```yaml
MARKDOWN_PER_CLASSES:
  pymdownx.superfences.SuperFencesBlockPreprocessor: 11.09541
  markdown.treeprocessors.InlineProcessor: 10.98559
```

I'm surprised those Markdown values are so low; last time I checked with GMC (~190 files), the same classes took ~6 seconds each. Perhaps the complexity of the Markdown or the number of code blocks has a bigger impact than I thought. Still, 3K files vs. 200 files with only a ~2x time increase seems odd, hmm.
Template rendering took ~270 seconds:

```yaml
TEMPLATE_ROOTS:
  main.html|sum: 267.45968
```

This time is spent again on each re-serve without `--dirtyreload`.
Caching for later builds with `mkdocs serve` won't help much, as the initial wait immediately turns off the prospective user. Also, rendering the whole docs in the background with concurrency seems like a waste of resources when I only want to check a single page. So I would like to see some sort of on-demand loading: `serve` would only process `index.html` and later load pages only when navigating to them. This of course breaks the last `on_post_build` event, as plugins expect all files to be present in the `site` directory, so invoking it after only a few pages were built could lead to issues. Other events are more agile, IMO.
I guess this would require a fork in the `mkdocs serve` and `mkdocs build` event loops? Rather risky, but it would allow for more control, maybe? Just a first top-of-the-head idea ✌️
> Perhaps another minify plugin needs to be released that uses C/Rust libraries to handle the minification process 🤔
Like this one: https://github.com/monosans/mkdocs-minify-html-plugin? Could you build once with it and see if you just spared 950 seconds or so :stuck_out_tongue:? Apparently it only minifies HTML files, though (but still minifies the CSS and JS within them).
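For reference, switching plugins would be a one-line change in `mkdocs.yml`, roughly along these lines (the `minify-html` entry-point name is an assumption on my part; check the project's README before relying on it):

```yaml
plugins:
  - search
  - social
  # replace the pure-Python minify plugin with the Rust-backed one
  # (plugin name assumed, not verified)
  - minify-html
```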
Also, solid work @kamilkrzyskow :+1: Thanks for making and sharing all this!
Ah, nice, I didn't know about the minify-html plugin! I'll check it out and probably switch to it. Offloading pure string processing to Rust makes a lot of sense.
> Caching for later builds with `mkdocs serve` won't help much, as it immediately turns off the prospective user. Also rendering the whole docs in the background with concurrency seems also like a waste of resources when I only want to check one web page only. So I would like to see some sort of on-demand loading: `serve` would only process index.html and later only load pages when navigating to them. This of course breaks the last `on_post_build` event, as plugins expect all files to be present in the `site` directory, so invoking it after only a few pages were built could lead to issues. Other events are more agile IMO
The issue is that the site navigation requires the entire pages collection to be available for the one page to be rendered. This is where caching and/or concurrency would likely be helpful. For that matter, the pages don't need to all be fully rendered, but they all do need to be read and processed to a certain extent to determine the page title, etc for the nav.
And then there are those scenarios where a page's content consists of the pages collection (either by means of a plugin or as a static template). In that case, to render that page (even if the nav is excluded), the entire pages collection is needed.
Ultimately, it is these two issues that have thus far prevented a better solution from being developed. Work out a way to address them and then we may have a workable solution.
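To illustrate the "read and process to a certain extent" idea: a cheap first pass could scan every page only far enough to extract its title for the nav, deferring full rendering until a page is actually requested. This is a hypothetical sketch (no such helper exists in MkDocs), assuming titles come from the first level-1 heading:

```python
from pathlib import Path
import re

def collect_nav_titles(docs_dir: str) -> dict[str, str]:
    """Cheap metadata pass: find each page's title (first H1)
    without rendering Markdown to HTML."""
    titles = {}
    for path in sorted(Path(docs_dir).rglob("*.md")):
        title = path.stem  # fallback: derive title from the filename
        for line in path.read_text(encoding="utf-8").splitlines():
            match = re.match(r"#\s+(.+)", line)
            if match:
                title = match.group(1).strip()
                break
        titles[str(path)] = title
    return titles
```

Real pages can also set the title via front matter or the `nav` config, so an actual implementation would need to consult those sources too; the point is only that the per-page work for the nav can be far cheaper than a full render.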
Quick thought: what if plugins informed MkDocs whether each one of their hooks could be executed concurrently, or only sequentially? I'm imagining some utilities to build a "pipeline" of things to run depending on whether they support concurrency or not.
Quick flowchart which doesn't quite make sense but illustrates the idea:
```mermaid
flowchart TD
p1f["plugin1.on_files"]
p2f["plugin2.on_files"]
p3f["plugin3.on_files"]
p1n["plugin1.on_nav"]
p2n["plugin2.on_nav"]
p3n["plugin3.on_nav"]
p1pm["plugin1.on_page_markdown"]
p2pm["plugin2.on_page_markdown"]
p3pm["plugin3.on_page_markdown"]
start --> p1f & p2f
p1f & p2f --> p3f
p3f --> p1n & p2n & p3n
p1n & p2n & p3n --> p1pm
p1pm --> p2pm & p3pm
```
Plugin 1 and 2 `on_files` run concurrently, then plugin 3 `on_files` runs sequentially. All `on_nav` hooks run concurrently. Plugin 1 `on_page_markdown` runs sequentially, then plugin 2 and 3 `on_page_markdown` run concurrently.

EDIT: hmm, I suppose there's another possible layer of concurrency on the files/pages themselves. The transformation pipeline would likely be quite complex. I'm sick and have a fever today, so please be indulgent :joy:
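A minimal sketch of such a stage-based pipeline (purely hypothetical API; the `parallel_safe` flag per hook is my invention, not anything MkDocs exposes): consecutive parallel-safe hooks share a stage and run concurrently, while an unsafe hook gets a stage of its own.

```python
from concurrent.futures import ThreadPoolExecutor

def run_event(hooks, payload):
    """hooks: list of (callable, parallel_safe) pairs for one event."""
    # Group hooks into consecutive stages by their concurrency flag.
    stages, stage = [], []
    for fn, parallel_safe in hooks:
        if parallel_safe:
            stage.append(fn)
        else:
            if stage:
                stages.append(stage)
                stage = []
            stages.append([fn])  # unsafe hook runs alone
    if stage:
        stages.append(stage)

    with ThreadPoolExecutor() as pool:
        for stage in stages:
            # each stage runs to completion before the next begins
            list(pool.map(lambda fn: fn(payload), stage))

# example: plugin1/plugin2 on_files are parallel-safe, plugin3 is not
log = []
hooks = [
    (lambda p: log.append("p1"), True),
    (lambda p: log.append("p2"), True),
    (lambda p: log.append("p3"), False),
]
run_event(hooks, payload={})
```

Here `p1` and `p2` may finish in either order, but `p3` is always last, matching the first column of the flowchart.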
> Quick thought: what if plugins informed MkDocs whether each one of their hooks could be executed concurrently, or only sequentially?
This is exactly what Sphinx does. Each extension defines if it's safe for parallel reading and/or parallel writing. See https://www.sphinx-doc.org/en/master/extdev/index.html#extension-metadata
I haven't checked how it works internally, but it's probably something to explore a little more and see if there are some ideas that can be reused.
I would like to be able to use parallel builds. It was stated in #1900 that the benefit is not that high. However, I have lots of Jupyter notebooks to convert (the execute step consumes most of the time), so I ended up executing all notebooks concurrently in advance.
> The issue is that the site navigation requires the entire pages collection to be available for the one page to be rendered. This is where caching and/or concurrency would likely be helpful. For that matter, the pages don't need to all be fully rendered, but they all do need to be read and processed to a certain extent to determine the page title, etc for the nav.
Okay, so I've been working on this and I've got enough to demo now...
https://github.com/mkdocs/sketch/tree/main
That's a work-in-progress of "how could mkdocs look" that properly deals with this issue.
Specifically, the `mkdocs serve` command doesn't require a site build at all.*

I needed to do a bit of poking to make this work with the terraform example above (since it doesn't include a `nav` config), though once I'd done that, serve startup time was under a second.
There's other aspects that I'm looking to address as part of that work, just getting things into shape so that I've got a coherent body of work to start sharing here.
* Search indexes aren't in there just yet. Yes, they *would* require a full-site build, but we can use HTML `rel=preload` links to prompt loading them in the background, and likely also have per-page caching.
Nice work @tomchristie!
> search indexes aren't in there just yet. yes they would require a full-site build, but we can use HTML `rel=preload` links to prompt them in the background, and likely also have per-page caching.
In the case of mkdocstrings and its cross-referencing ability, `rel=preload` wouldn't be enough. To statically resolve a cross-reference, we must wait for all pages to have been built. The only way to make cross-references work when serving pages on the fly (without building everything) would be to inject some JavaScript magic :thinking: Like, the plugin would store queryable state in the server, which the client could continuously request until all needed pages were loaded with `rel=preload` and the unresolved references on the current page could be resolved :thinking: And since we don't know which pages are needed to resolve a reference, all pages would have to be pre-loaded anyway :thinking: (or, if not all, maybe most pages, with a priority order or something).
This looks really promising! Really excited to see how this will work with more complex setups. I guess there are still things to be worked out (I haven't checked the implementation), but it's a great start! 👏
The `mkdocs serve` command provides a powerful write-build-check-repeat loop that is integral to documentation projects, setting MkDocs apart from many static site generators that lack live-preview functionality. This feature greatly enhances the efficiency and accuracy of developing and refining documentation, allowing for immediate feedback and iterative improvements.

**Startup time of `mkdocs serve`**

Unfortunately, there are significant issues with the `mkdocs serve` command, particularly when working with large documentation projects that consist of thousands of pages. Currently, `mkdocs serve` requires a full build of the documentation before it becomes interactive. This process can take an extensive amount of time, ranging from 30 to 40 minutes for large projects. This delay significantly impedes the ability to use `mkdocs serve` effectively for previewing changes.

The need for a preview is crucial, especially given that Material for MkDocs integrates with Python Markdown Extensions, a powerful set of Markdown extensions for technical writing, adding features like content tabs via Tabbed and enhanced indent detection through SuperFences. Unfortunately, editor support for these syntaxes is limited, if not non-existent. This lack of support means that authors must rely on `mkdocs serve` to preview changes. Given the current build times on large projects, authors face considerable difficulty in efficiently making and reviewing changes, essentially working 'blind' without this functionality. Performance is in fact one of the major critiques of MkDocs.

**Problems with the `--dirtyreload` flag**

The `--dirtyreload` flag in MkDocs offers a partial solution to speed up rebuilds during development by not rebuilding the entire site on each change. However, this flag only affects subsequent builds and does not improve the initial build time. Moreover, it introduces issues such as incorrect navigation and incomplete metadata, which can disrupt the functionality of plugins, like the blog plugin, which struggles to correctly update archive and category indexes under `--dirtyreload`. Consequently, plugins must be designed to specifically work around these limitations, complicating their development and integration.

**Conclusion**
To significantly enhance the editing experience with MkDocs and reduce the environmental impact by saving thousands of build minutes daily, we need to focus on two critical improvements:

1. **Reducing the initial preview load time:** The time it takes from starting the live server to when the preview is first available needs to be substantially decreased. This change would make MkDocs more usable, especially for large projects.
2. **Speeding up live preview updates:** After making edits, the time to see these changes in the preview should be minimized. This improvement will support a more efficient and iterative documentation process.

Potential strategies to achieve these improvements include implementing more sophisticated caching mechanisms and exploring the possibility of parallelizing the build process. These changes would address both the initial and subsequent build times, making `mkdocs serve` a more robust tool for documentation development.