pydata / pydata-sphinx-theme

A clean, three-column Sphinx theme with Bootstrap for the PyData community
https://pydata-sphinx-theme.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Slow page builds in 6.0 when generating navigation bar #381

Closed charris closed 2 years ago

charris commented 3 years ago

See https://github.com/numpy/numpy/pull/18756. Build times went from around 10 minutes to 30+ on CircleCI.

Things to try

To resolve this

We decided that the slowdown here is somewhat unavoidable, as long as you want to keep multiple levels within your navigation bar. For some tips about speeding up site builds, see: https://pydata-sphinx-theme.readthedocs.io/en/latest/user_guide/configuring.html#selectively-remove-pages-from-your-sidebar

jorisvandenbossche commented 3 years ago

I am having the same issue with pandas, which is related to the API docs -> https://github.com/pydata/pydata-sphinx-theme/issues/364 I assume for numpy it might be a similar issue? (I don't know the size of your API docs, but I assume also quite big)

bollwyvl commented 3 years ago

Perhaps we need to consider adding some explicit benchmarks... in the meantime, i'll see if i can dig up some numbers AP for the docs site build here so we can bisect a little.

bollwyvl commented 3 years ago

Here are some rough findings...

Looking at the data, my takeaway is b822548 is the place to start looking, when it jumps up and doesn't come back down....

head_sha started_at duration
4540f6a96d84bf3103227eed9d829c445817d578 2020-04-23 03:38:37+00:00 4
1e8295d42ae99271750a9eec4efcc7360275ed9a 2020-04-28 15:02:45+00:00 4
e86bcd1eb5be3f88982ae8659560bd43d8d6e4d3 2020-05-04 06:46:59+00:00 11
24ae3c5c2d86190542372e045407f111a849418e 2020-05-04 07:00:14+00:00 10
e089e6d168efab17585833b287a71371f9d25124 2020-05-04 22:07:36+00:00 9
8ae0c51907a8d9d1bb9c1bb6fc2cbbe2bef4266f 2020-05-06 01:14:22+00:00 8
4a33ff5a4de3adf684fcd86dad9319203cd94a7f 2020-05-20 20:40:31+00:00 12
4576172faa12f493eb27b508c248f0519902707a 2020-05-27 10:02:27+00:00 9
e6680a3a4dcff2c52f927b759b5baba080434ac7 2020-05-27 16:36:03+00:00 11
45b4dc53a2375fd1aaba8fa5792248ab1bbe3732 2020-06-08 06:28:18+00:00 10
b4a670f853d5c892f96030a7b85d9b52b3b37f1f 2020-06-22 14:58:35+00:00 10
64fde80d5128b17725918b50c18cfbeffdb263eb 2020-06-23 06:22:16+00:00 10
e95b564578807e039350317c57e83d1a7bd65458 2020-06-23 12:36:20+00:00 9
74ffef0ce8c6d838b549fc0094e30f857832c2ca 2020-06-23 17:56:04+00:00 10
d697ef09866ad6862f248c71c632e658c27d9ad3 2020-06-25 06:50:23+00:00 12
6de76cb208fa8f8f4b9cf2c140e2de646b66fb15 2020-08-22 08:40:19+00:00 10
190f32bae7b3f58c4af5fefe3e92f7dd09e717c4 2020-08-22 08:43:09+00:00 11
65ed520ff599a8468e0098f36cb7483d5c086134 2020-09-16 05:59:07+00:00 11
404a6f57a266b71dd90becdd2337c77471eebb9d 2020-09-21 17:30:15+00:00 9
7163801b25d3e2f15a06fa403bac4fbd00818198 2020-09-23 18:58:00+00:00 7
15819a8373c7fffa700cd9c5cd9c3531889aece8 2020-09-28 08:16:19+00:00 8
bdf8224df3630b66e7bb1d819bac1edf2d257e83 2020-09-29 06:28:47+00:00 11
52cb046d9e6fdbd6d8d44a4a9469b47fd5eae228 2020-10-06 19:11:49+00:00 9
ef61ea0d8223c74216c8a08f1d018d67b24b775f 2020-10-06 23:41:47+00:00 11
edc2a0bdee1411ac9f7d5950fcacb4720a7134e0 2020-11-02 19:26:07+00:00 8
e9875581b2927af64569c1f48a414d0e4426c0c5 2020-11-05 12:46:52+00:00 11
f321520e1240253c74b8d97d19f87e34d4ef4017 2020-11-18 16:07:15+00:00 14
fe61e9ddb1db59b300c3ecfa0d910d9a08003b34 2020-12-16 09:47:41+00:00 9
b3241493e2581c847bed65e0870e11b8c80e678e 2020-12-23 09:21:05+00:00 9
f2c33be7ca0fa784ce04b8a05cdcbf09bd7c968a 2020-12-28 21:16:31+00:00 9
969718267981738e503f874b1e0554c5446cf559 2020-12-28 21:27:28+00:00 9
2488b7defbd3d753dd5fcfc890fc4a7e79d25103 2021-01-19 08:34:59+00:00 9
7d14f11ace7bbae32f0e6d2acdfb84b68b98ae10 2021-01-19 09:07:58+00:00 9
ab928989b450e1756419427739b751bebb3f3603 2021-01-19 12:10:44+00:00 9
d70d8942413eff942f4a908263a3f0978894a9e5 2021-01-24 15:45:27+00:00 10
c4a64251673856858938e3ae2becede8278a245e 2021-01-25 12:24:51+00:00 11
8a203b7f8f265bf11a61cded55f94e32c836b95d 2021-01-26 10:34:29+00:00 8
270bf6c60de2c9f16aa32c3eeb6de39f0ee0e734 2021-01-26 10:45:51+00:00 9
e32af5fa53a91b79e2d14a5b28753d6120800927 2021-03-09 18:40:28+00:00 9
65ca2db6ca13110426ae9fd0f359d3063d28535d 2021-03-09 20:24:44+00:00 8
f2d189a2076012b35e930db816c307d9a20e3228 2021-03-10 14:25:33+00:00 10
82bf21cafa2f76719dda253636a26e1cc9e996f8 2021-03-11 19:39:37+00:00 10
579eec63462d9e6987c6514231c2ca78fca85547 2021-03-22 03:29:56+00:00 8
f81cf47b29f0038658b22dfba7a98368d0a93089 2021-03-22 08:51:40+00:00 12
199f69fb1d9d71ae8bb74d44424a77e6a9f9deaf 2021-03-22 08:53:01+00:00 12
ce961a835912e807dc7da32652c34daadcc89385 2021-03-22 08:55:03+00:00 11
925ac87225cf1e303b180e35a59809ac9db2e3b2 2021-03-22 09:00:27+00:00 9
c36390f43f7394c65623782ef96904cfc3cd6a93 2021-03-25 08:51:56+00:00 13
fd1709c6e91c2c508c2896b8e8857e0009c37f68 2021-03-25 09:01:57+00:00 9
d56e601f83a4d4cdd7ef8d5d44d9b7efd6b1adf9 2021-03-26 09:54:10+00:00 9
990853006311b938487842ea65998f7d473fd674 2021-03-27 17:50:44+00:00 9
b822548e56ff1ccc83ba71e1e03923cbdae8377a 2021-03-27 23:17:46+00:00 12
f637474f204b10785860a59c2ddae016959200fc 2021-03-31 20:48:27+00:00 12
85d0b9ce19f5b62ac8db7e41554825f587312316 2021-04-01 20:11:30+00:00 10
3683cf69871f9f109620e335828dbf0e29f9c151 2021-04-03 13:04:50+00:00 11
013d4b801a64be2cef695957c06ba1d9cb4712ff 2021-04-04 18:45:28+00:00 11
3b45a37f2bedb4b2d9aed352d8cccc5b36ab5dc7 2021-04-04 22:02:42+00:00 11
f1e4d915628fb7dee6c8562c79455d86bd94e157 2021-04-09 12:38:28+00:00 10
fea9cdcb75d358d9c22f32493300e3c6b6f39ce5 2021-04-09 09:01:24-04:00 11
25ca6751a99bbb6b267efc898d54fe0c75d53f41 2021-04-09 09:24:41-04:00 14
a9576511cc443aab6fc51a914a58cdaa35077ec1 2021-04-09 17:19:40-04:00 15
14ad42d3afc6d17d5ddf1408ce2a8bc9871b58e4 2021-04-09 17:24:29-04:00 13
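
One rough way to make "jumps up and doesn't come back down" concrete: the sketch below scans the last rows of the table above (short SHAs, durations in whatever unit the CI reported) for the earliest commit after which every build is slower than the median of the builds before it. The function name and the rule itself are assumptions for illustration, not something that was run in this thread.

```python
from statistics import median

# (short head_sha, duration) for the most recent rows of the table above.
RUNS = [
    ("c36390f", 13), ("fd1709c", 9), ("d56e601", 9), ("9908530", 9),
    ("b822548", 12), ("f637474", 12), ("85d0b9c", 10), ("3683cf6", 11),
    ("013d4b8", 11), ("3b45a37", 11), ("f1e4d91", 10), ("fea9cdc", 11),
    ("25ca675", 14), ("a957651", 15), ("14ad42d", 13),
]


def first_sustained_jump(runs):
    """Return the first sha from which every build is slower than the earlier median."""
    durations = [duration for _, duration in runs]
    for i in range(1, len(runs)):
        baseline = median(durations[:i])
        # "Doesn't come back down": even the fastest later build beats nothing earlier.
        if min(durations[i:]) > baseline:
            return runs[i][0]
    return None


print(first_sustained_jump(RUNS))
```

On these rows the rule points at b822548, which matches the reading above, though with timings this noisy it is only a hint about where to start bisecting.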
jorisvandenbossche commented 3 years ago

I think our demo docs are a bit too small to really find the culprit. I am currently building the pandas docs locally under a profiler, will report back here in a bit (it takes a while to build though .. ;))

jorisvandenbossche commented 3 years ago

So I built a subset of the pandas API docs (removing the narrative user guide, as the slowdown comes from the writing phase, and it's the API docs that have many pages): https://gist.githubusercontent.com/jorisvandenbossche/f5ff72ee2eea52c30193abc2e9b5cd05/raw/bcc68040a7b3691e58828bf2dddaaed7d9866f57/profile-pandas-docs.svg

Most of the time is spent in generate_nav_html (more than 80% in this case). Digging deeper, around 30% is spent in resolve from sphinx (this has increased because, with collapse=False, the toctree to resolve has become much bigger). There is certainly a significant part spent in bs4 as well (I would need to compare with lxml to see how much that can be reduced; however, I assume part of this is simply due to the larger HTML size of the resulting pages that get parsed).

jorisvandenbossche commented 3 years ago

And the version using the lxml parser through bs4: https://gist.githubusercontent.com/jorisvandenbossche/8aab410b0231a74d755ed54e656e5b7c/raw/8720e2df4426088483b6b775bac982cdb35bdc19/profile-pandas-docs-lxml.svg

(for a big site like pandas, this doesn't seem to make much difference)

jorisvandenbossche commented 3 years ago

We probably want to include a configuration option like readthedocs theme has: collapse_navigation: https://sphinx-rtd-theme.readthedocs.io/en/latest/configuring.html#confval-collapse_navigation
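
For reference, this is roughly how that option is set for the readthedocs theme. It is a conf.py fragment for sphinx_rtd_theme only; whether pydata-sphinx-theme would use the same name and semantics is an open question here.

```python
# conf.py fragment for sphinx_rtd_theme (not pydata-sphinx-theme).
html_theme = "sphinx_rtd_theme"
html_theme_options = {
    # With True, navigation entries are not expandable ([+] icons removed),
    # per the sphinx_rtd_theme configuration docs.
    "collapse_navigation": True,
}
```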

bollwyvl commented 3 years ago

Well, if the lxml thing is a red herring (or maybe i don't have a handle on how to interpret the profiling): can more things be cached along the way? I don't know enough about the toctree data structure, but seems like it would be possible to generate The Tree, and then slice off the pieces needed per page?

jorisvandenbossche commented 3 years ago

Yeah, I think sphinx is definitely doing a lot of duplicated effort .. However, to do that on our side might require some deeper plumbing into sphinx.

For example, a large part of the time is spent in this resolve call:

https://github.com/pydata/pydata-sphinx-theme/blob/a29386280d800822b5f3925bae1ce494d7626b4c/pydata_sphinx_theme/__init__.py#L328

The toctree it is resolving is the same in many cases, but each time (for each page) the docname is different, i.e. each call of this function is slightly different. So we can't easily "cache" it on our side. I think the only difference between resolve calls for pages that have the same root index is that the "current" class in the HTML is tagged onto a different item in the navigation list. That's of course a tiny difference for repeating the expensive operation... So we could maybe think about resolving it for the root index once and then adding the "current" tags ourselves with some bs4 HTML manipulation.
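
To illustrate the "resolve once, tag current ourselves" idea, here is a minimal sketch. Everything in it is an assumption: `render_nav` is a hypothetical stand-in for the expensive Sphinx resolve call, the stdlib `xml.etree.ElementTree` is swapped in for bs4 to keep it self-contained, and it assumes the links are identical for every page (in a real build relative links may differ per page and would need rewriting as well).

```python
from functools import lru_cache
import xml.etree.ElementTree as ET

# Counts how often the "expensive" rendering actually runs.
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def render_nav(root_index):
    """Hypothetical stand-in for resolving the toctree; cached per root index."""
    CALLS["count"] += 1
    pages = ["api/a.html", "api/b.html", "api/c.html"]
    items = "".join(f'<li><a href="{page}">{page}</a></li>' for page in pages)
    return f"<ul>{items}</ul>"


def nav_for_page(root_index, page_href):
    """Parse the cached nav HTML and tag the entry of the current page."""
    # fromstring builds a fresh tree each time, so the cached string is never mutated.
    tree = ET.fromstring(render_nav(root_index))
    for item in tree.iter("li"):
        link = item.find("a")
        if link is not None and link.get("href") == page_href:
            item.set("class", "current")
    return ET.tostring(tree, encoding="unicode")


print(nav_for_page("api/index", "api/b.html"))
```

The per-page work is then a parse plus a small edit instead of a full resolve, although parsing a very large nav on every page is not free either.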

cheekyshibe commented 3 years ago

Same problem. The average build time increased from less than 10 min to about 15 min.

writing output... [ 97%]
writing output... [ 97%]
waiting for workers...    <- that takes a long time :(   Does anybody know what exactly it is waiting for?

generating indices... genindex py-modindex done
copying notebooks ... [100%] 
highlighting module code... [100%] 
writing additional pages... search done

It seems that generating the collapsible sidebar takes a lot of time?


Update: the build takes 15 min on my 8-core, 16 GB machine. But on GitHub Actions (Ubuntu, 2 cores, 8 GB) it now takes over 1 hour: https://github.com/MegEngine/Documentation/runs/2619670415


It's hard to accept. :(


Update again: the build artifact is now over 1.2 GB, mostly from the API HTML files, and each file has over 10000 lines.
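
If you want to check the same thing on your own site, a small sketch like the following reports the total size of the HTML output and the largest pages. The function name and the build directory in the usage comment are hypothetical; line counts are newline counts.

```python
from pathlib import Path


def html_report(build_dir, top=10):
    """Return (total_bytes, rows); rows are the largest .html files
    as (relative_path, size_bytes, newline_count), biggest first."""
    rows = []
    total = 0
    for path in Path(build_dir).rglob("*.html"):
        data = path.read_bytes()
        total += len(data)
        rows.append((str(path.relative_to(build_dir)), len(data), data.count(b"\n")))
    rows.sort(key=lambda row: row[1], reverse=True)
    return total, rows[:top]


# Example usage (hypothetical output directory):
# total, rows = html_report("build/html")
# print(total, rows[:3])
```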

hawkinsp commented 2 years ago

In case anyone lands here and is looking for a workaround, the instructions in https://pydata-sphinx-theme.readthedocs.io/en/latest/user_guide/configuring.html#selectively-remove-pages-from-your-sidebar helped a lot for our project that uses sphinx-book-theme (which in turn uses pydata-sphinx-theme) and has many autogenerated API docs.
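
For anyone who wants the short version: as far as I can tell, the linked section relies on the sphinx-remove-toctrees extension. The fragment below is a sketch of that setup; the extension choice is my reading of the docs and the glob is a hypothetical path, so adjust both to your project.

```python
# conf.py fragment. Assumes `pip install sphinx-remove-toctrees`.
extensions = [
    "sphinx.ext.autosummary",
    "sphinx_remove_toctrees",
]

# Hypothetical glob: pages matching it are still built, but they are
# dropped from the sidebar toctree, so the nav HTML per page stays small.
remove_from_toctrees = ["api/generated/*"]
```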

(I found this issue by profiling my sphinx-build command with cProfile and seeing that generate_nav_html took a very large fraction of the total build time.)
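
A minimal sketch of that kind of profiling, in case it helps someone reproduce it. The helper names and the source/output paths are assumptions; `build_main` is the entry point in `sphinx.cmd.build` and is only imported when the Sphinx helper is actually called.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def profile_sphinx_build(srcdir, outdir, top=30):
    """Profile an HTML build; srcdir and outdir are whatever your project uses."""
    from sphinx.cmd.build import build_main  # requires Sphinx to be installed

    return profile_call(build_main, ["-b", "html", srcdir, outdir], top=top)


# Example usage (hypothetical paths):
# exit_code, report = profile_sphinx_build("doc/source", "doc/build/html")
# print(report)
```

If the theme's navigation is the bottleneck, generate_nav_html should show up near the top of the cumulative listing.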

choldgraf commented 2 years ago

Ah, thanks for linking that @hawkinsp - that section was added to address this issue, so I think we can close this one, and I'll update the top comment with a link to that section.