Closed charris closed 2 years ago
I am having the same issue with pandas, which is related to the API docs -> https://github.com/pydata/pydata-sphinx-theme/issues/364 I assume for numpy it might be a similar issue? (I don't know the size of your API docs, but I assume also quite big)
Perhaps we need to consider adding some explicit benchmarks... in the meantime, i'll see if i can dig up some numbers AP for the docs site build here so we can bisect a little.
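An explicit benchmark could start as small as timing the build command a few times and keeping the median. This is a rough sketch, not something the theme ships: the helper name `time_command` and the example `sphinx-build` arguments in the comment are assumptions for illustration.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], repeats: int = 3) -> List[float]:
    """Run `cmd` `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises CalledProcessError if the command fails
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere. For a docs build this
    # would be something like:
    #   ["sphinx-build", "-b", "html", "docs", "docs/_build/html"]
    # (the paths are assumptions; clear the output directory between runs so
    # each run does a full write).
    runs = time_command([sys.executable, "-c", "pass"])
    print(f"median: {statistics.median(runs):.2f}s over {len(runs)} runs")
```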
Here are some rough findings...
Looking at the data, my takeaway is b822548 is the place to start looking, when it jumps up and doesn't come back down....
head_sha | started_at | duration |
---|---|---|
4540f6a96d84bf3103227eed9d829c445817d578 | 2020-04-23 03:38:37+00:00 | 4 |
1e8295d42ae99271750a9eec4efcc7360275ed9a | 2020-04-28 15:02:45+00:00 | 4 |
e86bcd1eb5be3f88982ae8659560bd43d8d6e4d3 | 2020-05-04 06:46:59+00:00 | 11 |
24ae3c5c2d86190542372e045407f111a849418e | 2020-05-04 07:00:14+00:00 | 10 |
e089e6d168efab17585833b287a71371f9d25124 | 2020-05-04 22:07:36+00:00 | 9 |
8ae0c51907a8d9d1bb9c1bb6fc2cbbe2bef4266f | 2020-05-06 01:14:22+00:00 | 8 |
4a33ff5a4de3adf684fcd86dad9319203cd94a7f | 2020-05-20 20:40:31+00:00 | 12 |
4576172faa12f493eb27b508c248f0519902707a | 2020-05-27 10:02:27+00:00 | 9 |
e6680a3a4dcff2c52f927b759b5baba080434ac7 | 2020-05-27 16:36:03+00:00 | 11 |
45b4dc53a2375fd1aaba8fa5792248ab1bbe3732 | 2020-06-08 06:28:18+00:00 | 10 |
b4a670f853d5c892f96030a7b85d9b52b3b37f1f | 2020-06-22 14:58:35+00:00 | 10 |
64fde80d5128b17725918b50c18cfbeffdb263eb | 2020-06-23 06:22:16+00:00 | 10 |
e95b564578807e039350317c57e83d1a7bd65458 | 2020-06-23 12:36:20+00:00 | 9 |
74ffef0ce8c6d838b549fc0094e30f857832c2ca | 2020-06-23 17:56:04+00:00 | 10 |
d697ef09866ad6862f248c71c632e658c27d9ad3 | 2020-06-25 06:50:23+00:00 | 12 |
6de76cb208fa8f8f4b9cf2c140e2de646b66fb15 | 2020-08-22 08:40:19+00:00 | 10 |
190f32bae7b3f58c4af5fefe3e92f7dd09e717c4 | 2020-08-22 08:43:09+00:00 | 11 |
65ed520ff599a8468e0098f36cb7483d5c086134 | 2020-09-16 05:59:07+00:00 | 11 |
404a6f57a266b71dd90becdd2337c77471eebb9d | 2020-09-21 17:30:15+00:00 | 9 |
7163801b25d3e2f15a06fa403bac4fbd00818198 | 2020-09-23 18:58:00+00:00 | 7 |
15819a8373c7fffa700cd9c5cd9c3531889aece8 | 2020-09-28 08:16:19+00:00 | 8 |
bdf8224df3630b66e7bb1d819bac1edf2d257e83 | 2020-09-29 06:28:47+00:00 | 11 |
52cb046d9e6fdbd6d8d44a4a9469b47fd5eae228 | 2020-10-06 19:11:49+00:00 | 9 |
ef61ea0d8223c74216c8a08f1d018d67b24b775f | 2020-10-06 23:41:47+00:00 | 11 |
edc2a0bdee1411ac9f7d5950fcacb4720a7134e0 | 2020-11-02 19:26:07+00:00 | 8 |
e9875581b2927af64569c1f48a414d0e4426c0c5 | 2020-11-05 12:46:52+00:00 | 11 |
f321520e1240253c74b8d97d19f87e34d4ef4017 | 2020-11-18 16:07:15+00:00 | 14 |
fe61e9ddb1db59b300c3ecfa0d910d9a08003b34 | 2020-12-16 09:47:41+00:00 | 9 |
b3241493e2581c847bed65e0870e11b8c80e678e | 2020-12-23 09:21:05+00:00 | 9 |
f2c33be7ca0fa784ce04b8a05cdcbf09bd7c968a | 2020-12-28 21:16:31+00:00 | 9 |
969718267981738e503f874b1e0554c5446cf559 | 2020-12-28 21:27:28+00:00 | 9 |
2488b7defbd3d753dd5fcfc890fc4a7e79d25103 | 2021-01-19 08:34:59+00:00 | 9 |
7d14f11ace7bbae32f0e6d2acdfb84b68b98ae10 | 2021-01-19 09:07:58+00:00 | 9 |
ab928989b450e1756419427739b751bebb3f3603 | 2021-01-19 12:10:44+00:00 | 9 |
d70d8942413eff942f4a908263a3f0978894a9e5 | 2021-01-24 15:45:27+00:00 | 10 |
c4a64251673856858938e3ae2becede8278a245e | 2021-01-25 12:24:51+00:00 | 11 |
8a203b7f8f265bf11a61cded55f94e32c836b95d | 2021-01-26 10:34:29+00:00 | 8 |
270bf6c60de2c9f16aa32c3eeb6de39f0ee0e734 | 2021-01-26 10:45:51+00:00 | 9 |
e32af5fa53a91b79e2d14a5b28753d6120800927 | 2021-03-09 18:40:28+00:00 | 9 |
65ca2db6ca13110426ae9fd0f359d3063d28535d | 2021-03-09 20:24:44+00:00 | 8 |
f2d189a2076012b35e930db816c307d9a20e3228 | 2021-03-10 14:25:33+00:00 | 10 |
82bf21cafa2f76719dda253636a26e1cc9e996f8 | 2021-03-11 19:39:37+00:00 | 10 |
579eec63462d9e6987c6514231c2ca78fca85547 | 2021-03-22 03:29:56+00:00 | 8 |
f81cf47b29f0038658b22dfba7a98368d0a93089 | 2021-03-22 08:51:40+00:00 | 12 |
199f69fb1d9d71ae8bb74d44424a77e6a9f9deaf | 2021-03-22 08:53:01+00:00 | 12 |
ce961a835912e807dc7da32652c34daadcc89385 | 2021-03-22 08:55:03+00:00 | 11 |
925ac87225cf1e303b180e35a59809ac9db2e3b2 | 2021-03-22 09:00:27+00:00 | 9 |
c36390f43f7394c65623782ef96904cfc3cd6a93 | 2021-03-25 08:51:56+00:00 | 13 |
fd1709c6e91c2c508c2896b8e8857e0009c37f68 | 2021-03-25 09:01:57+00:00 | 9 |
d56e601f83a4d4cdd7ef8d5d44d9b7efd6b1adf9 | 2021-03-26 09:54:10+00:00 | 9 |
990853006311b938487842ea65998f7d473fd674 | 2021-03-27 17:50:44+00:00 | 9 |
b822548e56ff1ccc83ba71e1e03923cbdae8377a | 2021-03-27 23:17:46+00:00 | 12 |
f637474f204b10785860a59c2ddae016959200fc | 2021-03-31 20:48:27+00:00 | 12 |
85d0b9ce19f5b62ac8db7e41554825f587312316 | 2021-04-01 20:11:30+00:00 | 10 |
3683cf69871f9f109620e335828dbf0e29f9c151 | 2021-04-03 13:04:50+00:00 | 11 |
013d4b801a64be2cef695957c06ba1d9cb4712ff | 2021-04-04 18:45:28+00:00 | 11 |
3b45a37f2bedb4b2d9aed352d8cccc5b36ab5dc7 | 2021-04-04 22:02:42+00:00 | 11 |
f1e4d915628fb7dee6c8562c79455d86bd94e157 | 2021-04-09 12:38:28+00:00 | 10 |
fea9cdcb75d358d9c22f32493300e3c6b6f39ce5 | 2021-04-09 09:01:24-04:00 | 11 |
25ca6751a99bbb6b267efc898d54fe0c75d53f41 | 2021-04-09 09:24:41-04:00 | 14 |
a9576511cc443aab6fc51a914a58cdaa35077ec1 | 2021-04-09 17:19:40-04:00 | 15 |
14ad42d3afc6d17d5ddf1408ce2a8bc9871b58e4 | 2021-04-09 17:24:29-04:00 | 13 |
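The "jumps up and doesn't come back down" reading of the table can be checked mechanically. A minimal sketch: `first_sustained_jump` is a made-up helper, the rows are the tail of the table above with shortened SHAs, and the baseline of 9 is an assumption about what counts as a normal duration.

```python
from typing import Optional, Sequence, Tuple


def first_sustained_jump(rows: Sequence[Tuple[str, int]], baseline: int) -> Optional[str]:
    """Return the earliest sha after which every duration stays above `baseline`.

    Rows must be in chronological order. Returns None if the last row is
    already at or below the baseline.
    """
    candidate = None
    # Walk backwards from the newest build until a duration is back at baseline.
    for sha, duration in reversed(rows):
        if duration <= baseline:
            break
        candidate = sha
    return candidate


# Tail of the table above: (short sha, duration)
rows = [
    ("fd1709c", 9), ("d56e601", 9), ("9908530", 9), ("b822548", 12),
    ("f637474", 12), ("85d0b9c", 10), ("3683cf6", 11), ("013d4b8", 11),
    ("3b45a37", 11), ("f1e4d91", 10), ("fea9cdc", 11), ("25ca675", 14),
    ("a957651", 15), ("14ad42d", 13),
]

print(first_sustained_jump(rows, baseline=9))   # b822548
print(first_sustained_jump(rows, baseline=12))  # 25ca675
```

The answer depends on the chosen baseline: a stricter threshold of 12 points at the later step up at 25ca675 instead, so both commits may be worth a look.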
I think our demo docs are a bit too small to really find the culprit. I am currently building the pandas docs locally under a profiler, will report back here in a bit (it takes a while to build though .. ;))
So I built a subset of the pandas API docs (removing the narrative user guide, as the slowdown comes from the writing phase, and it's the API docs that have many pages): https://gist.githubusercontent.com/jorisvandenbossche/f5ff72ee2eea52c30193abc2e9b5cd05/raw/bcc68040a7b3691e58828bf2dddaaed7d9866f57/profile-pandas-docs.svg
Most of the time is spent in `generate_nav_html` (more than 80% in this case). Digging deeper, around 30% is spent in `resolve` from sphinx (this has increased because, with `collapse=False`, the toctree to resolve has become much bigger). A significant part is certainly spent in `bs4` as well (we would need to compare with lxml to see how much that can be reduced, though I assume part of it is simply due to the larger HTML of the resulting pages that gets parsed).
And the version with using the lxml parser through bs4: https://gist.githubusercontent.com/jorisvandenbossche/8aab410b0231a74d755ed54e656e5b7c/raw/8720e2df4426088483b6b775bac982cdb35bdc19/profile-pandas-docs-lxml.svg
(for a big site like pandas, this doesn't seem to make much difference)
We probably want to include a configuration option like the readthedocs theme has, `collapse_navigation`: https://sphinx-rtd-theme.readthedocs.io/en/latest/configuring.html#confval-collapse_navigation
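For reference, this is roughly how that option is set in a project using sphinx-rtd-theme. It is only a `conf.py` fragment for that theme; whether this theme should adopt the same option name is an open question here, not something decided in this thread.

```python
# conf.py (fragment) for a project using sphinx-rtd-theme
html_theme = "sphinx_rtd_theme"

html_theme_options = {
    # True (the theme's default): navigation entries are not expandable.
    # False: entries get expand icons; the theme's docs warn that on large,
    # deep projects this can lead to long builds and much larger HTML files.
    "collapse_navigation": True,
}
```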
Well, if the lxml thing is a red herring (or maybe i don't have a handle on how to interpret the profiling): can more things be cached along the way? I don't know enough about the toctree data structure, but seems like it would be possible to generate The Tree, and then slice off the pieces needed per page?
Yeah, I think sphinx is definitely doing a lot of duplicated effort .. However, to do that on our side might require some deeper plumbing into sphinx.
For example, a large part of the time is spent in this `resolve` call:
The toctree it is resolving is the same in many cases, but each time (for each page) the `docname` is different, i.e. each call of this function is slightly different. So we can't easily "cache" it on our side.
I think the only difference between `resolve` calls for pages that have the same root index is that the "current" class in the HTML is tagged onto a different item in the navigation list. That's of course a tiny difference for repeating the expensive operation .. So we could maybe think about resolving it for the root index once and then add the "current" tags ourselves with some HTML bs4 manipulation.
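To make the "resolve once, tag `current` ourselves" idea concrete, here is a toy sketch. It uses the stdlib `xml.etree.ElementTree` in place of bs4 so it runs without extra dependencies, and `resolve_nav_once` is a hypothetical stand-in for the expensive sphinx call. It also assumes hrefs are root-relative and marks only the page's own item; as far as I know, sphinx emits links relative to each page and also tags ancestors, which a real implementation would have to handle.

```python
import copy
import functools
import xml.etree.ElementTree as ET


@functools.lru_cache(maxsize=None)
def resolve_nav_once(root_doc: str) -> ET.Element:
    """Hypothetical stand-in for the expensive toctree resolution, cached per root."""
    html = (
        "<ul>"
        '<li><a href="api/a.html">a</a></li>'
        '<li><a href="api/b.html">b</a></li>'
        "</ul>"
    )
    return ET.fromstring(html)


def nav_for_page(root_doc: str, page_href: str) -> str:
    """Copy the cached navigation and tag the item for `page_href` as current."""
    nav = copy.deepcopy(resolve_nav_once(root_doc))  # never mutate the cached tree
    for item in nav.iter("li"):
        link = item.find("a")
        if link is not None and link.get("href") == page_href:
            classes = (item.get("class", "") + " current").strip()
            item.set("class", classes)
    return ET.tostring(nav, encoding="unicode")


if __name__ == "__main__":
    print(nav_for_page("index", "api/b.html"))
```

The per-page cost then drops to a copy plus a tree walk, though the copy is still proportional to the size of the navigation.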
Same problem. The average building time increases from less than 10min to about 15min.
writing output... [ 97%]
writing output... [ 97%]
waiting for workers... <- that takes a long time :( Does anybody know what exactly it is waiting for?
generating indices... genindex py-modindex done
copying notebooks ... [100%]
highlighting module code... [100%]
writing additional pages... search done
Seems that generating the collapsible sidebar spends a lot of time?
Update: the build takes 15 min on my 8-core, 16 GB machine, but on GitHub Actions (Ubuntu, 2 cores, 8 GB) it now takes over 1 hour: https://github.com/MegEngine/Documentation/runs/2619670415
It's hard to accept. :(
Update again: the build artifact is now over 1.2 GB, mostly from the API HTML files, and each file has over 10,000 lines.
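A back-of-the-envelope model of why the output grows so fast: if every page embeds a sidebar entry for every other page, the total number of entries written is quadratic in the page count. The function and the numbers below are illustrative assumptions, not measurements from this build.

```python
def sidebar_links_written(n_pages: int, links_per_page: int) -> int:
    """Total sidebar links emitted across the whole site."""
    return n_pages * links_per_page


n_pages = 5_000  # hypothetical number of API pages

# Fully expanded sidebar: every page links to every page.
fully_expanded = sidebar_links_written(n_pages, links_per_page=n_pages)

# Collapsed sidebar: each page only carries its nearby entries (50 is a guess).
collapsed = sidebar_links_written(n_pages, links_per_page=50)

print(fully_expanded)  # 25000000
print(collapsed)       # 250000
```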
In case anyone lands here and is looking for a workaround, the instructions in https://pydata-sphinx-theme.readthedocs.io/en/latest/user_guide/configuring.html#selectively-remove-pages-from-your-sidebar helped a lot for our project that uses sphinx-book-theme (which in turn uses pydata-sphinx-theme) and has many autogenerated API docs.
(I found this issue by profiling my sphinx-build command using cProfile and identifying that `generate_nav_html` took a very large fraction of the total build time.)
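For anyone wanting to repeat that kind of measurement, a small helper along these lines should do. This is a sketch, not the exact command used above: `profile_top` is a made-up name, and the docs paths in the comment are assumptions.

```python
import cProfile
import io
import pstats
from typing import Any, Callable


def profile_top(func: Callable[..., Any], *args: Any, limit: int = 15) -> str:
    """Profile `func(*args)` and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


# To profile a docs build (requires sphinx; the paths are placeholders):
#   from sphinx.cmd.build import main as sphinx_main
#   print(profile_top(sphinx_main, ["-b", "html", "docs", "docs/_build/html"]))
```

If `generate_nav_html` shows up near the top of the cumulative column, the sidebar is the likely bottleneck.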
Ah thanks for linking that @hawkinsp - that section was added to address this issue, so I think that we can close this one and I'll update the top comment with a link to that section
See https://github.com/numpy/numpy/pull/18756. Build times went from around 10 minutes to 30+ on CircleCI.
Things to try
`toctree` one time per page, and split it into two items for the navbar/sidebar
To resolve this
We decided that the slowdown here is somewhat unavoidable, as long as you want to keep multiple levels within your navigation bar. For some tips about speeding up site builds, see: https://pydata-sphinx-theme.readthedocs.io/en/latest/user_guide/configuring.html#selectively-remove-pages-from-your-sidebar
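As far as I can tell, the linked section relies on the separate sphinx-remove-toctrees extension. A `conf.py` fragment might look like the following; the glob pattern is a placeholder for wherever your autogenerated API pages live, and the extension has to be installed first.

```python
# conf.py (fragment); assumes `pip install sphinx-remove-toctrees` has been run
extensions = [
    "sphinx_remove_toctrees",
]

# Pages matching these patterns are still built, but are dropped from the
# sidebar toctree. "api/generated/*" is a hypothetical path.
remove_from_toctrees = ["api/generated/*"]
```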