Closed hugovk closed 2 months ago
The improvement is even larger on the docs server. For en/3.14
: 57s -> 3.5s 🚀
64871:2024-09-17 12:32:39,535 DEBUG en/3.14: Purging https://docs.python.org/dev/ from CDN
64872:2024-09-17 12:32:39,538 DEBUG en/3.14: Starting new HTTPS connection (1): docs.python.org:443
64873:2024-09-17 12:32:39,580 DEBUG en/3.14: https://docs.python.org:443 "PURGE /dev/ HTTP/1.1" 200 52
64874:2024-09-17 12:32:39,582 DEBUG en/3.14: Purging https://docs.python.org/3.14/ from CDN
64875:2024-09-17 12:32:39,583 DEBUG en/3.14: Starting new HTTPS connection (1): docs.python.org:443
64876:2024-09-17 12:32:39,624 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/ HTTP/1.1" 200 53
64877:2024-09-17 12:32:39,626 DEBUG en/3.14: Purging https://docs.python.org/dev/.buildinfo from CDN
64878:2024-09-17 12:32:39,627 DEBUG en/3.14: Starting new HTTPS connection (1): docs.python.org:443
64879:2024-09-17 12:32:39,668 DEBUG en/3.14: https://docs.python.org:443 "PURGE /dev/.buildinfo HTTP/1.1" 200 52
64880:2024-09-17 12:32:39,670 DEBUG en/3.14: Purging https://docs.python.org/dev/404.html from CDN
[snip 3328 lines]
68209:2024-09-17 12:33:36,345 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/archives/python-3.14.0a0-docs-pdf-a4.tar.bz2 HTTP/1.1" 200 52
68210:2024-09-17 12:33:36,347 DEBUG en/3.14: Purging https://docs.python.org/3.14/archives/python-3.14.0a0-docs-html.tar.bz2 from CDN
68211:2024-09-17 12:33:36,348 DEBUG en/3.14: Starting new HTTPS connection (1): docs.python.org:443
68212:2024-09-17 12:33:36,389 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/archives/python-3.14.0a0-docs-html.tar.bz2 HTTP/1.1" 200 52
68213:2024-09-17 12:33:36,391 DEBUG en/3.14: Purging https://docs.python.org/3.14/archives/python-3.14.0a0-docs-text.tar.bz2 from CDN
68214:2024-09-17 12:33:36,392 DEBUG en/3.14: Starting new HTTPS connection (1): docs.python.org:443
68215:2024-09-17 12:33:36,432 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/archives/python-3.14.0a0-docs-text.tar.bz2 HTTP/1.1" 200 50
68216:2024-09-17 12:33:36,434 DEBUG en/3.14: Purging https://docs.python.org/3.14/archives/python-3.14.0a0-docs.epub from CDN
68217:2024-09-17 12:33:36,436 DEBUG en/3.14: Starting new HTTPS connection (1): docs.python.org:443
68218:2024-09-17 12:33:36,493 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/archives/python-3.14.0a0-docs.epub HTTP/1.1" 200 52
36729:2024-09-18 10:39:56,295 DEBUG en/3.14: Purging https://docs.python.org/dev/ from CDN
36730:2024-09-18 10:39:56,296 DEBUG en/3.14: Resetting dropped connection: docs.python.org
36731:2024-09-18 10:39:56,343 DEBUG en/3.14: https://docs.python.org:443 "PURGE /dev/ HTTP/11" 200 0
36732:2024-09-18 10:39:56,344 DEBUG en/3.14: Purging https://docs.python.org/3.14/ from CDN
36733:2024-09-18 10:39:56,347 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/ HTTP/11" 200 0
36734:2024-09-18 10:39:56,347 DEBUG en/3.14: Purging https://docs.python.org/dev/.buildinfo from CDN
36735:2024-09-18 10:39:56,350 DEBUG en/3.14: https://docs.python.org:443 "PURGE /dev/.buildinfo HTTP/11" 200 0
36736:2024-09-18 10:39:56,350 DEBUG en/3.14: Purging https://docs.python.org/dev/404.html from CDN
36737:2024-09-18 10:39:56,353 DEBUG en/3.14: https://docs.python.org:443 "PURGE /dev/404.html HTTP/11" 200 0
36738:2024-09-18 10:39:56,353 DEBUG en/3.14: Purging https://docs.python.org/dev/about.html from CDN
[snip 2221]
38960:2024-09-18 10:39:59,833 DEBUG en/3.14: Purging https://docs.python.org/3.14/archives/python-3.14.0a0-docs-texinfo.zip from CDN
38961:2024-09-18 10:39:59,835 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/archives/python-3.14.0a0-docs-texinfo.zip HTTP/11" 200 0
38962:2024-09-18 10:39:59,835 DEBUG en/3.14: Purging https://docs.python.org/3.14/archives/python-3.14.0a0-docs-pdf-a4.tar.bz2 from CDN
38963:2024-09-18 10:39:59,837 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/archives/python-3.14.0a0-docs-pdf-a4.tar.bz2 HTTP/11" 200 0
38964:2024-09-18 10:39:59,837 DEBUG en/3.14: Purging https://docs.python.org/3.14/archives/python-3.14.0a0-docs-html.tar.bz2 from CDN
38965:2024-09-18 10:39:59,840 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/archives/python-3.14.0a0-docs-html.tar.bz2 HTTP/11" 200 0
38966:2024-09-18 10:39:59,840 DEBUG en/3.14: Purging https://docs.python.org/3.14/archives/python-3.14.0a0-docs-text.tar.bz2 from CDN
38967:2024-09-18 10:39:59,842 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/archives/python-3.14.0a0-docs-text.tar.bz2 HTTP/11" 200 0
38968:2024-09-18 10:39:59,842 DEBUG en/3.14: Purging https://docs.python.org/3.14/archives/python-3.14.0a0-docs.epub from CDN
38969:2024-09-18 10:39:59,846 DEBUG en/3.14: https://docs.python.org:443 "PURGE /3.14/archives/python-3.14.0a0-docs.epub HTTP/11" 200 0
Adding this up, 12 languages 3 versions (57-3.5) seconds = ~32 minutes saved for a full build loop. :tada:
Helps https://github.com/python/docsbuild-scripts/issues/169.
Looking at the server logs (see en-3.12.log) for building the
en/3.12
documentation, I noticed that during the publishing stage, there are 1,612 instances of purging URLs from the CDN. It does both the/3/
and/3.12/
URLs.For example, it begins:
And ends:
This takes 2 minutes and 44 seconds.
Each
purge()
command is callingrequests.request()
, which involves starting new HTTPS connections for each and every URL. The "Starting new HTTPS connection" comes from urllib3, the underlying HTTP client used by Requests.We can do better by creating a Requests "session" which will open a single connection, and re-use it by calling
session.request()
inpurge()
:https://requests.readthedocs.io/en/latest/user/advanced/
Or, because Requests is essentially wrapping the urllib3 HTTP client, we might as well use that urllib3 directly instead of Requests. The API is virtually the same.
Comparison
Re-using a connection makes it about three times quicker on my machine.
Timings from my Mac using the following test scripts.
Original
purge()
: 33 secondsWith
requests.Session()
: 11 secondsOutput is like:
With
urllib3.PoolManager()
: 10 secondsOutput is like: