python / docsbuild-scripts

scripts for building documentation on docs.python.org
Build docs with fresh CPython commit not ~24h old one #190

Closed: hugovk closed this 2 months ago

hugovk commented 3 months ago

A full build of all languages and versions has been taking somewhere between 24 and 50 hours (https://github.com/python/docsbuild-scripts/issues/169), although https://github.com/python/cpython/pull/123113 should cut that by about a third.

We update the CPython repo once at the start, then loop over each language/version combination:

https://github.com/python/docsbuild-scripts/blob/56d72d43e5759cc0ed600827b56e81d8310bcaca/build_docs.py#L1120-L1135

This means that builds near the end of the loop use a Git commit that could be a day or two old.
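To make the staleness concrete, here is a toy model. It is an assumption for illustration, not code from build_docs.py: builds run back to back, fetching takes no time, and the durations are made up. With a single fetch at the start, each build's checkout is as old as the sum of the builds before it; fetching before every build resets that age to zero.

```python
from itertools import accumulate


def commit_ages(build_hours: list[float], update_each_build: bool) -> list[float]:
    """Hours between the last fetch and the start of each build (toy model)."""
    if not build_hours:
        return []
    if update_each_build:
        # Fetching right before every build means each build starts fresh.
        return [0.0] * len(build_hours)
    # Fetching once at the start: build i begins after builds 0..i-1 finish.
    return [0.0] + list(accumulate(build_hours[:-1]))


# Hypothetical durations: 30 builds of one hour each.
durations = [1.0] * 30
print(commit_ages(durations, update_each_build=False)[-1])  # 29.0
print(max(commit_ages(durations, update_each_build=True)))  # 0.0
```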

For example, looking at the current logs:

```
15502:2024-08-24 16:07:01,574 DEBUG: Run: 'git -C /srv/docsbuild/cpython fetch'
```

It's currently 2024-08-25 10:45, meaning current builds are using an 18-hour-old commit, and we're about halfway through a full build.
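The 18-hour figure can be checked with the standard library. This snippet parses the timestamp from the fetch log line (the default `asctime` layout of Python's `logging`, with milliseconds after a comma) and subtracts it from the "current" time quoted in the comment; the variable names are just for illustration.

```python
from datetime import datetime

# Timestamp of the last `git fetch`, copied from the log line.
fetched = datetime.strptime("2024-08-24 16:07:01,574", "%Y-%m-%d %H:%M:%S,%f")
# The time the comment was written.
now = datetime.strptime("2024-08-25 10:45", "%Y-%m-%d %H:%M")

age = now - fetched
print(age)  # 18:37:58.426000
print(round(age.total_seconds() / 3600, 1))  # 18.6
```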

So let's instead update the CPython repo before each language/version, perhaps by moving the update inside the while loop:

```diff
     cpython_repo = Repository(
         "https://github.com/python/cpython.git", args.build_root / "cpython"
     )
-    cpython_repo.update()
     while todo:
         version, language = todo.pop()
         logging.root.handlers[0].setFormatter(
             logging.Formatter(
                 f"%(asctime)s %(levelname)s {language.tag}/{version.name}: %(message)s"
             )
         )
         if sentry_sdk:
             with sentry_sdk.configure_scope() as scope:
                 scope.set_tag("version", version.name)
                 scope.set_tag("language", language.tag)
+        cpython_repo.update()
         builder = DocBuilder(
             version, versions, language, languages, cpython_repo, **vars(args)
         )
```

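A minimal sketch of the proposed loop structure follows. `FakeRepository` and `build_all` are hypothetical stand-ins: the real `Repository` class runs Git, and the logging, Sentry, and `DocBuilder` steps are left out here. The sketch records which fetch each build used, showing that every popped language/version pair gets its own update.

```python
class FakeRepository:
    """Hypothetical stand-in for Repository: counts update() calls, runs no Git."""

    def __init__(self) -> None:
        self.fetches = 0

    def update(self) -> None:
        self.fetches += 1


def build_all(
    todo: list[tuple[str, str]], repo: FakeRepository
) -> list[tuple[str, str, int]]:
    """Consume todo, recording which fetch number each build would have used."""
    built = []
    while todo:
        version, language = todo.pop()
        repo.update()  # moved inside the loop: fetch right before each build
        built.append((version, language, repo.fetches))
    return built


repo = FakeRepository()
print(build_all([("3.13", "en"), ("3.13", "fr"), ("3.14", "en")], repo))
# [('3.14', 'en', 1), ('3.13', 'fr', 2), ('3.13', 'en', 3)]
```

The trade-off is one extra fetch per language/version build, which is presumably small compared with the build itself.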
ned-deily commented 2 months ago

It might also make sense to reverse the order of builds so that the newest releases (where there is the most immediate interest and churn) will be built first.

hugovk commented 2 months ago

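They're already being run in reverse. Here's a temporary patch that prints each version/language pair instead of building it, followed by its output: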

```diff
diff --git a/build_docs.py b/build_docs.py
index 93dcac4..1756c2b 100755
--- a/build_docs.py
+++ b/build_docs.py
@@ -1117,7 +1117,7 @@ def build_docs(args) -> bool:
     cpython_repo = Repository(
         "https://github.com/python/cpython.git", args.build_root / "cpython"
     )
-    cpython_repo.update()
+    # cpython_repo.update()
     while todo:
         version, language = todo.pop()
         logging.root.handlers[0].setFormatter(
@@ -1125,6 +1125,8 @@ def build_docs(args) -> bool:
                 f"%(asctime)s %(levelname)s {language.tag}/{version.name}: %(message)s"
             )
         )
+        print(f"{version.name}/{language.tag}")
+        continue
         if sentry_sdk:
             with sentry_sdk.configure_scope() as scope:
                 scope.set_tag("version", version.name)
@@ -1136,6 +1138,7 @@ def build_docs(args) -> bool:
     logging.root.handlers[0].setFormatter(
         logging.Formatter("%(asctime)s %(levelname)s: %(message)s")
     )
+    sys.exit()

     build_sitemap(versions, languages, args.www_root, args.group)
     build_404(args.www_root, args.group)
```

```text
3.14/zh-tw
3.14/zh-cn
3.14/uk
3.14/tr
3.14/pt-br
3.14/pl
3.14/ko
3.14/ja
3.14/it
3.14/id
3.14/fr
3.14/es
3.14/en
3.13/zh-tw
3.13/zh-cn
3.13/uk
3.13/tr
3.13/pt-br
3.13/pl
3.13/ko
3.13/ja
3.13/it
3.13/id
3.13/fr
3.13/es
3.13/en
3.12/zh-tw
3.12/zh-cn
3.12/uk
3.12/tr
3.12/pt-br
3.12/pl
3.12/ko
3.12/ja
3.12/it
3.12/id
3.12/fr
3.12/es
3.12/en
```

This is because we sort the versions from lowest to highest but then pop() from the end of the list.
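The ordering can be reproduced in isolation. This is a hypothetical reconstruction, not the script's code: it uses plain version strings and a made-up `version_key` helper (build_docs.py itself works with version objects that have a `.name`). It sorts the pairs from lowest to highest and pops from the end, so the newest version comes out first. The numeric key matters because, as plain strings, "3.10" sorts before "3.9".

```python
def version_key(name: str) -> tuple[int, ...]:
    """Hypothetical helper: turn "3.9" into (3, 9) so 3.9 sorts before 3.10."""
    return tuple(int(part) for part in name.split("."))


versions = ["3.12", "3.9", "3.14", "3.10", "3.13"]
languages = ["en", "fr", "ja"]

# Sort every (version, language) pair from lowest to highest...
todo = sorted(
    ((v, lang) for v in versions for lang in languages),
    key=lambda pair: (version_key(pair[0]), pair[1]),
)

# ...then pop() from the end, which yields the highest version first.
order = []
while todo:
    order.append(todo.pop())

print(order[:4])  # [('3.14', 'ja'), ('3.14', 'fr'), ('3.14', 'en'), ('3.13', 'ja')]
print(order[-1])  # ('3.9', 'en')
```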