ocurrent / ocaml-docs-ci

CI building documentation for ALL versions of ALL packages!
https://docs.ci.ocaml.org
MIT License

reduce/optimize the number of universes (DO NOT MERGE) #171

Closed: moyodiallo closed this 6 months ago

moyodiallo commented 8 months ago

This PR is not intended to be merged. It illustrates how far we can optimize to have fewer universes/jobs to deal with.

The idea can be broken down into two stages:

  1. Group all packages by their respective repository and version before scheduling them (i.e. find the minimal set of groups that includes all the opam packages). This corresponds to "The solved jobs success" in the statistics.

  2. Many packages rarely change their dependencies between two versions, so the number of jobs could be reduced further by taking their shared dependencies into account. This corresponds to "Reduced jobs" in the statistics.
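Stage 1 can be sketched in OCaml as follows. This is a minimal illustration, not the code from this PR: the `pkg` record, the `group_by_repo_and_version` function and the grouping key (the repository URL, assumed here to be something like the opam `dev-repo` field, paired with the version) are all hypothetical. It only forms the groups; finding the minimal set of groups and solving each group are not shown.

```ocaml
(* Hypothetical sketch of stage 1; not the code from this PR. *)
type pkg = {
  name : string;
  version : string;
  dev_repo : string; (* assumed grouping key, e.g. the opam "dev-repo" URL *)
}

(* Group packages released from the same repository at the same version,
   so that each group can be solved and scheduled as one job. *)
let group_by_repo_and_version (pkgs : pkg list) : pkg list list =
  let tbl : (string * string, pkg list) Hashtbl.t = Hashtbl.create 64 in
  List.iter
    (fun p ->
      let key = (p.dev_repo, p.version) in
      let existing = Option.value ~default:[] (Hashtbl.find_opt tbl key) in
      Hashtbl.replace tbl key (p :: existing))
    pkgs;
  (* Restore insertion order inside each group, then sort the groups so the
     result does not depend on hash-table iteration order. *)
  Hashtbl.fold (fun _ group acc -> List.rev group :: acc) tbl []
  |> List.sort compare

(* Illustrative packages with made-up names and URLs. *)
let example_pkgs =
  [
    { name = "foo"; version = "1.0"; dev_repo = "https://example.com/foo.git" };
    { name = "foo-lwt"; version = "1.0"; dev_repo = "https://example.com/foo.git" };
    { name = "foo"; version = "1.1"; dev_repo = "https://example.com/foo.git" };
    { name = "bar"; version = "0.3"; dev_repo = "https://example.com/bar.git" };
  ]

let () =
  let groups = group_by_repo_and_version example_pkgs in
  Printf.printf "%d packages -> %d groups\n"
    (List.length example_pkgs) (List.length groups)
```

Here `foo` and `foo-lwt` come from the same repository at version 1.0, so they share one group and the four packages need only three jobs. The catch, visible in the "grouped" run, is that solving a whole group can fail where solving its members individually would succeed.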

The statistics show how worthwhile it could be to optimize the ocaml-docs-ci universes. The stats are obtained by running the following command:

$ dune exec -- stats ../opam-repository
Total number of opam packages is 30228.

2024-03-28 17:18.14: New job: 5e59e8699a1534a5e3d22936ffcba8fc5ffee950
2024-03-28 17:18.14: opam-repository commit : 5e59e8699a1534a5e3d22936ffcba8fc5ffee950
2024-03-28 17:18.14: 
2024-03-28 17:18.14: Ungrouped packages (default)
2024-03-28 17:18.22: default> The solved jobs success    : 25340
2024-03-28 17:18.22: default> The opam packages success  : 25363
2024-03-28 17:18.22: default> The opam packages failures : 4865
2024-03-28 17:18.22: default> Scheduled jobs             : 17494
2024-03-28 17:18.22: default> Coverage                   : 25363/30228 (83.91%)
...
2024-03-28 17:18.22: 
2024-03-28 17:18.22: Grouped packages by repo
2024-03-28 17:18.26: grouped> The solved jobs success    : 13627
2024-03-28 17:18.26: grouped> The opam packages success  : 24202
2024-03-28 17:18.26: grouped> The opam packages failures : 6026
2024-03-28 17:18.26: grouped> Scheduled jobs             : 11394
2024-03-28 17:18.26: grouped> Coverage                   : 24202/30228 (80.06%)
2024-03-28 17:18.26: grouped> Reduced jobs (by deps hashes) : 6343
2024-03-28 17:18.31: 
2024-03-28 17:18.31: Grouped packages by repo (take the failures that are solved individualy)
2024-03-28 17:18.31: group-fixed> The solved jobs success    : 14797
2024-03-28 17:18.31: group-fixed> The opam packages success  : 25373
2024-03-28 17:18.31: group-fixed> The opam packages failures : 4855
2024-03-28 17:18.31: group-fixed> Scheduled jobs             : 11473
2024-03-28 17:18.31: group-fixed> Coverage                   : 25373/30228 (83.94%)
2024-03-28 17:18.31: group-fixed> Reduced jobs (by deps hashes) : 6425
2024-03-28 17:18.31: Job succeeded

Some notes to help interpret the statistics.

Improvements: compare the group-fixed run with the default run.
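The comparison can be made concrete with a little arithmetic over the logged figures. The OCaml snippet below is only an illustration: the variable names are made up, the numbers are copied from the stats output, and using the default "Scheduled jobs" as the baseline for group-fixed "Reduced jobs" is an assumption about how the two should be compared.

```ocaml
(* Numbers copied from the stats output above. *)
let total = 30228

let default_scheduled = 17494
let default_success = 25363

let group_fixed_scheduled = 11473
let group_fixed_reduced = 6425 (* "Reduced jobs (by deps hashes)" *)
let group_fixed_success = 25373

(* Percentage of jobs saved when going from [before] jobs to [after] jobs. *)
let saving ~before ~after =
  100. *. float_of_int (before - after) /. float_of_int before

(* Coverage as printed in the stats: successful packages over all packages. *)
let coverage success = 100. *. float_of_int success /. float_of_int total

let () =
  Printf.printf "Scheduled jobs: %d -> %d (%.2f%% fewer)\n" default_scheduled
    group_fixed_scheduled
    (saving ~before:default_scheduled ~after:group_fixed_scheduled);
  Printf.printf "With stage 2:   %d -> %d (%.2f%% fewer)\n" default_scheduled
    group_fixed_reduced
    (saving ~before:default_scheduled ~after:group_fixed_reduced);
  Printf.printf "Coverage:       %.2f%% -> %.2f%%\n" (coverage default_success)
    (coverage group_fixed_success)
```

With these numbers, grouping alone schedules about 34.4% fewer jobs, and adding the stage-2 reduction would cut about 63.3% of them, while coverage moves from 83.91% to 83.94%.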

This PR only implements the "Scheduled jobs" part of the grouped approach. Adding the "Reduced jobs" (second stage) is more complicated: it requires changing how jobs are built and installed, and would end up decoupling a package from its dependencies during build/install. In the default case, a package and its dependencies are built and installed together in a single job, which is why dependencies are duplicated across jobs.
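The "Reduced jobs (by deps hashes)" figure suggests keying jobs by a hash of their dependency set. A hypothetical OCaml sketch of that counting step follows; the `job` record, the `"name.version"` strings and the use of the standard library's `Digest` module are assumptions, not the PR's code. It deliberately leaves out the hard part described above: building a shared dependency set once and then building each package on top of it.

```ocaml
(* Hypothetical sketch of the stage-2 idea; not the code from this PR. *)

(* A job builds [target] on top of its solved dependency set, with every
   package written as "name.version". *)
type job = { target : string; deps : string list }

(* Hash the dependency set; sorting first makes the hash independent of the
   order in which the solver listed the dependencies. *)
let deps_hash (j : job) : string =
  j.deps
  |> List.sort_uniq String.compare
  |> String.concat "\n"
  |> Digest.string
  |> Digest.to_hex

(* Number of distinct dependency sets, i.e. how many dependency builds would
   remain if jobs sharing a dependency set reused a single build. *)
let reduced_jobs (jobs : job list) : int =
  let seen = Hashtbl.create 64 in
  List.iter (fun j -> Hashtbl.replace seen (deps_hash j) ()) jobs;
  Hashtbl.length seen

(* Illustrative jobs with made-up package names. *)
let example_jobs =
  [
    { target = "foo.1.0"; deps = [ "bar.2.0"; "baz.0.3" ] };
    { target = "foo.1.1"; deps = [ "baz.0.3"; "bar.2.0" ] };
    { target = "foo.2.0"; deps = [ "bar.3.0"; "baz.0.3" ] };
  ]

let () =
  Printf.printf "%d jobs -> %d dependency builds\n"
    (List.length example_jobs) (reduced_jobs example_jobs)
```

`foo.1.0` and `foo.1.1` resolve to the same dependencies, so three jobs need only two dependency builds. Each `foo` version would then be built separately on top of the shared set, which is exactly the separation of a package from its dependencies that makes this stage harder to implement.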

shonfeder commented 6 months ago

We don't have bandwidth or priority to move forward with this at the moment, but we will keep this POC in mind for reference in the future!