gasche opened this issue 4 years ago
I experimented with this a while ago (it must be a while now, given that it was with jbuilder!) and in fact that experiment (with an ugly shell script) was what launched the idea of the duniverse tool.
The results are indeed very good - I recall hitting a slight snag to do with checksums for the interfaces, which boiled down to the fact that `cd bar ; ocamlc -c foo.mli` and `ocamlc -c bar/foo.mli` generate different interfaces. That problem may have disappeared, given that the way Dune invokes the compiler has changed a lot since then. At the time, this meant that it worked, but you couldn't install the libraries you'd built.
I haven't dug the results out, but my recollection was that on a 72 core Azure machine, doing `opam install patdiff` took 4 minutes, but `jbuilder build @install` with the ~70 or so packages assembled in a single directory took 1 minute.
For the opam side, this would be configurable with a declared pattern - i.e. you would tag the package as dune'd and then configure opam to understand that it can assemble the trees differently for packages with this tag.
We don't have a spec for this at the moment, though.
Thinking aloud: one aspect that would require a bit of care is how to read back per-package status reports from the result of such a combined build, in particular when some of the packages fail to build. If dune does not provide an easy way to delineate individual status reports, a simple approach (in the failure case) would be to ask dune to rebuild each subproject in turn, in its existing build directory: for each package this would immediately return a package-specific status report.
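A rough sketch of that fallback, assuming the combined build keeps one subdirectory per package (the layout and the `@install` target here are assumptions, not a worked-out design):

```sh
# After a failed combined build, re-run dune in each package's subdirectory to
# recover a per-package status report; already-built packages return instantly.
for pkg in */; do
  (cd "$pkg" && dune build @install) || echo "FAILED: $pkg"
done
```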
If I understand correctly, you propose here to try to take advantage of intra-package parallelism, which opam currently cannot exploit (meaning that if package A depends on package B, opam will completely build package B before compiling package A, whereas there might be a file A.foo that only depends on B.foo and that we could build in parallel with B.bar, without waiting for it to terminate).
package builds are parallelized, but the amount of parallelism can be somewhat disappointing¹.
This is somewhat orthogonal to the present issue, but indeed, if the overall goal is to improve the speed of `opam install`, I believe that there is some low-hanging fruit that could help make `opam install` faster, independently of the more complicated setup required for exploiting intra-package parallelism.
From what I remember of my investigation a few months ago:
A big source of slowdown was the time spent by opam computing the "changes" files by scanning the filesystem just before and after the installation of a package (this is used to make sure a package can be uninstalled properly). This seems to be quite costly, and 1) useless in the case where the installation of the package is driven by a .install file instead of an arbitrary `make install` command (and dune generates such .install files); 2) also useless in a CI scenario where we don't care about proper uninstallation.
Also note that scanning the filesystem to compute changes files means that the "install" actions of packages need to be completely sequentialized! If instead changes files are computed from .install files, then I think we could run the actual "install" operations in parallel for the relevant packages.
I think it would be relatively easy (I hope) to implement 1) as an optimization that we know will trigger for all dune-compiled packages (if a .install file exists, compute the changes file from it instead of scanning the disk), and maybe 2) as an option to use in CI scenarios (that would speed up installation of all packages).
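For reference, a rough sketch of what such a `.install` file contains (the package name and paths below are made up); the point is that opam could derive the list of installed files, and hence the changes file, from these sections directly instead of diffing the filesystem:

```
lib: [
  "_build/install/default/lib/foo/META"
  "_build/install/default/lib/foo/foo.cma"
  "_build/install/default/lib/foo/foo.cmi"
]
bin: [
  "_build/install/default/bin/foo-cli" {"foo"}
]
doc: [
  "README.md"
  "CHANGES.md"
]
```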
https://github.com/ocaml/opam/issues/4245 (noticed separately by me and @emillon !) also seems to be a source of slowness that can be easily optimized, though I'm not sure how much we would gain exactly.
(meaning that if package A depends on package B, opam will completely build package B before compiling package A, whereas there might be a file A.foo that only depends on B.foo and that we could build in parallel with B.bar, without waiting for it to terminate).
Yes. I think it is relatively common that a package has base modules with few dependencies, and on the outside (of the dependency graph) more user-facing modules with more third-party dependencies (a testing framework, command-line parsing, etc.). So in the common case you can start building the base modules before many of the dependencies are available.
(Thanks a lot for the brain dump!)
The changes file is a very interesting orthogonal suggestion - it's like the `light-uninstall` flag. It would be quite easy to add a flag `strict-install` or something, meaning "use a .install file only", and the sandbox could enforce it by making the opam switch read-only for the build.
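As a sketch of where that would sit in an opam file (`light-uninstall` is the existing flag; `strict-install` is only the hypothetical name from the sentence above, not an existing opam flag):

```
# existing flag: removal just deletes the files listed in the .install file
flags: [ light-uninstall ]
# hypothetical counterpart for installation:
# flags: [ light-uninstall strict-install ]
```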
@dra27 you seem very intent on using flags to customize opam's behavior, while I would more naturally go for detecting from the build rules that the package is just using Dune, and doing the right thing there.
(Ideally there would be a syntax to say `build: "dune"`, with the meaning "this is a standard Dune-using project", instead of letting users each pick their own subset of the standard rules (with or without `{with-test}` or `{pinned}`); but for pragmatism we can just recognize those subsets of standard rules.)
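For concreteness, the standard rules such a shorthand would stand for look roughly like the usual dune-generated template below; the exact subset of lines and filters varies between packages, which is precisely the problem:

```
build: [
  ["dune" "subst"] {dev}
  ["dune" "build" "-p" name "-j" jobs]
  ["dune" "runtest" "-p" name "-j" jobs] {with-test}
  ["dune" "build" "-p" name "@doc"] {with-doc}
]
```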
Could you comment on why you prefer tagging? Here are the points I can come up with, but I don't know that much about opam packaging:
The opam client itself shouldn't over-specialise to dune, so there will be no heuristics in the 2.x series about detecting dune commands and overriding its behaviour.
Instead, I have an `opam-monorepo` plugin under preparation that will analyse a given set of opam packages, ensure that they are all a "fully closed set" of dune dependencies, and assemble the sources for a dune build. This is perfectly possible to build using opam-libs and opam-0install-solver today, so it's not blocking on an opam release, nor does it depend on adding new syntax to opam files.
You can find a prototype of this (as a non-opam plugin) at https://github.com/ocamllabs/duniverse if you want to have a play with it now. Note that the duniverse tool will not be released as that tree currently is -- the extra metadata it computes will be folded back into `x-` fields and `pin-depends` fields in opam, so it'll effectively become a "lock file generator" and a "source assembly plugin". Dune is also not quite ready to work in this mode at scale; it is missing a few features (notably package scoping in the face of vendoring), which means that monorepo builds require some manual curation to work. However, we've been steadily adding features over the last year to work through all these edge cases, so it's making progress. Feel free to try out the duniverse tool if you are interested in working through some of these issues.
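As a sketch of the kind of lock-file-style metadata this could mean in an opam file (the package name, version, URL, and the `x-` field name below are all made up for illustration):

```
pin-depends: [
  ["mylib.1.2.0" "https://github.com/someone/mylib/archive/1.2.0.tar.gz"]
]
x-duniverse-dirs: [ "duniverse/mylib" ]
```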
against tags: this means that most projects would not benefit from the improvements unless we mass-upgrade the repository, and people would keep submitting non-tagged metadata for their new releases as they just copy their in-repository file and forget to look at in-repo updates.
Mass-upgrade isn't a big deal. As opam-repository maintainers we do that on a weekly basis already ^^ It shouldn't be too hard to detect which projects use dune in a standard way and add a tag.
I would personally also be in favour of tagging. It adds the possibility to be backward-compatible and it would be very easy to add to dune's opam file generation.
I'm interested in approaches that do not change the way we use opam now, just make things faster by taking advantage of the fact that many packages use a build system that nicely supports composition. Duniverse sounds like an interesting project, but it is more about enabling new workflows, which is not particularly what I'm after here.
That said, @avsm's point that the same technical issues would arise in both settings sounds spot on, and I hadn't thought about issues related to vendoring. I suppose that we can have issues if a project has a (vendored) subproject of the same name as another project? Is there a pointer to where these issues are discussed?
I suppose that we can have issues if a project has a (vendored) subproject of the same name as another project? Is there a pointer to where these issues are discussed?
It's quite a complex series of features, involving the interaction between vendored units (private libraries, public libraries, and binaries), and shifting from the opam solver towards discriminating based on build rules (via `enabled_if` stanzas, for example):
I'm interested in approaches that do not change the way we use opam now, just make things faster by taking advantage of the fact that many packages use a build system that nicely supports composition.
Until duniverse can reliably assemble full dune monorepos, there will be no meaningful progress on the opam side towards a compositional build workflow (with dune at least). For a successful project or two doing this:
It is no coincidence that both of these projects also have a healthy overlap of maintainers of opam and dune :-) It's taken 1.5 years of work between us to get those working, as it's also involved a lot of porting to dune.
But to set realistic expectations, even our core Platform tools do not currently build as a monorepo! On the other hand, we are starting to understand the nature of the scoping needed to support this in dune, but it's not a feature to rush into.
I'm interested in approaches that do not change the way we use opam now, just make things faster
In the short term, you can simply set `DUNE_CACHE=enabled` to keep a reliable build cache that will work across opam installs. And while it might not seem perfect now, remember that in just two years we have dramatically improved the speed of a typical opam installation vs the oasis/ocamlbuild combo that was dominant then. The next round of improvements will get us significant gains again, but there's still work to be done in dune to get there.
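In practice that can be as simple as exporting the variable before running opam (shown here with a package already mentioned in this thread):

```sh
# Enable dune's shared build cache; dune builds run by opam will then reuse
# previously compiled artifacts instead of rebuilding them from scratch.
export DUNE_CACHE=enabled
opam install patdiff
```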
Thanks! One positive thing about this long list of tricky issues is that vendored libraries seem related to how people develop dune-using projects, but are designed to not actually be used once the package is released on opam. This means that focusing on build composition just for faster build at released-package-install-time may be a simpler problem than the general monorepo issue of enabling compositional development.
(For example, if we only aim for an installation-time boost and a particular Dune feature makes things tricky, we can always decide to isolate the build of packages using this feature, never including them in composite clusters.)
each package is built (internally in parallel), and package builds are parallelized, but the amount of parallelism can be somewhat disappointing
Yes, concretely the parallelism plan is fairly "static": if I understand it correctly, each package built in parallel gets a fixed share of the total number of cores available on the machine. This means that a slow build (or especially a build doing a lot of I/O) can have a huge negative impact.
Make can use a jobserver (a shared semaphore implemented over a pipe) to dynamically track available cores, and I think it can use this protocol to collaborate with other build systems as well. There have been discussions about adding jobserver support to dune, which would help solve this problem. It could also help in the case where dune shells out to another build system, for example when using `(run make)` to build a C library - this would share the global parallelism pool.
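To make the `(run make)` case concrete, here is an illustrative dune rule of that shape (the paths, names, and Makefile are assumptions); with jobserver support, the child `make` could draw jobs from dune's pool instead of spawning its own:

```
; Build a vendored C library by shelling out to make, then copy the result
; into the rule's directory so dune can track it as a target.
(rule
 (targets libfoo.a)
 (deps (source_tree vendor/libfoo))
 (action
  (progn
   (run make -C vendor/libfoo)
   (copy vendor/libfoo/libfoo.a libfoo.a))))
```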
@gasche wrote:
Thanks! One positive thing about this long list of tricky issues is that vendored libraries seem related to how people develop dune-using projects, but are designed to not actually be used once the package is released on opam. This means that focusing on build composition just for faster build at released-package-install-time may be a simpler problem than the general monorepo issue of enabling compositional development.
I think that's exactly right. One avenue to explore is whether dune-release can create "unvendored" projects for the purposes of opam archives. It still leaves some issues for dune to solve -- for example, ocamlformat and menhir both currently rely on different versions of the `fix` package, so they can never colink as a monorepo. However, they work fine in opam, since it effectively builds both in separate sandboxes (fix is only used in the menhir binary, not menhirLib).
@emillon wrote:
if I understand it correctly, each package built in parallel gets a fixed share of the total number of cores available on the machine. This means that a slow build (or especially a build doing a lot of I/O) can have a huge negative impact.
It's worse: both opam and dune autodetect the number of jobs, so you get a multiplicative effect on the number of processes running. The jobserver is the only way to fix this systematically across multiple tools.
A few thoughts about this discussion. First, regarding the job server protocol: it would indeed help to improve parallelism when building multiple packages, however it will not perform as well as a single Dune instance building multiple packages at once. It is easy to see why with an example. Consider a package A that depends on two independent packages B and C, where B is small and C is big. When relying on the job server protocol, you have to wait until C has finished building before starting the build of A, while with a single Dune instance you can start building parts of A right from the start. Depending on the set of packages involved, this could range from a small to a big win.
Additionally, the job server protocol was designed for recursive make invocations, and there are pitfalls when trying to use it as a general way of sharing parallelism between multiple commands. The most important one is the following: if a command that requested tokens crashes without giving them back, those tokens are lost forever and the rest of the builds will have fewer jobs available. We found this problem while implementing the job server protocol inside Jane Street, and from searching the web we found that the Rust community hit the same issue. There doesn't seem to be a solution, so opam would need to trust the commands it runs to do the right thing and give back their tokens even in case of a crash. FTR, the way we solved this problem inside Jane Street is by only using the job server in combination with a custom tool similar to GNU parallel but which supports the job server protocol and always gives back its tokens. This was enough for our use case, but it wouldn't be of much help here.
Independently of speeding up installation, it would be nice to be able to tag packages that use Dune in the standard way to build. This would allow a few things, such as the build instructions or some of the package dependencies, to become implementation details of the system. At the moment, users need to use a specific invocation of Dune to build their package with opam and it's easy to get it wrong. Using the opam file generation feature of dune helps, however some of the details are not completely frozen and have evolved over time. So hiding them and making them true implementation details would allow for more flexibility.
@jeremiedimino wrote:
Independently of speeding up installation, it would be nice to be able to tag packages that use Dune in the standard way to build. This would allow a few things, such as the build instructions or some of the package dependencies, to become implementation details of the system. At the moment, users need to use a specific invocation of Dune to build their package with opam and it's easy to get it wrong. Using the opam file generation feature of dune helps, however some of the details are not completely frozen and have evolved over time. So hiding them and making them true implementation details would allow for more flexibility.
Rather than remove the build field, it's easier (and we often do this) to backpatch older revisions using tools like `opam admin` to reflect some new build instructions. These new instructions often come with different bounds on the client as well, but that's the general approach the opam-repo would prefer. For example, we need to do some patching of jbuilder packages for https://github.com/ocaml/opam-repository/issues/15943, at which point the CI tests will confirm that they continue to build.
The dune rules are already pretty abstracted away thanks to `build -p` and the `@doc` and `runtest` targets. It's also pretty easy to detect dune packages from the `dune` dependency and the build rule. If you need more metadata in the opam package, it's fine to add an `x-dune-*` field, which other tools can pick up.
There is definitely value in keeping things simple.
A big source of slowdown was the time spent by opam computing the "changes" files by scanning the filesystem just before and after the installation of a package (this is used to make sure a package can be uninstalled properly).
Indeed, just detecting packages that don't have `install:` instructions and computing their changes from the .install files could be a significant gain (no need even to make the switch directory read-only, the absence of the `install:` field is enough!)
About parallelism with dune, I concur with @jeremiedimino: `"dune" "build" "-j" jobs` is, at the moment, the former; at best with a job server, but that would only partially help, as was pointed out.
Dune is very good at parallelizing work across several (sub)projects, but currently opam does not benefit from this: each package is built (internally in parallel), and package builds are parallelized, but the amount of parallelism can be somewhat disappointing¹.
¹: I haven't seen a precise analysis of the causes, but one reason may be large packages with lots of dependencies that act as parallelism bottlenecks. This is something that @Armael looked at in the context of whole-repo builds, and @kit-ty-kate runs several opam installs in parallel (in independent docker containers) for her whole builds to get better parallelism.
A natural idea would be for opam to take advantage of dune's cross-package parallelism in a multi-package install, by assembling the sources of the dune-using packages into a single directory and running a single dune build over all of them.
I'm sure that this has already been proposed and discussed. What is the status? Are some people planning to do it, but they haven't had time yet, or maybe there is a blocker when we look at the details?
(One aspect I don't understand very well is whether we can use cross-package builds "for free" for packages that use Dune, or whether we have a problem if they each specify various different options in their "build" rules.)
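For concreteness, a rough sketch of what such a combined build could look like, assuming two dune-using packages from the install plan; the package names, directory layout, and job count are all made up:

```sh
# Unpack the sources of each dune-using package side by side in one directory,
# then let a single dune invocation build them all, sharing one job pool.
mkdir combined && cd combined
tar xf ~/downloads/pkg-a.1.0.tar.gz
tar xf ~/downloads/pkg-b.2.3.tar.gz
# a minimal workspace root so dune treats this directory as the build root:
echo '(lang dune 2.0)' > dune-project
dune build --only-packages pkg-a,pkg-b -j 72 @install
```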