Prototype 1: building from a Dune lockfile

Goal: with just the opam binary, dune build builds projects "from scratch", relying on no external state or OCaml-based tools.

Overview

This prototype uses the opam binary rather than attempting to use opam's libraries. The default opam root (~/.opam) will not be used at all by this prototype, which should initialise a hidden opam root internally (it will therefore necessarily be slow).

This prototype won't look to address UX issues in Dune (selection of configuration options, etc.). The aim is rather to look at what changes are needed in dune-project, dune-workspace and the lockfile itself.

There's no mapping yet between Dune public names and opam packages; the required packages are implicitly those in the project's opam files (which may of course be generated from dune-project).

Implicit state to `dune-workspace`; defaults to `dune-project`

Dune obtains a lot of implicit state from the environment when it starts. This needs to be re-homed, and suitable defaults inferred. Two major items need to be addressed:

Compiler version selection. At present, the compiler version is implicitly taken from the build context. This should probably be surfaced as an alternative to (switch opam-switch-name) with the compiler version instead specified e.g. (context (opam (ocaml 5.0))). As a simplification at this stage, it may well be sensible to require a full version specification (i.e. (ocaml 5.0.0)) although the final target would be to be able to specify the series only (and have the lockfile do the rest - i.e. the workspace declares that OCaml 4.14.x is in use and the lockfile guides Dune towards whether that means 4.14.0 or 4.14.1). This is also where compiler options (e.g. flambda) would be specified, but that's beyond the scope of this prototype.
Optional tools/package selection. Anything which the dune files would interpret based on the environment needs to have a mechanism for being declared - this includes (select ...) and (enabled_if ...) fields and any stanzas which activate only on the presence of tools found in the environment (odoc, mdx, etc.). We also need to consider what the default behaviour should be for these. These will probably require extra fields in the (context ...) stanza to specify what is wanted and may also want changes to the dune stanzas (for example, (select ...) may want to be augmented with a (default) flag).
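A minimal sketch of the "series only" idea, assuming the lockfile exposes a plain list of locked compiler versions. The names parse_version, is_prefix and resolve_series are hypothetical and not part of Dune; the point is only that a workspace saying 4.14 can be narrowed to one concrete version by the lockfile.

```ocaml
(* Hypothetical helper: versions are handled as int lists,
   e.g. "4.14.1" becomes [4; 14; 1]. *)
let parse_version (s : string) : int list =
  List.map int_of_string (String.split_on_char '.' s)

(* [is_prefix series v] holds when [v] belongs to [series],
   e.g. 4.14 is a prefix of 4.14.1. *)
let rec is_prefix series v =
  match series, v with
  | [], _ -> true
  | x :: xs, y :: ys -> x = y && is_prefix xs ys
  | _ :: _, [] -> false

(* Return the locked version matching the requested series when exactly
   one matches; a missing or ambiguous match is reported as an error. *)
let resolve_series ~(locked : string list) (series : string) =
  let wanted = parse_version series in
  match List.filter (fun v -> is_prefix wanted (parse_version v)) locked with
  | [ v ] -> Ok v
  | [] -> Error ("no locked compiler in series " ^ series)
  | _ -> Error ("ambiguous compiler series " ^ series)
```

With this shape, a full specification such as 5.0.0 is simply the degenerate case of a series that matches one version.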

dune-workspace is the obvious home for configuration options which may need to be changed, but which should not necessarily be committed to a repository. Defaults may perhaps need to go in dune-project, for example, as to whether test dependencies should be built by default, the default contexts to use if there is no dune-workspace file (a repository might use dune-project to indicate that dune-workspace should be initialised to test OCaml 4.08 and OCaml 5.0, still giving the user the ability to add additional build contexts to a specific checkout of the repository).

Lockfile design

We have found when discussing lockfiles that even determining exactly what is meant by a lockfile can take some agreement! At the moment, in OCaml, lockfiles principally refer to a single set of solved dependencies. This works up to a point, but it has a serious limitation - there are no non-trivial OCaml projects where exactly one lockfile describes all possible uses:

Very few, if any, OCaml libraries are designed to be used with a single version only of the compiler, so single-configuration lockfiles are virtually unused in this context.
opam's conf-* packages partially mask the effect of different operating systems on dependencies by providing a common central package for OS-specific dependencies, but this only works up to a point. The eio package, for example, has different opam dependencies for each of the major operating systems.
Dependencies of an optional component can affect the dependencies of other components. For example, Lwt's lwt_domain package logically belongs in the same repository as the rest of Lwt, but locking dependencies to allow lwt_domain to be included would artificially constrain all of Lwt to OCaml 5.0+.

A set of opam packages at specific versions, therefore, constitutes merely the simplest of lockfiles. This simple lockfile provides repeatability only for the single configuration it was solved for.

What is wanted, therefore, is a lockfile for all the possible configurations for a given project. For compiler and OS versions, we could almost consider creating these lockfiles up front, but this quickly falls down in the face of optional components within a project, as the number of "lockfiles" soon becomes unmanageable.

If repeatability for any given configuration is the goal, then the perfect lockfile contains just enough information to allow this. To this end the lockfile requires:

1. The list of opam-repository remotes in use, in priority order, and their HEAD commits
2. The list of manually pinned dependencies, and their HEAD commits
3. The constraint solver and criteria

Note that 1 and 2 together with the project's own opam files describe a single universe of opam packages. When building in Dune, the packages themselves are implicitly "pinned" (i.e. solving is done using the package opam files in the Dune workspace, and all those in repositories are overridden). If we assume a stable constraint solver, then each individual lockfile configuration is simply a solving request against this universe of opam packages. The configuration itself is a consequence of the computer running the build (for example, providing that the architecture is arm64 and the OS is macOS) and things flowing from the build itself (for example, the build context gives the constraint required for the "ocaml" package; the presence of an mdx stanza adds the requirement for the "mdx" package to be available to provide the tool; etc.)

To this we must add a small seasoning of reality. Our solvers are occasionally slow and they are also occasionally unstable. Both of these should be regarded as bugs, but it means that to the minimal lockfile, we should add the ability to cache the results of commonly required solves. In this case, solving for a given configuration involves first checking if the configuration is cached in the lockfile and only beginning an actual package solve if the configuration differs. In just the same way as Dune's cache can perform statistical checks for reproducibility, one could also conceive of configuring Dune to test lockfile caches.
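The lookup order described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the configuration and plan types, the plan_for and cache_entry_is_stable names, and the idea that the solver is passed in as a plain function. Cache keys are assumed to be stored in normalised (sorted) form.

```ocaml
(* A configuration is a list of variable bindings,
   e.g. [("ocaml", "5.0"); ("os", "linux")]. *)
type configuration = (string * string) list

(* A plan is a list of package.version entries. *)
type plan = string list

(* Sort the bindings so that their order does not matter. *)
let normalise (c : configuration) : configuration = List.sort compare c

(* Look the configuration up in the cache and only call [solve] on a
   miss. The second component records whether the solver actually ran. *)
let plan_for ~(cache : (configuration * plan) list)
    ~(solve : configuration -> plan) (config : configuration)
    : plan * [ `Cached | `Solved ] =
  match List.assoc_opt (normalise config) cache with
  | Some plan -> (plan, `Cached)
  | None -> (solve config, `Solved)

(* Spot check in the spirit of Dune's cache reproducibility checks:
   re-run the solver and compare its answer with the cached plan. *)
let cache_entry_is_stable ~solve (config, cached_plan) =
  List.sort compare (solve config) = List.sort compare cached_plan
```

A check like cache_entry_is_stable is only meaningful if the solver is given the repository and pin commits recorded in the lockfile, which is the reason for storing them.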

Dune's lockfile is therefore going to look something like:

; Two opam-repository selections; lowest priority given first
(repositories
  (default git+https://github.com/ocaml/opam-repository.git#master a9fb5a379794b0d5d7f663ff3a3bed5d4672a5d3)
  (coq-released git+https://github.com/coq/opam-coq-archive.git/released#master b4a388d3b07a30cd0ac7b5c98ea8a82c7fc89eea))

; Pinned packages, with (potentially shared) sources
(pins
  ((eio.0.8 eio_linux.0.8) git+https://github.com/ocaml-multicore/eio.git#main 80f5352526b260e9c32d7110844c7e7408e905f4))

; Cached configurations. In this case the cache entries specify the values of
; "free" solver variables (os, in this instance, comes from the machine running
; Dune) and configuration variables (ocaml, in this case comes from
; dune-workspace). Note when generating the cache entries that it would be
; necessary to capture all _relevant_ solver variables. For example, if the
; packages in the universe use `os-distribution` in any dependency formula,
; then its value would be captured as part of the configuration.
(configurations
  (linux (os "linux") (ocaml 5.0))
  (windows (os "win32") (ocaml 5.0))
  (macos (os "macos") (ocaml 5.0)))

(cache
  ; ocaml-base-compiler.5.0.0 is required for _all_ configurations
  ocaml-base-compiler.5.0.0
  ; ...
  ; eio_linux is only required for the linux configuration
  eio_linux (linux))

It is beyond the scope of this prototype to generate these lockfiles (in particular, the caches may be generated ad hoc using other mechanisms for now). Part of the prototyping work will need to look at whether the location of repositories and pins should be in dune-project with the SHAs only in the lockfile, etc.
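As an illustration of how the sketched (cache ...) stanza might be read, here is a small model of its entries. The cache_entry type and packages_for function are hypothetical; the only thing taken from the example above is the convention that an entry with no configuration list applies to every configuration.

```ocaml
(* One entry of the sketched (cache ...) stanza: a package plus the
   named configurations that need it. [] stands for "all of them". *)
type cache_entry = { package : string; only_in : string list }

(* Select the packages required for the named configuration. *)
let packages_for (config : string) (entries : cache_entry list) : string list =
  entries
  |> List.filter (fun e -> e.only_in = [] || List.mem config e.only_in)
  |> List.map (fun e -> e.package)

(* The two entries spelled out in the example stanza. *)
let example =
  [ { package = "ocaml-base-compiler.5.0.0"; only_in = [] };
    { package = "eio_linux"; only_in = [ "linux" ] } ]
```

Here packages_for "linux" example keeps both entries, while packages_for "macos" example drops eio_linux.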

Switch creation for tools and the build

Equipped with this enhanced configuration, the prototype should then be in a position to define the opam switch required, rather than detect it. For the purposes of the prototype, opam's command line client should be used to initialise an opam root and local switch inside _build. The build graph will need to ensure that:

The repository list of the opam root matches the lockfile
The internal local switch for the build has the correct pins
The internal local switch's invariant matches the selected configuration
The packages installed in the local switch exactly match those required by the constraints in the lockfile
opam switch create / opam install x y z is run, if needed, to bring the switch into a usable state (it would be fine, for the purposes of the prototype, if the switch is always recreated if the dependencies are changed)

For the purposes of this prototype, no caching or sharing need be attempted. In particular, nothing need be done at this stage to separate dependencies of tools from the project (e.g. ocamlformat, mdx, etc. build as part of the package's dependency cone).
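A rough sketch of the opam invocations the build graph might issue, expressed as argument lists and never executed here. The lock record, the opam-root directory name and the opam_commands function are all hypothetical, and the exact flag set (--bare, --no-setup, --set-default, --empty, --no-action) is an assumption that should be checked against the opam version in use. Repository and pin URLs are assumed to already carry their commit as a #fragment.

```ocaml
(* Hypothetical view of the lockfile data needed to set up the switch. *)
type lock = {
  repositories : (string * string) list; (* name, url; lowest priority first *)
  pins : (string * string) list;         (* package.version, url *)
  packages : string list;                (* package.version entries to install *)
}

(* Build the argv lists that would create a private root and a local
   switch under [build_dir]. With an empty repository list, opam init
   would fall back to its default repository, and an empty package list
   would yield a pointless install command; neither case is handled. *)
let opam_commands ~build_dir (lock : lock) : string list list =
  let root = "--root=" ^ Filename.concat build_dir "opam-root" in
  let switch = "--switch=" ^ build_dir in
  let opam args = ("opam" :: args) @ [ root; "--yes" ] in
  let init, extra_repos =
    match lock.repositories with
    | [] -> (opam [ "init"; "--bare"; "--no-setup" ], [])
    | (name, url) :: rest ->
      (opam [ "init"; "--bare"; "--no-setup"; name; url ], rest)
  in
  (* Repositories added later take priority over those added earlier,
     matching the "lowest priority first" order of the list. *)
  [ init ]
  @ List.map
      (fun (name, url) -> opam [ "repository"; "add"; "--set-default"; name; url ])
      extra_repos
  @ [ opam [ "switch"; "create"; build_dir; "--empty" ] ]
  @ List.map
      (fun (pkg, url) -> opam [ "pin"; "add"; "--no-action"; switch; pkg; url ])
      lock.pins
  @ [ opam ("install" :: switch :: lock.packages) ]
```

Keeping the commands as data makes it straightforward to compare the desired state with what is already in _build before deciding whether the switch has to be recreated.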

Outcomes

This prototype will be slow to use (since it always builds OCaml as part of the initial build) and will be exposed to the vagaries of wrapping the opam command line tool directly (network access, terminal control, etc.).

What should be demonstrable is taking a project (to be selected, preferably with differing dependencies between Linux/macOS/Windows) and cloning it to Linux/macOS/Windows having only a pair of opam and dune binaries (in particular, with opam not initialised) and demonstrating that dune build in the cloned repo successfully, and consistently, builds the project for that given platform.

A side-note on monorepos and lockfiles. I'd possibly distinguish here a genuine monorepo (such as Jane) vs an assembled monorepo (such as RWO). i.e. Jane is developed as a monorepo, from which some separate packages are carved out, but RWO is assembled from external dependencies, and must be updated (at some point) when they change.

If the monorepo is genuinely "mono", then one lockfile should work. However, if one imagines the "disassembled" view of RWO (i.e. RWO with its dependencies installed externally, instead of as a monorepo), even then it ideally would like to be building on Linux / macOS / Windows which implies more than one lockfile, even if it only targets exactly one version of OCaml. It does, of course, get a little fuzzy/grey with monorepos, because of course you can solve Linux / macOS / Windows by having one lockfile which is the union of all the dependencies, even if they can't necessarily all be co-compiled (and won't be, for example, if enabled_if is being used to control OS-specific libraries).

Thanks for taking the initiative on this effort. There's lots of good ideas here, but I'd like to offer an argument for the more traditional (and simplistic) definition of lock files. The definition is indeed simpler and less useful on its own, but I hope to show that its limitation can be lifted with complementary features rather than modifying what a lock file is.

The definition I'll work with is indeed just a set of package names, their corresponding build metadata, and sources needed to complete the build. Lock files also have a social contract: when a lock file is published, the build plan derived from it was observed to successfully build the project at some point. Therefore, if one reproduces the environment of the observer, one is guaranteed to successfully build the project.

The difference in our approaches can actually be summarized by the answer to one question: "Should the lock file store the input or the output of the solver?". If we pretend to have the function:

val solve :  repository list
             -> (free_variable * value) list
             -> project
             -> pin list
             -> (package_name * package_metadata * package_source) list

Your approach is to save the input of this function. I'd like to argue for the output instead.

The first benefit of using the output is that this assumption:

If we assume a stable constraint solver

is no longer necessary. We retain the right to change the implementation details of the solver without compromising the reproducibility of existing plans.

Another benefit is that we avoid spurious changes to the lock file. In your scheme, it's possible to update the commit hash of the opam repositories without actually changing the build plan. That creates unnecessary code review work going through these spurious changes, and also wastes cycles for downstream users pulling metadata they don't actually need. If we hash the output, code review is made easier because the diff shows exactly which packages have been updated. There's also no "junk". We only download what we know we're going to use.

Finally, let me address the question of portability. In your example, the lock file includes valid build plans for 3 different platforms. Presumably, you have the means to test that these build plans actually work on these platforms (otherwise these build plans are just optimistic solver output that has no business being saved anywhere). Therefore, you can use the same means to also generate a lock file for every different platform you have access to. We're still left with some practical problems, but they can be addressed separately:

Isn't there going to be a lot of redundancy between multiple lock files? Not necessarily. We can choose a format that shares as much information as possible.
How do we make sure that some constraints are shared between the lock files? Nothing stops us from guiding the solver by modifying the inputs.

I have some other unrelated comments left, but this response is getting long and I figure that it would be better to get the big things out of the way. Will write more later.

@dra27 wrote:

I'd possibly distinguish here a genuine monorepo (such as Jane) vs an assembled monorepo (such as RWO).

I don't think this is a useful distinction, since I went to quite a bit of effort in the early jbuilder/dune porting days to make platform-specific packages build (as dummy packages) on foreign platforms. I don't believe it's good practice to vary the source code for a monorepo based on the current platform, and build tools should do that instead. The reasons for this are myriad: from making cross-compilation easier/possible, to generating SBOMs for the final binaries.

I didn't think about it too much yet but at first glance I think I agree with @rgrinberg. I think storing the output of a solver is simpler, more stable and faster. For the different platforms we could imagine the following "complete" (based on the information in opam-repository) lock file format (e.g. for eio-ssl.0.1.0, I can't find a better example off the top of my head):

(dependencies
 base-bigarray.base
 base-domains.base
 base-nnp.base
 base-threads.base
 base-unix.base
 bigarray-compat.1.1.0
 bigstringaf.0.9.0
 conf-libssl.4
 conf-pkg-config.2
 csexp.1.5.1
 cstruct.6.1.1
 ctypes.0.20.1
 dune-configurator.3.7.0
 eio.0.8.1
 (eio_linux.0.8.1 (= :os "linux"))
 eio_luv.0.8.1
 eio_main.0.8.1
 fmt.0.9.0
 hmap.0.8.1
 integers.0.7.0
 logs.0.7.0
 luv.0.5.11
 luv_unix.0.5.0
 lwt-dllist.1.0.1
 mtime.2.0.0
 ocaml.5.0.0
 ocaml-config.3
 (ocaml-option-bytecode-only.1 (and (!= :arch "arm64") (!= :arch "x86_64")))
 ocaml-base-compiler.5.0.0
 ocamlbuild.0.14.2
 ocamlfind.1.9.5
 optint.0.3.0
 psq.0.2.1
 result.1.5
 seq.base
 ssl.0.5.13
 stdlib-shims.0.3.0
 topkg.1.0.7
 uring.0.5)

It complicates the solver, especially in the rare case of completely different sets of dependencies based on the platforms, but I think this is doable (side note: I'm not a solver expert so I can't tell if this is a known potentially quadratic problem for such a solver). The good thing about it is that you only have to do it once, it's succinct and it should work everywhere.
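To make the conditional entries concrete, here is a minimal sketch of how a consumer could evaluate them against the current platform. The filter type, holds and select are hypothetical names, only the three operators used in the example are modelled, and the treatment of unbound variables (an equality is false, an inequality is true) is an assumption of this sketch rather than opam's documented behaviour.

```ocaml
(* The subset of conditions appearing in the example above. *)
type filter =
  | Eq of string * string  (* (= :var "value") *)
  | Neq of string * string (* (!= :var "value") *)
  | And of filter * filter

(* A locked dependency with an optional platform condition. *)
type dep = { name : string; condition : filter option }

(* Evaluate a condition in an environment of variable bindings. *)
let rec holds env = function
  | Eq (v, x) -> List.assoc_opt v env = Some x
  | Neq (v, x) -> List.assoc_opt v env <> Some x
  | And (a, b) -> holds env a && holds env b

(* Keep the dependencies whose condition is absent or satisfied. *)
let select env deps =
  List.filter_map
    (fun d ->
      match d.condition with
      | None -> Some d.name
      | Some f -> if holds env f then Some d.name else None)
    deps
```

On a Linux x86_64 machine this keeps eio_linux.0.8.1 and drops ocaml-option-bytecode-only.1; on any architecture other than arm64 and x86_64 the bytecode-only option is selected.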

Sorry for the very lagged reply (and there are likely to be more lags, I'm afraid). I think we're largely arguing in agreement, though - I've used the word "cached", but the intent of the proposal was that we don't store just the inputs, we're storing both the inputs and the outputs for some configurations. The sharing @rgrinberg describes and @kit-ty-kate's example above look to me like simply alternative ways of writing the cache stanza proposed above.

We're possibly interpreting differently what the lockfile being reproducible means - I think you're referring to the build itself being reproducible (i.e. if I set-up the build with the packages listed from the outputs, then it should work). I was referring to the production of the lockfile itself - i.e. the lockfile contains enough information that I should be able to call the solver and produce the same lockfile (which is why I called the "output" a "cache").

The main thing for me (which we all seem in agreement on so far) is that the lockfile is capable of describing multiple builds, which means it's moved beyond what I regard as "simplistic"!

otherwise these build plans are just optimistic solver output that has no business being saved anywhere

I completely agree - there shouldn't be a cache/output list of packages which hasn't been verified in some way. The benefit, as I see it, to the very small extra storage of the inputs is that if two independent users try to use the project on a platform for which there is no cached/stored/whatever lockfile, then there is a chance that they gain the same solution. i.e. they fall back to the same solution. Hypothetically, that means a project published with a Linux-only lockfile but which has different dependencies for macOS/Windows has its users either still installing the same dependencies because they use the same solver and the same repository hashes or installing in the same way. Otherwise, those two users may solve with different repository hashes (based on the date they did the build) and therefore have different outcomes, which is bad.

Another benefit is that we avoid spurious changes to the lock file.

There aren't spurious changes to the lockfile in the proposal above. You'd only change the hash for the repository if the build plan actually alters. Pulling the latest versions of the opam repositories (the background equivalent of opam update) would not cause the hash in the lockfile to change.

In the example that Kate brought up the conditional entry is suspect to me:

 (ocaml-option-bytecode-only.1 (and (!= :arch "arm64") (!= :arch "x86_64")))

How can the author prescribe the compiler to use for all other architectures? Has the author of this lock file really tested every other architecture? If this is a property of some package in the project, then it should be specified as a constraint and not in a concrete build plan.

To me, multi platform support consists of two steps:

1. Generate and verify a build plan for every configuration planned to be supported
2. Find a minimal representation of the build plans discovered in 1.

This process cannot produce the example above.
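The second step could look something like the sketch below, assuming each verified build plan is just a list of package.version strings keyed by a configuration name. The merge function is hypothetical, and this naive version makes no claim to find a truly minimal representation. It does illustrate the point made above: a package is only ever annotated with configurations that were actually tested.

```ocaml
(* [merge plans] takes verified build plans keyed by configuration name
   and returns each package once, paired with the configurations that
   need it. [] means "every tested configuration". *)
let merge (plans : (string * string list) list) : (string * string list) list =
  let configs = List.map fst plans in
  let packages = List.sort_uniq compare (List.concat_map snd plans) in
  List.map
    (fun pkg ->
      (* Configurations whose plan contains [pkg], in plan order. *)
      let users =
        List.filter_map
          (fun (cfg, pkgs) -> if List.mem pkg pkgs then Some cfg else None)
          plans
      in
      (pkg, if users = configs then [] else users))
    packages
```

Because the conditions are built from the names of tested configurations, a negative condition over untested architectures can never appear in the output.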

Okay, I now understand David's point. I don't see a problem with including the hashes (and urls) of the sources used to generate the lock file. This isn't data that affects the build plan in any way however. But it does seem very useful to be able to work with the original sources the author of the lock file worked with.
