ocaml / opam

opam is a source-based package manager. It supports multiple simultaneous compiler installations, flexible package constraints, and a Git-friendly development workflow.
https://opam.ocaml.org

Reproducible builds #2720

Open AltGr opened 7 years ago

AltGr commented 7 years ago

A feature that has recently come to our attention, and that seems highly important in industry as soon as you want production builds, is reproducible builds.

Note that we are talking about a reproducible opam package universe, not bit-for-bit reproducible binaries, for which the OCaml compiler may still be missing a piece or two.

This shouldn't be too difficult with our current infrastructure.

Current state

We currently have .export files (opam switch <import|export> <file>) but they appear a bit limited for the task:

The current dev version (2.0~alpha4+) improves on this in several areas.

opam already computes, internally, hashes that precisely describe a given version of a given package, in order to determine when a package needs to be rebuilt.
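To illustrate the idea, here is a minimal OCaml sketch of such a fingerprint, extended to a whole switch. It is not opam's actual hashing scheme: the `package` record and the `fingerprint` and `universe_fingerprint` functions are hypothetical, a package is reduced to a name, a version and a list of dependency names, the dependency graph is assumed to be acyclic, and MD5 (the stdlib `Digest` module) is used only because it is readily available. Because each package's hash folds in the hashes of its dependencies, changing any package changes the fingerprint of everything that depends on it, and under this simplified model two switches have the same fingerprint exactly when they contain the same universe.

```ocaml
(* Hypothetical model of an installed package; opam's real data is much
   richer (full opam file contents, source URLs, checksums, ...). *)
type package = {
  name : string;
  version : string;
  depends : string list; (* names of direct dependencies *)
}

(* Hash of one package: its own name.version plus the hashes of its
   dependencies, so any change below a package changes its hash.
   Assumes the dependency graph is acyclic. *)
let rec fingerprint (universe : package list) (p : package) : string =
  let dep_hashes =
    p.depends
    |> List.sort compare
    |> List.map (fun d ->
           match List.find_opt (fun q -> q.name = d) universe with
           | Some q -> fingerprint universe q
           | None -> failwith ("missing dependency: " ^ d))
  in
  Digest.string (String.concat "\n" ((p.name ^ "." ^ p.version) :: dep_hashes))
  |> Digest.to_hex

(* Hash of a whole switch; sorting by name makes it independent of the
   order in which packages are listed. *)
let universe_fingerprint (universe : package list) : string =
  universe
  |> List.sort (fun a b -> compare a.name b.name)
  |> List.map (fingerprint universe)
  |> String.concat "\n"
  |> Digest.string
  |> Digest.to_hex
```

Comparing the fingerprint stored alongside an exported state with the one recomputed after import would be one cheap way to check the "exact same universe" guarantee discussed below.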

Repositories already provide a means of sharing a set of packages with their metadata, and optionally their archives. We could imagine zipping that up and providing an easy way to import it directly as a new switch.

We can validate a given universe for consistency efficiently and without requiring a call to the solver (thanks to the dose lib), so it would be easy to just check an export file, apply it directly if correct, and warn and ask for directions (apply anyway / call solver / abort) otherwise.
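A rough sketch of why this is cheap: checking a set of packages that has already been chosen only needs lookups, not a search over alternatives. The OCaml below is a simplified stand-in and not the dose API; the `package` and `constr` types and the `check` function are hypothetical, and real opam formulas (version ranges, disjunctions, optional dependencies) are much richer. `check` returns the list of problems found, so an empty list means the export file could be applied directly, and a non-empty one is where the tool would warn and ask whether to apply anyway, call the solver or abort.

```ocaml
(* Hypothetical, simplified constraints: either any version, or one
   exact version of the dependency. *)
type constr = Any | Exactly of string

type package = {
  name : string;
  version : string;
  depends : (string * constr) list;
  conflicts : string list;
}

(* Validate an already-chosen universe without any solving: every
   dependency must be present and satisfy its constraint, no conflicting
   package may be installed, and each name may appear only once. *)
let check (universe : package list) : string list =
  let find n = List.find_opt (fun p -> p.name = n) universe in
  let count n = List.length (List.filter (fun p -> p.name = n) universe) in
  let problems_of p =
    let missing =
      List.filter_map
        (fun (dep, c) ->
          match find dep, c with
          | None, _ -> Some (Printf.sprintf "%s: missing dependency %s" p.name dep)
          | Some q, Exactly v when q.version <> v ->
              Some (Printf.sprintf "%s: needs %s.%s, found %s.%s"
                      p.name dep v dep q.version)
          | Some _, _ -> None)
        p.depends
    in
    let conflicting =
      List.filter_map
        (fun c ->
          if Option.is_some (find c)
          then Some (Printf.sprintf "%s: conflicts with %s" p.name c)
          else None)
        p.conflicts
    in
    let dup =
      if count p.name > 1 then [p.name ^ ": installed more than once"] else []
    in
    dup @ missing @ conflicting
  in
  (* Deduplicate messages (e.g. a duplicated package reports itself twice). *)
  universe |> List.concat_map problems_of |> List.sort_uniq compare
```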

Although it is a different matter, sharing a common configuration and set of repositories is related and also useful for developer teams. opamrc files in opam 2.0 allow this, but for now they can only be used at opam init.

Interface

As for the rest, we mainly need to define the best interface for this. It seems safe to assume two operations: one that stores the state of an existing switch (optionally including archives), and one that imports that state as a new switch, with the guarantee that the resulting package universe is exactly the same.

The open questions are:

  1. What format should be used for the exchange? It should be something that can be versioned.
  2. Can we handle sharing updates to the saved state, or do we recreate the switch from scratch?
  3. How can this work across systems with different constraints (a state that is valid on one platform may be inconsistent on another)?
  4. How do we handle depexts (external, system-level dependencies) when importing a state?
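For question 1, here is a minimal OCaml sketch of what a versioned exchange format could look like. The format is hypothetical and is not opam's export syntax: a `format-version` header followed by one `name version` line per package. `export`, `import` and `current_version` are illustrative names only. Packages are sorted on export, so the same universe always serializes to the same text, and `import` refuses header versions it does not know instead of misreading them.

```ocaml
(* Hypothetical exchange format, NOT opam's actual export syntax:
     format-version: 1
     <name> <version>
     ... *)
let current_version = 1

(* Serialize a switch state; sorting makes the output deterministic. *)
let export (pkgs : (string * string) list) : string =
  let lines = pkgs |> List.sort compare |> List.map (fun (n, v) -> n ^ " " ^ v) in
  String.concat "\n" (Printf.sprintf "format-version: %d" current_version :: lines)

(* Parse an exported state, rejecting unknown format versions. *)
let import (s : string) : ((string * string) list, string) result =
  let prefix = "format-version: " in
  let plen = String.length prefix in
  match String.split_on_char '\n' s with
  | [] -> Error "empty file"
  | header :: rest ->
      let version =
        if String.length header > plen && String.sub header 0 plen = prefix
        then int_of_string_opt (String.sub header plen (String.length header - plen))
        else None
      in
      (match version with
       | None -> Error "missing format-version header"
       | Some v when v <> current_version ->
           Error (Printf.sprintf "unsupported format-version %d" v)
       | Some _ ->
           List.fold_left
             (fun acc line ->
               match acc with
               | Error _ -> acc
               | Ok pkgs ->
                   if line = "" then acc
                   else
                     (match String.split_on_char ' ' line with
                      | [n; v] -> Ok ((n, v) :: pkgs)
                      | _ -> Error ("malformed line: " ^ line)))
             (Ok []) rest
           |> Result.map List.rev)
```

A round trip `import (export u)` gives back `u` sorted by name, which is the "exact same universe" guarantee in its simplest form; questions 2 to 4 would need more than this.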

Prototype workflow

To start with, a command that dumps, from a given switch, an opam repository that we can later add with opam repo add would let us prototype further. How about something like this:

```sh
# on the original dev machine, from package source dir
$ opam dump-repo [--include-sources]
    # Creates ./opam-repo, containing the metadata of all packages installed
    # in the current switch, plus maybe an export file (stating which packages
    # are base, root or just installed?)
$ git add opam-repo && git commit  # etc.

# on mirror dev machines, after git clone
$ opam repo add my-project-repo ./opam-repo --dont-select
$ opam switch import opam-repo/opam.export --switch ./ --repositories ./opam-repo[,default]
```

Once we settle on the shortcuts for setting up a local switch from a given package source, the second part of this could be automated as part of that.

hannesm commented 7 years ago

Just to add my 2 cents: reproducible binary builds for OCaml 4.03 are here. @gasche merged several patches from Debian, and Xavier fixed http://caml.inria.fr/mantis/view.php?id=7037, so binaries are now reproducible (modulo the installation path, since OCaml embeds its binary location). If you encounter issues (i.e. the OCaml compiler still not producing reproducible builds), I hope we'll be able to fix them upstream.

AltGr commented 7 years ago

@hannesm that's awesome, good to know! It would be very nice to also be rid of the installation paths, since that would allow caching (and moving switches around), but this is already a big step forward.

Drup commented 7 years ago

> It would be very nice to also be rid of the installation paths, since that would allow caching (and moving switches around)

I agree ... especially with local switches. It would make short-lived switches usable, and I know many people who would like that.

h01ger commented 5 years ago

dummy comment so github notifies me on progress on this issue without watching the whole repo. sorry for the noise!

mehdid commented 3 years ago

Good progress has been made here since 2016: opam 2.0.3 built reproducibly. Unfortunately, some changes introduced in later versions broke this property, and I have tried to make it reproducible again. It turns out that setting the HOME variable to a fixed value during the build and applying this patch [1] makes most of the issues go away (the remaining ones appear only in manpages, which contain paths).

[1] https://sources.debian.org/src/opam/2.0.8-1/debian/patches/0004-Use-HOME-env-variable-instead-of.patch/

It would be great if this patch (or a variation of it) could be integrated into opam's repo 🙏🏼

There is still one issue, though, which can be observed here: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/opam.html and which affects opam-admin-cache.1.

I didn't want to heavily change the code or the behavior. What would be your recommendation to fix this last issue?

avsm commented 3 years ago

It looks like opam admin cache embeds the cwd in the man page:

       DIR (absent=~/src/git/ocurrent/ocaml-ci/cache)

...we should just change this in the manual page generation to be "absent=" as specifying the exact directory is inaccurate in the event the manual page is saved.

dra27 commented 3 years ago

Thanks @mehdid - I've put your comment in a separate issue as this one's about opam facilitating reproducible builds, rather than itself being built reproducibly.

mehdid commented 3 years ago

Ah, sorry for the confusion and thanks for opening the new issue :-)