Closed hannesm closed 2 years ago
Hi @hannesm, thanks for the PR. If I understand it correctly, pull_tree
will get the tarballs from a different URL in that case using some kind of mapping scheme from the URL to the tarball (I have a hard time finding documentation on archive-mirrors
). Can you add an entry to the changelog to explain the change from a user POV? That will also make the CI happy.
I wonder: can you use the archive-mirrors
setting to redirect the download to a local folder? That could be extremely useful for the tests, because then we could actually test pull
without having to hit the internet, by setting up a mirror and using that to pull the tarballs.
If I understand it correctly, pull_tree will get the tarballs from a different URL in that case using some kind of mapping scheme from the URL to the tarball
Yes and no, so it'll use the cache offered by opam instead of "only the main source of the tarball". Since the tarballs should have checksums assigned, it shouldn't matter where you get the data from. :) (but in CI systems and in remote locations, using a local mirror // something that doesn't overload github's restrictions is a fine way to save bandwidth)
(I have a hard time finding documentation on
archive-mirrors
).
So, archive-mirrors
can be defined in your opam config (apart from some issues in the 2.1 series, see https://github.com/ocaml/opam/issues/4411 https://github.com/ocaml/opam/pull/4830), and/or in the repo
file of the repository (see for documentation
https://opam.ocaml.org/doc/Manual.html#repofield-archive-mirrors https://opam.ocaml.org/doc/Manual.html#configfield-archive-mirrors). Opam first tries to download from the archive-mirror (e.g. opam.ocaml.org sets it to cache
, so the default cache for the default repository is https://opam.ocaml.org/cache) by using the URL <hash-algo>/<first 2 chars of the hex-encoded hash>/<full hex-encoded hash>
.
I wonder: can you use the
archive-mirrors
setting to redirect the download to a local folder?
I don't know, try it out.
Well, it sounds like a reasonable change (I first was worried that it would rewrite the URLs in the lockfile but as the cache URL switcheroo happens on pull
and not lock it still yields reproduceable lockfiles), so if you add a changelog entry I think it is ok to merge.
I'll evaluate whether local URLs can be used.
I'll evaluate whether local URLs can be used.
I tested with a local (file system) OPAM repo with
opam-version: "2.0"
archive-mirrors: "cache/"
coupled with a tarball in the location you described and it worked right out of the box. Perfect, this will allow us to test pull
in a sandbox environment by just preparing a cache with tarballs.
Great to hear that @Leonidas-from-XIV. I pushed a changes entry.
Thanks for the contribution!
Sorry to bother you, but it looks like this now leads to opam-monorepo
that does not compile anymore, the reason is a bad rebase / conflict between #317 being merged (especially 78000f43267cf035601ef82750042491c4cf8782 "Use an explicit cache to avoid using the global cache").
With the current HEAD and the commit before my change, I understand the codebase even less. Would you mind to clarify why "use an explicit cache" is the desired behaviour? Who's going to support that argument (with a Fpath.t)? Why does opam.mli
now contain pull_tree_with_cache
and pull_tree
, which of the functions is actually used and why? I guess I just do not understand why not to use the opam cache...
This commit is to optimize the retrieval of repositories. Originally the way to use custom repos was to opam repo add
(and let opam deal with caching it) but people tended to forget it (especially as opam-overlays
is often needed) and in some cases people just added it for the run of opam-monorepo
, then immediately removing it (I believe that this was the case for mirage at some point).
So x-opam-monorepo-opam-repositories
was added which would ignore the repos in the OPAM state (thus improving reproducibility, since the documentation which repo the package info from came from was in the lockfiles) and explicitly specify them from that option. But since we don't want to re-download potentially big git repos like opam-repository
every time you run opam-monorepo lock
this commit introduced a local cache for these repos that was not part of OPAM.
I have created #345 to fix the main branch for now.
Thanks. I still don't quite understand which code path will be taken at what point and why. My concern is mainly that local caches are ignored, although they're present in ~/.opam/config
. Given that the workflow of mirage indeed is to opam repo add
when opam-monorepo is being called (and opam repo remove
afterwards), I guess that will be ok.
Good news, I tested your PR with mirage 4.2.1 and it uses the archive-mirrors cache that I have in my ~/.opam/config
.
The local cache will be ignored when locking when you specify x-opam-monorepo-opam-repositories
when locking, as it will clone the git repos into its own cache directory.
Generally there are two modes: "use the OPAM state" and "configure stuff in opam-monorepo", whereas we try to push people to the latter, since that will yield more reproducible lock files, since the OPAM state can't be easily controlled whereas you can specify a revision of the git repos in x-opam-monorepo-opam-repositories
.
fixes #336 (it now respects the
archive-mirrors
in~/.opam/config
-- thanks to @kit-ty-kate)