use cache urls from opam config

hannesm commented 2 years ago

fixes #336 (it now respects the archive-mirrors in ~/.opam/config -- thanks to @kit-ty-kate)

Leonidas-from-XIV commented 2 years ago

Hi @hannesm, thanks for the PR. If I understand it correctly, pull_tree will get the tarballs from a different URL in that case using some kind of mapping scheme from the URL to the tarball (I have a hard time finding documentation on archive-mirrors). Can you add an entry to the changelog to explain the change from a user POV? That will also make the CI happy.

I wonder: can you use the archive-mirrors setting to redirect the download to a local folder? That could be extremely useful for the tests, because then we could actually test pull without having to hit the internet, by setting up a mirror and using that to pull the tarballs.

hannesm commented 2 years ago

If I understand it correctly, pull_tree will get the tarballs from a different URL in that case using some kind of mapping scheme from the URL to the tarball

Yes and no, so it'll use the cache offered by opam instead of "only the main source of the tarball". Since the tarballs should have checksums assigned, it shouldn't matter where you get the data from. :) (but in CI systems and in remote locations, using a local mirror // something that doesn't overload github's restrictions is a fine way to save bandwidth)

(I have a hard time finding documentation on archive-mirrors).

So, archive-mirrors can be defined in your opam config (apart from some issues in the 2.1 series, see https://github.com/ocaml/opam/issues/4411 https://github.com/ocaml/opam/pull/4830), and/or in the repo file of the repository (see for documentation https://opam.ocaml.org/doc/Manual.html#repofield-archive-mirrors https://opam.ocaml.org/doc/Manual.html#configfield-archive-mirrors). Opam first tries to download from the archive-mirror (e.g. opam.ocaml.org sets it to cache, so the default cache for the default repository is https://opam.ocaml.org/cache) by using the URL <hash-algo>/<first 2 chars of the hex-encoded hash>/<full hex-encoded hash>.

I wonder: can you use the archive-mirrors setting to redirect the download to a local folder?

I don't know, try it out.

Leonidas-from-XIV commented 2 years ago

Well, it sounds like a reasonable change (I first was worried that it would rewrite the URLs in the lockfile but as the cache URL switcheroo happens on pull and not lock it still yields reproduceable lockfiles), so if you add a changelog entry I think it is ok to merge.

I'll evaluate whether local URLs can be used.

Leonidas-from-XIV commented 2 years ago

I'll evaluate whether local URLs can be used.

I tested with a local (file system) OPAM repo with

opam-version: "2.0"
archive-mirrors: "cache/"

coupled with a tarball in the location you described and it worked right out of the box. Perfect, this will allow us to test pull in a sandbox environment by just preparing a cache with tarballs.

hannesm commented 2 years ago

Great to hear that @Leonidas-from-XIV. I pushed a changes entry.

Leonidas-from-XIV commented 2 years ago

Thanks for the contribution!

hannesm commented 2 years ago

Sorry to bother you, but it looks like this now leads to opam-monorepo that does not compile anymore, the reason is a bad rebase / conflict between #317 being merged (especially 78000f43267cf035601ef82750042491c4cf8782 "Use an explicit cache to avoid using the global cache").

With the current HEAD and the commit before my change, I understand the codebase even less. Would you mind to clarify why "use an explicit cache" is the desired behaviour? Who's going to support that argument (with a Fpath.t)? Why does opam.mli now contain pull_tree_with_cache and pull_tree, which of the functions is actually used and why? I guess I just do not understand why not to use the opam cache...

Leonidas-from-XIV commented 2 years ago

This commit is to optimize the retrieval of repositories. Originally the way to use custom repos was to opam repo add (and let opam deal with caching it) but people tended to forget it (especially as opam-overlays is often needed) and in some cases people just added it for the run of opam-monorepo, then immediately removing it (I believe that this was the case for mirage at some point).

So x-opam-monorepo-opam-repositories was added which would ignore the repos in the OPAM state (thus improving reproducibility, since the documentation which repo the package info from came from was in the lockfiles) and explicitly specify them from that option. But since we don't want to re-download potentially big git repos like opam-repository every time you run opam-monorepo lock this commit introduced a local cache for these repos that was not part of OPAM.

Leonidas-from-XIV commented 2 years ago

I have created #345 to fix the main branch for now.

hannesm commented 2 years ago

Thanks. I still don't quite understand which code path will be taken at what point and why. My concern is mainly that local caches are ignored, although they're present in ~/.opam/config. Given that the workflow of mirage indeed is to opam repo add when opam-monorepo is being called (and opam repo remove afterwards), I guess that will be ok.

Good news, I tested your PR with mirage 4.2.1 and it uses the archive-mirrors cache that I have in my ~/.opam/config.

Leonidas-from-XIV commented 2 years ago

The local cache will be ignored when locking when you specify x-opam-monorepo-opam-repositories when locking, as it will clone the git repos into its own cache directory.

Generally there are two modes: "use the OPAM state" and "configure stuff in opam-monorepo", whereas we try to push people to the latter, since that will yield more reproducible lock files, since the OPAM state can't be easily controlled whereas you can specify a revision of the git repos in x-opam-monorepo-opam-repositories.

tarides / opam-monorepo

use cache urls from opam config #337