ocaml / infrastructure

Wiki to hold the information about the machine resources available to OCaml.org

Data divergence for opam.ocaml.org index.tar.gz #36

Closed hannesm closed 1 year ago

hannesm commented 1 year ago

Dear Madam or Sir,

I'm happy that after years of a single service running https://opam.ocaml.org, it is now deployed to multiple services (using DNS for "load balancing").

Unfortunately, I have to report that the served index.tar.gz files are diverging -- the sha256 c9e5011660bc1b77cc314d97dbca1eed39aa8252a2aaea2cb46931db5905dace is what I get from 51.158.232.133; the sha256 bd8a07db6c14abfb049caef6b120a92da30ff7c6ad031b53a5198c91b8f9d77d is what I receive from 151.115.76.159.

Would it be feasible to get a synchronized deployment (and monitoring thereof)? The index.tar.gz from 151.115.76.159 looks very out of date :( According to the contained repo file, it is 4ac28b10 -- which refers to an opam-repository commit from 3 days ago.
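For the monitoring part, a minimal sketch of what such a check could look like, in Python. It assumes the index.tar.gz bytes have already been fetched from each backend (for example with `curl --resolve`), that the repo file sits at the top level of the tarball, and that it contains a line of the form `stamp: "..."`. The function names `group_by_digest` and `repo_stamp` are made up for this illustration and are not part of any existing tool.

```python
import hashlib
import io
import re
import tarfile
from collections import defaultdict


def group_by_digest(payloads):
    """Map each sha256 hex digest to the sorted list of backends serving it.

    `payloads` maps a backend label (e.g. an IP address) to the raw bytes
    of the index.tar.gz it returned. More than one key in the result
    means the backends have diverged.
    """
    groups = defaultdict(list)
    for backend, data in payloads.items():
        groups[hashlib.sha256(data).hexdigest()].append(backend)
    return {digest: sorted(backends) for digest, backends in groups.items()}


def repo_stamp(index_tar_gz):
    """Return the value of the 'stamp' field of the repo file, or None.

    Assumption: the member is named "repo" or "./repo" and the field is
    written as stamp: "<value>" on a line of its own.
    """
    with tarfile.open(fileobj=io.BytesIO(index_tar_gz), mode="r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name in ("repo", "./repo"):
                text = tar.extractfile(member).read().decode("utf-8")
                match = re.search(r'^stamp:\s*"([^"]*)"', text, re.MULTILINE)
                return match.group(1) if match else None
    return None
```

A monitor could alert whenever `group_by_digest` returns more than one group, and report the `repo_stamp` of each group to show which backend is lagging behind.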

/cc @mtelvers who I believe is taking care of this as well (please correct me if I'm wrong) /cc @avsm @dra27

hannesm commented 1 year ago

From the discuss post https://discuss.ocaml.org/t/migration-opam-ocaml-org-moving-providers-this-week/11606 I followed the link to https://deploy.ci.ocaml.org/?repo=ocaml-opam/opam2web& and there's a red box "pull":

(from https://deploy.ci.ocaml.org/job/2023-03-24/002845-docker-pull-82f522)

2023-03-24 00:28.45: New job: "docker" "--context" "opam-5.ocaml.org" 
                     "pull"
                     "ocurrent/opam.ocaml.org:live@sha256:9db3f5e5427a77fd3a3e4d4f8fb7d4d1172ed0f36db1084428bb0e174699c444"
2023-03-24 00:28.45: Exec: "docker" "--context" "opam-5.ocaml.org" "pull" 
                           "ocurrent/opam.ocaml.org:live@sha256:9db3f5e5427a77fd3a3e4d4f8fb7d4d1172ed0f36db1084428bb0e174699c444"
error during connect: Post "http://docker.example.com/v1.24/images/create?fromImage=ocurrent%2Fopam.ocaml.org&tag=sha256%3A9db3f5e5427a77fd3a3e4d4f8fb7d4d1172ed0f36db1084428bb0e174699c444": command [ssh -l root -- opam-5.ocaml.org docker system dial-stdio] has exited with exit status 255, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=root@opam-5.ocaml.org: Permission denied (publickey).

2023-03-24 00:28.47: Job failed: Command "docker" "--context" "opam-5.ocaml.org" "pull" "ocurrent/opam.ocaml.org:live@sha256:9db3f5e5427a77fd3a3e4d4f8fb7d4d1172ed0f36db1084428bb0e174699c444" exited with status 1

Looks like a permission issue.

mtelvers commented 1 year ago

@hannesm opam-5.ocaml.org is updating now. I am investigating why this happened.

hannesm commented 1 year ago

@mtelvers thanks a lot for quickly fixing this issue.

avsm commented 1 year ago

Thanks @hannesm! The divergence is very bad, indeed. Does the correct generated hash of the index.tar.gz match your own Mirage-based opam archive server, or is that expected to be different?

mtelvers commented 1 year ago

The root cause was that Scaleway recreated .ssh/authorized_keys when the machine was booted. Persistent keys must be added via the Scaleway dashboard or added to .ssh/instance_keys. I have updated the deployment script to reflect this. Furthermore, I have changed the routing of failure messages on Slack to improve the visibility of build/deployment failures.

hannesm commented 1 year ago

@avsm thanks for raising the question, the short answer is no - the hashes do not match.

The longer story starts with a question: is a reproducible tarball a win? I think it is debatable, but it won't hurt if it's not much effort.

To start with, the contents differ. Opam uses a list of items to include, namely version, packages and repo, whereas opam-mirror just includes everything (i.e. .gitattributes, README, ...). This is easy to adjust. The contents of the "stamp" field in the repo file could be unified to include the full git hash (currently, it's only the first 8 characters); I suspect that https://github.com/ocaml/opam/pull/5342 will change that behaviour.

The next step is "reproducible tarball generation" (thankfully, the Reproducible Builds project wrote an article about that: https://reproducible-builds.org/docs/archives/).

First of all, the question is which tar file format should be used. I suspect that ustar is sufficient (and less complex), with file names up to 256 characters (not sure whether there is a length limit on opam package names; if not, there should be).
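To make the ustar limit concrete: the ustar header has a 100-byte name field and a 155-byte prefix field, and a longer path only fits if it can be split at a "/" so that both halves fit. A small sketch of that check, where `fits_ustar` is a hypothetical helper name and paths are assumed to be UTF-8 encoded with "/" separators:

```python
USTAR_NAME_MAX = 100    # size of the "name" field in the ustar header
USTAR_PREFIX_MAX = 155  # size of the "prefix" field in the ustar header


def fits_ustar(path):
    """Return True if `path` can be stored in a plain ustar header."""
    raw = path.encode("utf-8")
    if len(raw) <= USTAR_NAME_MAX:
        return True
    # Otherwise try every "/" as the split point between prefix and name.
    for i, byte in enumerate(raw):
        if byte != 0x2F:  # 0x2F is "/"
            continue
        prefix_len = i
        name_len = len(raw) - i - 1
        if 0 < prefix_len <= USTAR_PREFIX_MAX and 0 < name_len <= USTAR_NAME_MAX:
            return True
    return False
```

Running this over every path under packages/ would show whether any existing package actually exceeds what ustar can represent.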

Then there are three things:

This could be integrated into opam. The second point, to filter files only and not add directory entries to the tarball, will make the tarball a bit more lightweight. On the https://opam.robur.coop side we will work on providing reproducible tarballs - and if the opam.ocaml.org team finds this important, we're happy to discuss steps towards that goal.
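As a rough illustration of the ideas above (fixed list of included items, ustar format, files only, no directory entries), here is a sketch in Python. It is not how opam or opam-mirror build the tarball today. The function name `reproducible_index` is hypothetical, and the normalisation choices (sorted paths, mtime 0, uid/gid 0, mode 0644, empty owner names, gzip header without timestamp or file name) are assumptions in the spirit of the reproducible-builds article.

```python
import gzip
import io
import os
import tarfile


def reproducible_index(root, include=("version", "packages", "repo")):
    """Build a deterministic .tar.gz from selected entries below `root`."""
    paths = []
    for entry in include:
        top = os.path.join(root, entry)
        if os.path.isfile(top):
            paths.append(entry)
        # os.walk yields nothing for plain files or missing paths.
        for dirpath, _dirs, files in os.walk(top):
            for name in files:
                full = os.path.join(dirpath, name)
                paths.append(os.path.relpath(full, root).replace(os.sep, "/"))

    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sorted order removes any dependence on directory listing order.
        for rel in sorted(paths):
            with open(os.path.join(root, rel), "rb") as fh:
                data = fh.read()
            info = tarfile.TarInfo(rel)  # regular file; no directory entries
            info.size = len(data)
            info.mode = 0o644
            info.mtime = 0
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))

    out = io.BytesIO()
    # mtime=0 and an empty filename keep the gzip header free of local state.
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, filename="") as gz:
        gz.write(raw.getvalue())
    return out.getvalue()
```

Two checkouts of the same commit should then produce identical bytes, at least when built with the same Python and zlib versions; the compressed stream itself can still vary between zlib implementations, so comparing the hash of the uncompressed tar may be the more robust check.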