Blaisorblade closed this issue 3 years ago.
So, what are you actually trying to get done? I see a series of low-level requests (this and #4255) but I don't have a sense of what you're trying to accomplish at a high level. Do you want some way to generate a Dockerfile for a sequence of installs and opam remotes that is caching friendly for BuildKit layers?
It would be helpful to see that description, and perhaps a few example Dockerfiles that you're currently using and that are problematic. https://github.com/ocurrent/ocaml-ci/tree/master/solver has a comprehensive domain-specific opam solver that we use to generate caching-friendly Dockerfiles for the new ocaml CI, so we may be able to pull something out of that for use by Coq as well.
At a high level, I am trying to minimize the image size, while it seems you are trying to minimize image build time; in fact, I'm increasing build times to reduce the image size.
The basic trick I learned is that each RUN command should end by deleting as much as possible; for instance, each RUN command that calls apt-get install should end in apt-get clean, to remove files that are no longer needed. In the same way, I want to remove all opam caches, that is, everything that is not needed to run the installed programs and that can be cheaply regenerated or redownloaded.
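A minimal sketch of that pattern for the apt case, written as a shell function so that the whole body runs inside a single RUN step. The names apt_install_clean and APT_LISTS_DIR are hypothetical, added only so the sketch can be exercised outside a container. Removing /var/lib/apt/lists is the usual companion to apt-get clean, because apt-get clean only empties the cache of downloaded package files.

```shell
# Sketch: install Debian packages and drop apt's caches in one shell
# invocation, i.e. inside a single Dockerfile RUN step.
# apt_install_clean and APT_LISTS_DIR are hypothetical names for this example.
APT_LISTS_DIR=${APT_LISTS_DIR:-/var/lib/apt/lists}

apt_install_clean() {
  # apt-get clean empties /var/cache/apt/archives (downloaded .deb files);
  # the package lists fetched by apt-get update live in APT_LISTS_DIR and
  # must be removed separately.
  # Chaining with && makes a failed install fail the whole step.
  apt-get update &&
    apt-get install -y --no-install-recommends "$@" &&
    apt-get clean &&
    rm -rf "$APT_LISTS_DIR"/*
}
```

In a Dockerfile, the equivalent is one RUN line chaining the same commands with &&. Splitting them across several RUN lines would commit the caches in the earlier layer, and deleting them in a later layer only hides them without shrinking the image.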
https://github.com/ocurrent/ocaml-ci/tree/master/solver has a comprehensive domain-specific opam solver that we use to generate caching-friendly Dockerfiles for the new ocaml CI, so we may be able to pull something out of that for use by Coq as well.
Looking at https://github.com/ocurrent/ocaml-ci suggests it's cool and solves a real problem, but that it's orthogonal to what I'm doing. Looking at https://github.com/ocurrent/ocaml-ci/blob/master/Dockerfile suggests the produced images might be bigger than needed, since it calls neither apt-get clean nor opam clean. I guess you don't need to optimize for image size, but I do. And sure, that image is not the point of the tool; I only used it as a rough proxy.
I don't have a sense of what you're trying to accomplish at a high level. Do you want some way to generate a Dockerfile for a sequence of installs and opam remotes that is caching friendly for BuildKit layers?
I think the included opam update <needed repos>; opam install foo; /tmp/opam-clean is all the relevant context. Maybe you're asking whether I install packages in one go or in several steps, but that's orthogonal to the image size.
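The per-step pattern above (update the needed repositories, install, clean up, all in one step) can be sketched as a single shell function. This is a hypothetical helper, not part of opam: the name opam_install_step and its argument convention (a space-separated list of repository names, then the packages) are assumptions, and it uses the opam clean flags mentioned in this thread instead of the /tmp/opam-clean script.

```shell
# Hypothetical helper: one RUN step that updates the given opam repositories,
# installs packages, and cleans opam's caches before the layer is committed.
# Usage: opam_install_step "repo1 repo2" pkg1 pkg2 ...
opam_install_step() {
  repos=$1
  shift
  # $repos is deliberately unquoted so it splits into repository names.
  # && (not ;) makes a failed install fail the step instead of being
  # masked by a successful cleanup.
  opam update $repos &&
    opam install -y "$@" &&
    opam clean -a -c -s --logs -r
}
```

With ; instead of &&, a shell reports only the status of its last command, so a failed install followed by a successful opam clean would let the RUN step pass.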
More generally, I just want to write Dockerfiles by hand that produce images that don't waste space; I expect those images to be updated at most once a month. I had never heard of BuildKit or Moby before this week; I still use Docker the way I learned it in 2014. I do try to shrink each layer.
The usual advice says things like "after apt-get install, run apt-get clean, and in the same step, or Docker will save the apt caches in the produced layer". For opam, I'd like a corresponding command, and it's totally fine if that command is not plain opam clean but opam clean with all the options from the man page, that is, opam clean -a -c -s --logs -r.
And yet, even after running that, adding a 5 MB executable and 25 MB of extra files (my opam repository) added 200 MB to the image, because opam was saving a checkout of Coq (git+https://github.com/aa755/coq.git#coqdep) and (I guess) the accompanying git objects.
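To see where extra space like this comes from, one option is to list every .git directory under the opam root together with its size. This is a diagnostic sketch: the function name find_git_checkouts is hypothetical, and whether a given pinned checkout keeps its .git directory depends on how opam fetched it.

```shell
# Diagnostic sketch: list every .git directory under an opam root with its
# size in KB, largest first. Defaults to $OPAMROOT, then ~/.opam.
find_git_checkouts() {
  root=${1:-${OPAMROOT:-$HOME/.opam}}
  # -prune stops find from descending into each .git directory it reports.
  find "$root" -type d -name .git -prune -exec du -sk {} + | sort -rn
}
```

For example, find_git_checkouts ~/.opam run inside the image would show whether a pinned checkout under .opam-switch/sources accounts for a large share of the layer.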
At a high level, I am trying to minimize the image size,
Keep going higher :-) What are you trying to do with the opam client, opam repos and Dockerfiles? Is it a Coq memoisation cache of some kind, or a CI, or something else? If you could sketch out the overall flow of images, that would be helpful. We'd rather understand the problem you're trying to solve than shoehorn partial solutions into the opam client.
My initial thought is that you want some way to install opam packages to look like system packages, and then get rid of opam entirely from the image. That would lose the entire ~/.opam directory, and leave the image with something that looks similar to an installation done from (e.g.) a Debian package.
Note that the Dockerfiles you looked at above aren't what ocaml-ci generates; they are the Dockerfiles for ocaml-ci itself. It generates much lower-level files, like https://ci.ocamllabs.io:8100/job/2020-07-23/092829-ci-ocluster-build-ec217b; each layer is designed to minimise the dependencies on the source tree (so it does an incremental build in seconds if you only change source code and not package metadata).
@avsm The image is to be used for the CI of other Coq packages — it memoizes the "static" dependencies (the packages that don't change much, both Coq itself and coq libraries). If the image is bigger, it takes longer to download, and I was told that matters (up to a point).
Note that the Dockerfiles you looked at above aren't what ocaml-ci generates; they are the Dockerfiles for ocaml-ci itself.
Yeah, I was aware; thanks for the link, but it seems yours doesn't call opam clean either.
My initial thought is that you want some way to install opam packages to look like system packages, and then get rid of opam entirely from the image. That would lose the entire ~/.opam directory, and leave the image with something that looks similar to an installation done from (e.g.) a Debian package.
That seems interesting, but our CI uses opam and assumes an opam install, and after running the opam-clean script above, opam itself takes just a few extra megabytes, so it doesn't seem worth it. To wit (sizes in KB), on the produced image:
coq@f2db5975737c:~$ 'du' -sk .opam/4.07.1+flambda/
1442100 .opam/4.07.1+flambda/
coq@f2db5975737c:~$ 'du' -sk .opam/
1442208 .opam/
coq@f2db5975737c:~$ 'du' -sk $(which opam)
5980 /usr/local/bin/opam
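To get before/after numbers like these for a single cleanup command, a small helper can run the command and print how many KB it freed under a directory. This is a hypothetical helper (measure_savings is not an opam or Docker command), and du figures depend on the filesystem's block size, so treat them as approximate.

```shell
# Hypothetical helper: run a cleanup command and print how many KB it freed
# under a directory (du -sk before minus du -sk after).
# Usage: measure_savings DIR COMMAND [ARGS...]
measure_savings() {
  dir=$1
  shift
  before=$(du -sk "$dir" | cut -f1)
  # Propagate the command's failure instead of printing a misleading number.
  "$@" || return
  after=$(du -sk "$dir" | cut -f1)
  echo $((before - after))
}
```

For instance, measure_savings "$HOME/.opam" opam clean -a -c -s --logs -r reports what the flag-complete opam clean saves on its own, separately from any manual rm -rf steps.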
Thanks @Blaisorblade for the ping!
Currently, the main cleanup command we rely on for the coqorg/coq Dockerfiles is opam clean -a -c -s --logs, but I would indeed be interested in having more standard ways to further reduce the size of the .opam folder in these images :)
Reviewing issues yesterday, it's not clear that there are features here for opam to add. Looking at the three original items in your opam-clean script:
- rm -rf ~/.opam/download-cache: everything there should have been removed by opam clean -c, so it's a bug if there's anything in .opam/download-cache to remove.
- rm -rf ~/.opam/repo/*/: opam 2.1 improves this slightly, inasmuch as http remotes are now stored compressed. However, this rm -rf is already potentially brittle: if opam for any reason invalidates its cache, the cache will be silently reinitialised with no packages, which is at least confusing to any user of the image.
- rm -rf ${opamPrefix}/.opam-switch/sources: opam clean -s should be removing everything which can safely be removed.
Additionally, we're working on proposals for repo/ and possibly also .opam-switch/sources which would render this kind of cleaning impossible.
In terms of feature changes to opam, I'm going to close this, but please do re-open it if necessary.
Additionally, we're working on proposals for repo/ and possibly also .opam-switch/sources which would render this kind of cleaning impossible.
I'd hope that means that more of this data can be cleaned safely, instead of less?
No, that's not the target of that piece of work.
The issue from opam's perspective here is that you're not asking to "clean" the root (that would imply the files were cached); what you're doing is corrupting the root, and it just so happens to be OK to do that for your use case. For example, the Coq repos you're working with may not use any of those files, but this Dockerfile is a case against allowing blessed deletion of repo clones:
FROM ocaml/opam:debian-ocaml-4.12
RUN rm -rf ~/.opam/repo/*/
RUN opam reinstall ocaml-config
(It fails unless one first runs opam update.)
I filed #4255 before, but after more investigation, it turns out that opam clean leaves lots of useless caches around. I'm building Docker images, where reducing disk size is important. Here's what I use now; /cc @erikmd, who might also be interested.
In my new Dockerfiles, each separate opam install step becomes opam update <needed repos>; opam install foo; /tmp/opam-clean, where /tmp/opam-clean has the following contents:
I realize removing the sources might in principle break remove scripts, but that folder contains sources for packages with empty remove scripts (at least coq-iris and coq-stdpp), so there's no reason to keep them.
Now, of course opam isn't entirely happy about that. opam install, done as above, works just fine, but other commands run into more trouble. Below is the log of some commands, showing a slightly confused opam: the first opam upgrade is expected to fail, but after opam update redownloads everything needed, the second opam upgrade gives the wrong result.
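The contents of /tmp/opam-clean are not reproduced in the post, but the review comment quotes its three rm -rf commands. A sketch along those lines follows; it is a reconstruction, not the original script. The function name opam_clean_hard, taking ${opamPrefix} from OPAM_SWITCH_PREFIX (set by opam env) or opam var prefix, and running opam clean first are all assumptions. As the maintainers point out, it leaves the opam root in a state opam does not consider consistent.

```shell
#!/bin/sh
# Sketch of an opam-clean script reconstructed from the three rm -rf commands
# quoted in this thread. NOT the original script: the function name, the use
# of OPAM_SWITCH_PREFIX / `opam var prefix` for ${opamPrefix}, and the initial
# `opam clean` call are assumptions.
opam_clean_hard() {
  root=${OPAMROOT:-$HOME/.opam}
  # `opam var prefix` only runs when OPAM_SWITCH_PREFIX is unset or empty.
  prefix=${OPAM_SWITCH_PREFIX:-$(opam var prefix 2>/dev/null)}
  # Refuse to continue without a switch prefix, so we never touch
  # "/.opam-switch/sources".
  [ -n "$prefix" ] || return 1

  # First let opam remove what it considers safe to remove.
  opam clean -a -c -s --logs -r

  # Then delete data opam would rather keep. Removing repo/*/ leaves the root
  # inconsistent until the next `opam update`; removing the sources may break
  # packages whose remove scripts need them.
  rm -rf "$root/download-cache"
  rm -rf "$root"/repo/*/
  rm -rf "$prefix/.opam-switch/sources"
}
```

Saved as /tmp/opam-clean, the file would end with a call to opam_clean_hard, run at the end of the same RUN step as the opam install it follows.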