project-stacker / stacker

Build OCI images natively from a declarative format
https://stackerbuild.io
Apache License 2.0

stacker ideas from <nixpkgs>.dockerTools #105

Open · opened by CajuM 3 years ago

CajuM commented 3 years ago

Some ideas for stacker, borrowed from <nixpkgs>.dockerTools:

- Make stackerfiles pure, this is in order to ensure a 1 to 1 mapping between source and image.
- Address images by the hash of their evaluated stackerfile, for caching purposes.
- Create a distributed build environment by generating a DAG from the evaluated stackerfile and triggering remote builds, e.g. by ssh-ing into them and running `stacker build stacker.nix -A myC3Image`.

References:

tych0 commented 3 years ago

> Make stackerfiles pure, this is in order to ensure a 1 to 1 mapping between source and image.

It's not really clear how useful this would be. e.g. you can't use yum any more, because the repos could point to other stuff, etc. You could use stacker to accomplish fully reproducible stuff if you want, but it's very painful to do, and not likely to be any kind of default any time soon.
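For illustration, a fully pinned stackerfile might look something like the sketch below; the image, URL, and hashes are placeholders, and hand-verifying imports in `run` is just one way to do it:

```yaml
# minimal sketch of a fully pinned build; all names and hashes are hypothetical
reproducible:
  from:
    type: docker
    # pin the parent by digest instead of a mutable tag
    url: docker://docker.io/library/ubuntu@sha256:<digest>
  import:
    - https://example.com/src-1.2.3.tar.gz
  run: |
    # imports are available under /stacker; check them against a known hash,
    # since the content behind the URL could drift
    echo "<sha256>  /stacker/src-1.2.3.tar.gz" | sha256sum -c -
```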

> Address images by the hash of their evaluated stackerfile, for caching purposes.

The caching code uses the hash of the input stacker file, but also the hashes of the import values. So I think this is covered.
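In other words, the cache key is a function of both the stackerfile and the imported content; conceptually something like this (a sketch of the idea only, not stacker's actual cache format):

```sh
# sketch only: hash the stackerfile together with each import's content
key=$(cat stacker.yaml import1.tar.gz import2.patch | sha256sum | cut -d' ' -f1)
```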

> Create a distributed build environment by generating a DAG from the evaluated stackerfile and triggering remote builds, e.g. by ssh-ing into them and running `stacker build stacker.nix -A myC3Image`.

This seems like something better suited for a higher level tool, vs stacker itself. Unless there's some nice way to embed k8s as a go library ;)
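A higher-level driver in the spirit of the proposal could start out as simple as the sketch below; the builder hosts are made up, and `stacker build stacker.nix -A <attr>` is the proposed interface from above, not a real stacker invocation:

```sh
#!/bin/sh
# hypothetical driver: run independent leaves of the build DAG on remote hosts
ssh builder1 "cd /srv/build && stacker build stacker.nix -A imgA" &
ssh builder2 "cd /srv/build && stacker build stacker.nix -A imgB" &
wait  # both leaves done; images that depend on them can now be scheduled
```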

CajuM commented 3 years ago

> > Make stackerfiles pure, this is in order to ensure a 1 to 1 mapping between source and image.
>
> It's not really clear how useful this would be. e.g. you can't use yum any more, because the repos could point to other stuff, etc. You could use stacker to accomplish fully reproducible stuff if you want, but it's very painful to do, and not likely to be any kind of default any time soon.

Agreed. If it were indeed a default, it would require package manager support for repo metadata snapshots. Actually... you'd be unable to use the network at all outside stacker's downloader, so as not to bypass the consistent-input guarantees needed for reproducible builds.

> > Address images by the hash of their evaluated stackerfile, for caching purposes.
>
> The caching code uses the hash of the input stacker file, but also the hashes of the import values. So I think this is covered.

I was referring here to deriving from an existing image: you would use the hash of the sources as the tag, instead of the git commit, to decouple from the VCS.
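For example, a sketch of deriving the tag from the source tree itself (hypothetical file names; `--substitute` is stacker's existing substitution flag):

```sh
# sketch: tag an image by a hash of its sources rather than a git commit
tag=$(tar -cf - --sort=name --mtime='@0' src/ stacker.yaml | sha256sum | cut -c1-12)
stacker build -f stacker.yaml --substitute "TAG=$tag"
```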

> > Create a distributed build environment by generating a DAG from the evaluated stackerfile and triggering remote builds, e.g. by ssh-ing into them and running `stacker build stacker.nix -A myC3Image`.
>
> This seems like something better suited for a higher level tool, vs stacker itself. Unless there's some nice way to embed k8s as a go library ;)

Agreed.

In the current workflow, as I recall, there is a tight coupling between the VCS and stacker: it ensures that it's using the latest parent image by walking the git tree until it finds a modification.

I was thinking, in the context of reproducible builds, that you would be able to just ignore versioning, or any concept of newer/older layers. You would have a working directory with a stacker.nix and call `stacker build . -A img1 -A img2`; if any dependent image, indexed by its tag (which maps 1 to 1 to its source), is missing from the zot repo, you build and upload it.

Currently what we do is start from the image we want to build and walk the git tree until we find the latest change in the source of its parent image. If we had a 1 to 1 mapping between image and source, we'd know exactly which image we depend on, and whether it's missing, without a VCS.

Now that I think of it, it's not really necessary to have a 1 to 1 mapping between source and image hash, just its tag, in order to eliminate a VCS from the workflow. That's only necessary if you want reproducible builds, which as you said is of limited use and difficult to implement.
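A sketch of that zot-gated workflow, reusing a source-derived `$tag` as in the earlier sketch; the registry and image names are made up:

```sh
# build and push img1 only if its source-derived tag is missing from zot
if ! skopeo inspect "docker://zot.example.com/img1:$tag" >/dev/null 2>&1; then
  stacker build -f stacker.yaml --substitute "TAG=$tag"
  skopeo copy oci:oci:img1 "docker://zot.example.com/img1:$tag"
fi
```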

tych0 commented 3 years ago

> In the current workflow, as I recall, there is a tight coupling between the VCS and stacker: it ensures that it's using the latest parent image by walking the git tree until it finds a modification.

There's no required coupling with git, although stacker does add the current git hash to the generated metadata if you happen to be in a git repo. You can push images named for their git hash via `stacker publish`, but that's also just a convention: it's not clear how one would derive a hash of all the inputs without actually downloading them to the local disk, and the point of the caching convention is in part to avoid that local download.
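That convention amounts to roughly the following (a sketch; registry URL is made up):

```sh
# name the published image after the current git commit
stacker publish --url docker://registry.example.com/myproject \
    --tag "$(git rev-parse HEAD)"
```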

> Now that I think of it, it's not really necessary to have a 1 to 1 mapping between source and image hash, just its tag, in order to eliminate a VCS from the workflow.

I think it depends on what you're after. It's a nice idea, the question is just how to communicate these hashes to everyone. A convention via git hashes is probably the dumbest way, but perhaps there are others.

CajuM commented 3 years ago

The way Nix does it is by having the developers embed the hashes into their source; for stacker you'd have to add the hash of each import as an extra field in the stackerfile. This wouldn't work for images without a source-to-binary mapping, though, so such images should only be referenced by tags. Also, I was thinking of something like:

```yaml
from: img_var # if we were to use a programming language we would be able to pass arbitrary images as parents in a reusable way
# if the image described by img_var has been cached by zot, download it. Its tag will be the same as its source. Otherwise, build it.
# and at the end, and this should be external to stacker, upload using skopeo
```
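In stackerfile terms, the extra-field idea might look like the sketch below; the `hash` key is the proposal under discussion here, and the names and digests are placeholders:

```yaml
img1:
  from:
    type: docker
    # a binary-only parent has no source-to-binary mapping, so it is
    # referenced by tag alone
    url: docker://zot.example.com/parent:sometag
  import:
    # proposed: pin each import with an embedded hash, nix-style
    - path: https://example.com/src.tar.gz
      hash: <sha256>
```

The final upload, external to stacker as suggested, would then be a plain `skopeo copy oci:oci:img1 docker://...` as in the earlier sketch.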