Simplify usage of image.ymls vs parse-pkgs generation

deitch commented 5 years ago

Consolidating conversation from here:

@rvs said:

what to do with image.ymls using a floating tag (like snapshot or latest) will be fine, but I don't remember linuxkit and Docker caching policies for these. Do you?

The question of "what to do with image.ymls" means: how do we simplify the usage of the ymls? Right now, the image ymls are templated. For example, rootfs.yml comes from rootfs.yml.in, which looks as follows (partial):

kernel:
  image: KERNEL_TAG
  cmdline: "rootdelay=3"
init:
  - GRUB_TAG
  - DTREES_TAG
  - FW_TAG
  - XEN_TAG
  - GPTTOOLS_TAG
  - DOM0ZTOOLS_TAG
  - linuxkit/init:v0.5
  - linuxkit/runc:v0.5
  - linuxkit/containerd:v0.5
  - linuxkit/getty:v0.5

Each *_TAG placeholder is replaced by parse-pkgs.sh with an actual, specific package reference, e.g. zededa/kernel:abcdef55667aa.
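To make the substitution step concrete, here is a minimal sketch in Python. It is not the actual parse-pkgs.sh: the `render_template` helper, the `TAGS` mapping, and the grub hash are hypothetical and only illustrate the idea of turning a `.yml.in` template into a concrete yml.

```python
import re

# Hypothetical mapping from placeholder to resolved image reference.
# In the real flow these values are computed by parse-pkgs.sh.
TAGS = {
    "KERNEL_TAG": "zededa/kernel:abcdef55667aa",
    "GRUB_TAG": "zededa/grub:0123456789abc",  # made-up hash for illustration
}

# Matches whole words such as KERNEL_TAG or DOM0ZTOOLS_TAG.
PLACEHOLDER = re.compile(r"\b[A-Z][A-Z0-9_]*_TAG\b")


def render_template(template: str, tags: dict) -> str:
    """Replace every *_TAG placeholder; fail loudly on an unknown one."""
    def substitute(match):
        name = match.group(0)
        if name not in tags:
            raise KeyError(f"no tag defined for placeholder {name}")
        return tags[name]

    return PLACEHOLDER.sub(substitute, template)


template = (
    "kernel:\n"
    "  image: KERNEL_TAG\n"
    "init:\n"
    "  - GRUB_TAG\n"
    "  - linuxkit/init:v0.5\n"
)
rendered = render_template(template, TAGS)
print(rendered)
```

Failing on an unknown placeholder, rather than leaving it in place, is a design choice of this sketch: a leftover `FOO_TAG` in the output would otherwise only surface later as a confusing pull error.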

This creates some challenges, as documented in BUILD.md.

Notably, it leaves the actual build source yml outside of version control.

The purpose of this issue is to discuss what the flow should be, in terms of both people and automated systems, when building EVE images. The technology implementation should follow that desired flow.

deitch commented 5 years ago

using a floating tag (like snapshot or latest) will be fine

I would prefer not to. If we use a floating tag, we have two issues:

No guarantee that we are building what we think we are. Does latest refer to the tag pushed out yesterday at 10:00 UTC? At 09:52? At 22:07? Even at build time, we don't actually know what we are building.
No way to reproduce builds reliably.

The only way to avoid both problems, i.e. to know exactly what we are building and to be able to reproduce it, is to use actual hashes for builds and storage.

In production systems, I have stopped using even simple tags like foo:v0.5 and started using either git-hash tags like foo:abcd5677abb or, even better when I can, foo:v0.5@sha256:aabb66772fc, so I know the human-readable name (v0.5) and am guaranteed the content (sha256:....).
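As an illustration of the name/tag/digest split described above, here is a small Python sketch. The `parse_image_ref` and `is_pinned` helpers are hypothetical names, and the parsing is deliberately simplified (it does not validate digest length or characters), so treat it as a sketch rather than a reference parser.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    name: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'name[:tag][@digest]' into its parts (simplified)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    tag = None
    # A colon after the last slash introduces a tag; a colon before it
    # belongs to a registry host:port such as localhost:5000/foo.
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        ref, tag = ref[:colon], ref[colon + 1:]
    return ImageRef(ref, tag, digest)


def is_pinned(ref: str) -> bool:
    """True only when the reference carries a content digest."""
    return parse_image_ref(ref).digest is not None


print(parse_image_ref("foo:v0.5@sha256:aabb66772fc"))
print(is_pinned("foo:latest"))
```

A pre-build check could call `is_pinned` on every image in a config and refuse to proceed on floating references such as `foo:latest`.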

deitch commented 5 years ago

Build Use Cases

Let's start by laying out the use cases. There are two sets of places where we modify tags on the fly:

linuxkit build configs: images/*.yml. These consume packages from pkg/*.
package dockerfiles: pkg/*/Dockerfile. These consume known, standard packages, but also may include other packages, either from pkg/* or from the hub, e.g. zededa/ztools.

When do I need the config files? When a build is being performed, one of:

a live or installer image that depends on packages (images/*.yml)
a package that depends on another package (pkg/*/Dockerfile)

What are all of the circumstances when a build is being performed? I will try to capture some; go ahead and edit my list.

Development latest: I am building (and re-building and re-building), and want to use the latest version of every input.
Development semi-latest: I am building (and re-building and...), and want to use the latest version of one or a few inputs. E.g. I am working on pkg/grub, but everything else should stay on a known good version.
Confirmed: I am building based on a known version of each input that is "approved" (i.e. passed CI or some human release process)
Automated: A system (CI or otherwise) is building based on the latest approved version of each input. This is akin to building an app using the latest released version of each module. It isn't "the latest in my tree", but "the latest that has been released".

Any others?
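One way to read the four cases listed above is as a per-package choice between "what is in my tree" and "what has been approved". The Python sketch below encodes that reading; the `BuildMode` enum, the `resolve_tag` helper, and all tag values are hypothetical, and the treatment of Confirmed and Automated as the same lookup is an assumption of this sketch, not something settled in this thread.

```python
from enum import Enum


class BuildMode(Enum):
    DEV_LATEST = "development latest"
    DEV_SEMI_LATEST = "development semi-latest"
    CONFIRMED = "confirmed"
    AUTOMATED = "automated"


def resolve_tag(pkg, mode, local_tags, approved_tags, working_on=frozenset()):
    """Pick the tag for one input package under a given build mode."""
    if mode is BuildMode.DEV_LATEST:
        # Everything comes from the local tree.
        return local_tags[pkg]
    if mode is BuildMode.DEV_SEMI_LATEST:
        # Only the packages being worked on come from the local tree.
        return local_tags[pkg] if pkg in working_on else approved_tags[pkg]
    # CONFIRMED and AUTOMATED both read approved versions here; in this
    # sketch they differ in who triggers the build, not in the lookup.
    return approved_tags[pkg]


local = {"grub": "zededa/grub:localhash1", "kernel": "zededa/kernel:localhash2"}
approved = {"grub": "zededa/grub:approved1", "kernel": "zededa/kernel:approved2"}

for pkg in ("grub", "kernel"):
    print(pkg, resolve_tag(pkg, BuildMode.DEV_SEMI_LATEST, local, approved, {"grub"}))
```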

Config Trees

This is an issue specifically with Linuxkit Build Configs (images/*.yml). We have two additional issues:

Board-specific requirements: We may have a set of standard packages to include, and then one specific one to add for a specific board. As it stands now, we would need an entirely separate rootfs.yml and rootfs-board2.yml, where rootfs-board2.yml is entirely duplicative of rootfs.yml except for one package that has been added. This will get very difficult to maintain.
Optional packages: Adding optional packages also is impossible to do without a completely separate and duplicative config file.

This leans towards some form of "config aggregation", where we can have multiple configs that layer on top of each other. This will have to be done either in an upstream tool (like linuxkit) or in a pre-processor, like parse-pkgs.sh but much more advanced.
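A rough sketch of what such layering could look like, in Python and operating on already-parsed configs (plain dicts) to stay independent of any YAML library. The `merge_configs` helper, the merge policy (lists appended, dicts merged, scalars overridden), and the `BOARD2_FW_TAG` placeholder are all assumptions for illustration; linuxkit itself is not assumed to work this way.

```python
import copy


def merge_configs(base: dict, overlay: dict) -> dict:
    """Layer overlay on top of base without modifying either input.

    Policy (one possible choice): dicts merge recursively, lists are
    appended without duplicates, scalars in the overlay win.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        current = merged.get(key)
        if isinstance(value, dict) and isinstance(current, dict):
            merged[key] = merge_configs(current, value)
        elif isinstance(value, list) and isinstance(current, list):
            merged[key] = current + [v for v in value if v not in current]
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "kernel": {"image": "KERNEL_TAG", "cmdline": "rootdelay=3"},
    "init": ["GRUB_TAG", "linuxkit/init:v0.5"],
}
# The board-specific layer only states what differs from the base.
board2 = {"init": ["BOARD2_FW_TAG"]}

rootfs_board2 = merge_configs(base, board2)
print(rootfs_board2)
```

With this shape, rootfs-board2.yml shrinks to the one added package instead of duplicating all of rootfs.yml.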

Detecting Changed Dependencies

The build structure based on parse-pkgs.sh replaces flags like STRONGSWAN_TAG in a template file with the actual tag, then generates the config (rootfs.yml) from the template (rootfs.yml.in).

With this structure, if someone manually changes a tag, the downstream tools (linuxkit) cannot detect that something has changed, unless one runs the entire toolchain and re-generates the config file. This is not terrible - we simply can declare, "you must re-run pre-processing" - but is less than ideal.
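One possible way to soften this, sketched in Python under stated assumptions: stamp the generated config with a fingerprint of its inputs (template text plus resolved tags) and check that fingerprint before building. The `# generated-from:` header, `inputs_fingerprint`, `stamp`, and `is_stale` are hypothetical names, not part of the existing tooling.

```python
import hashlib
import json

HEADER = "# generated-from: "  # a YAML comment, so YAML parsers ignore it


def inputs_fingerprint(template: str, tags: dict) -> str:
    """Hash the template text together with the resolved tags."""
    payload = json.dumps({"template": template, "tags": tags}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stamp(rendered: str, fingerprint: str) -> str:
    """Prefix the generated config with the fingerprint of its inputs."""
    return f"{HEADER}{fingerprint}\n{rendered}"


def is_stale(generated: str, template: str, tags: dict) -> bool:
    """True when the generated config no longer matches its inputs."""
    first_line = generated.split("\n", 1)[0]
    if not first_line.startswith(HEADER):
        return True  # no stamp at all: treat as stale
    return first_line[len(HEADER):] != inputs_fingerprint(template, tags)


template = "init:\n  - STRONGSWAN_TAG\n"
tags = {"STRONGSWAN_TAG": "zededa/strongswan:aaaa1111"}  # made-up hash
rendered = template.replace("STRONGSWAN_TAG", tags["STRONGSWAN_TAG"])
generated = stamp(rendered, inputs_fingerprint(template, tags))

print(is_stale(generated, template, tags))
print(is_stale(generated, template, {"STRONGSWAN_TAG": "zededa/strongswan:bbbb2222"}))
```

This does not remove the need to re-run pre-processing; it only turns "you must remember to re-run it" into a check a build wrapper can perform cheaply.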

deitch commented 5 years ago

Once we have the use cases out, I will suggest some flows and tooling around it.

rvs commented 5 years ago

First of all, @deitch, I'd like to suggest you incorporate what @gianlzed has written on #3 and my scribbles on #28.

In addition to that, I'd like to point out that this definitely applies not only to image.ymls but also between individual Dockerfiles -- since we're collecting requirements, let's add that one as well (although it is entirely possible to look at it and say -- anything non-trivial should really be done as Alpine packages, not linuxkit packages).

At any rate, I wanted to comment on your use case #1 (Development latest). It is not that I need the latest when I'm developing in this mode; it is that I need everything to be built from my repo. IOW, I want all the packages that participate in the creation of the final image to be taken exactly from my repo of zenbuild. Not sure if it fits into your description of "Development latest".

deitch commented 5 years ago

Updated the above to better reflect comments on cross-package dependencies by @rvs here

deitch commented 5 years ago

Added Gianluca's use cases.

deitch commented 5 years ago

Added Roman's comments on detecting dependencies.

deitch commented 5 years ago

Once we are sure we have all of the use cases captured, we can start to work on simplifying.

deitch commented 5 years ago

@rvs wrote:

use case 1 ("Development latest"). It is not that I need the latest... it is that I need everything to be built from my repo... I want all the packages that participate in the creation of the final image to be taken exactly from my repo of zenbuild

In that case, though, I never am publishing, correct? I am just building from whatever is currently on my system? So the description is close, but not quite correct. It is that I am in "development" mode (working locally), but not "latest"; rather I am taking the current version of whatever is extant in my directory at this moment?

Comment: if that is correct, that doesn't have to be "development" mode. It could very well be a production build, where rather than being just anything, the system ensures pre-build that what is present in my directory at this moment is the correct version (latest blessed), either because I am working off of unchanged most-recent master, or because I am working off of a specific git tag.
