pulp / netboot-oci-specs

Specifications describing Netboot files stored as OCI artifacts

artifacts or just a container layer with special annotations? #6

Open cgwalters opened 4 weeks ago

cgwalters commented 4 weeks ago

At a quick skim: IMO this spec is overall sane, and I'd be fine to ship and support tooling using it.

I wasn't involved in its drafting, but just reading it I find myself wondering: Is it really worth defining a custom OCI artifact type for this versus a spec that is basically:

FROM quay.io/fedora/fedora:40 as builder
RUN curl -LO https://dl.fedoraproject.org/pub/fedora/linux/releases/39/Everything/x86_64/os/images
FROM scratch
COPY --from=builder /images /
LABEL org.pulpproject.netboot.entrypoint=shim

etc?

The thing is, stuff like org.pulpproject.netboot.os.arch would require special handling by clients, but once you start having architecture-specific binaries, I think one should consider just reusing standard OCI containers with special labels.

Looking at it further, there are other special annotations like org.pulpproject.netboot.src.digest that wouldn't be necessary, because OCI containers already record both the compressed and the uncompressed digest of each layer tarball (and the tarball itself obviously carries file size metadata, etc.).

lzap commented 3 weeks ago

Hey Colin, thanks for looking into it.

There is no particular reason other than someone from the internal bootc team recommending OCI artifact tooling as a good fit for the job. Then, when I wrote the download tool, which works like rsync, I realized I needed the uncompressed digest; I had no idea this metadata was already available. Having it as part of the OCI metadata (not in the blobs) is handy: there is no need to download layers just to find out that no update is needed.
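A minimal Python sketch of that rsync-like check, assuming the client already has the image manifest and config as parsed JSON (the same shape as the `skopeo inspect --raw` and `--raw --config` output further down) plus a set of uncompressed layer digests it already holds locally. `layers_to_fetch` is a hypothetical name; the sketch relies only on the image spec's rule that `rootfs.diff_ids` lists the uncompressed digests in the same order as the manifest's layers.

```python
def layers_to_fetch(manifest: dict, config: dict, local_diff_ids: set[str]) -> list[dict]:
    """Return the manifest layer descriptors whose content is not present locally.

    manifest["layers"] holds the (possibly compressed) blob digests and
    config["rootfs"]["diff_ids"] the uncompressed digests, in the same order,
    so they can be paired without downloading any layer blob.
    """
    layers = manifest["layers"]
    diff_ids = config["rootfs"]["diff_ids"]
    if len(layers) != len(diff_ids):
        raise ValueError("manifest layers and config diff_ids differ in length")
    return [
        layer
        for layer, diff_id in zip(layers, diff_ids)
        if diff_id not in local_diff_ids
    ]


manifest = {"layers": [{"digest": "sha256:aaa"}, {"digest": "sha256:bbb"}]}
config = {"rootfs": {"type": "layers", "diff_ids": ["sha256:111", "sha256:222"]}}
print(layers_to_fetch(manifest, config, {"sha256:111"}))  # → [{'digest': 'sha256:bbb'}]
```

An empty result means nothing needs to be downloaded, which is exactly the "no update needed" case, decided from two small JSON documents.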

Another reason why we went for a custom tool was lack of a standard tool in the podman universe to download arbitrary files from OCI container images. I was only able to figure out pulling the image, starting a temporary container and copying files from there and finally shutting the container down. That felt clunky.

I am happy to rewrite the specs to use just podman unless @ipanova objects. In fact, let me explore this a bit.

cgwalters commented 3 weeks ago

Another reason why we went for a custom tool was lack of a standard tool in the podman universe to download arbitrary files from OCI container images. I was only able to figure out pulling the image, starting a temporary container and copying files from there and finally shutting the container down. That felt clunky.

There's skopeo copy docker://quay.io/example/image:latest oci:image-ocidir/ which will basically just download the raw JSON and tarballs, no container runtime required. However it won't process the tarballs for you.

There's also oc image extract, which is pretty much this, and yes, it would probably make sense to have such a thing as part of skopeo too.
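A minimal sketch of the "process the tarballs" step for a directory written by `skopeo copy ... oci:image-ocidir/`, assuming `index.json` points directly at a single image manifest (not a nested multi-arch index) and that layers are plain or gzip-compressed tar; zstd layers are not handled. Whiteout files are skipped rather than applied, which is only correct for images without deletions. Both function names are hypothetical.

```python
import json
import tarfile
from pathlib import Path


def blob_path(layout: Path, digest: str) -> Path:
    """Map a digest such as 'sha256:abc...' to its file in an OCI image layout."""
    algorithm, encoded = digest.split(":", 1)
    return layout / "blobs" / algorithm / encoded


def extract_oci_layout(layout: Path, dest: Path) -> list[str]:
    """Unpack the layers of the first manifest in an OCI layout into dest.

    Returns the extracted member names in layer order.
    """
    index = json.loads((layout / "index.json").read_text())
    manifest = json.loads(blob_path(layout, index["manifests"][0]["digest"]).read_text())
    dest.mkdir(parents=True, exist_ok=True)
    # The "data" filter (recent Pythons) rejects absolute paths, ".." escapes
    # and device nodes; fall back to plain extraction where it is missing.
    safety = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    for layer in manifest["layers"]:  # layers are applied in manifest order
        # "r:*" auto-detects uncompressed and gzip/bz2/xz tar streams.
        with tarfile.open(blob_path(layout, layer["digest"]), "r:*") as tar:
            # Whiteouts are skipped, not applied: fine only without deletions.
            members = [m for m in tar.getmembers() if not Path(m.name).name.startswith(".wh.")]
            tar.extractall(dest, members=members, **safety)
            extracted.extend(m.name for m in members)
    return extracted
```

Usage would be along the lines of `extract_oci_layout(Path("image-ocidir"), Path("out"))`.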

cgwalters commented 3 weeks ago

There is no particular reason other than someone from the internal bootc team recommending OCI artifact tooling as a good fit for the job.

Right, that may have been me (sorry!) - but looking at what resulted in the spec, ISTM that standardizing just metadata on top of an OCI image may make more sense.

BTW, one thing I'm not sure is standardized at all but maybe should be is the concept of something like a "single layer container" - I am not aware of a good use case for supporting multiple layers here, and doing so makes the unpacking logic more complex. Requiring at most a single tar layer (with no whiteouts) seems like a good idea.
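If the "single layer, no whiteouts" rule were adopted, a client could enforce it cheaply. A sketch, assuming the manifest is parsed JSON and the layer blob has been opened with Python's `tarfile`; the function names are hypothetical. Whiteouts are recognized by the `.wh.` basename prefix used by the OCI image spec (which also covers the opaque marker `.wh..wh..opq`).

```python
import tarfile
from pathlib import PurePosixPath
from typing import Optional

TAR_LAYER_TYPES = {
    "application/vnd.oci.image.layer.v1.tar",
    "application/vnd.oci.image.layer.v1.tar+gzip",
    "application/vnd.oci.image.layer.v1.tar+zstd",
}


def check_single_layer(manifest: dict) -> Optional[dict]:
    """Return the only layer descriptor (None if there are no layers).

    Raises ValueError for more than one layer or a non-tar layer type.
    """
    layers = manifest.get("layers", [])
    if len(layers) > 1:
        raise ValueError(f"expected at most one layer, found {len(layers)}")
    if not layers:
        return None
    layer = layers[0]
    if layer.get("mediaType") not in TAR_LAYER_TYPES:
        raise ValueError(f"unexpected layer media type: {layer.get('mediaType')}")
    return layer


def find_whiteouts(tar: tarfile.TarFile) -> list[str]:
    """List members whose basename marks an OCI whiteout ('.wh.' prefix)."""
    return [m.name for m in tar.getmembers() if PurePosixPath(m.name).name.startswith(".wh.")]
```

With both checks passing, unpacking reduces to extracting one tarball, with no layer-ordering or deletion logic at all.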

ipanova commented 3 weeks ago

@cgwalters thanks so much for the feedback!

Is it really worth defining a custom OCI artifact type for this versus a spec

Up until recently, people were sort of 'abusing' the OCI container image spec (partly because of registry limitations) to upload and store arbitrary artifacts in registries. Such non-OCI-conformant artifacts used application/vnd.oci.image.config.v1+json as their config.mediaType. This is exactly why OCI artifacts were created, to enable a proper RAS (registry as storage) concept: config.mediaType is meant to be set to some other media type, because the original one is reserved for container runtimes (i.e. 'runnable' images). With this reasoning, my preference would be to stick to the artifacts guidance in the spec.
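The distinction described here can be checked mechanically. A minimal sketch following the OCI image-spec v1.1 rules as I read them: an explicit `artifactType` wins; otherwise a `config.mediaType` other than the image config type names the artifact; and the empty config `application/vnd.oci.empty.v1+json` requires `artifactType`. `artifact_type` is a hypothetical helper, not part of any library.

```python
from typing import Optional

IMAGE_CONFIG = "application/vnd.oci.image.config.v1+json"
EMPTY_CONFIG = "application/vnd.oci.empty.v1+json"


def artifact_type(manifest: dict) -> Optional[str]:
    """Return the artifact type of an OCI image manifest, or None for a runnable image."""
    if manifest.get("artifactType"):
        # image-spec v1.1: an explicit artifactType takes precedence.
        return manifest["artifactType"]
    config_type = manifest["config"]["mediaType"]
    if config_type == EMPTY_CONFIG:
        # The empty config carries no type information, so artifactType is required.
        raise ValueError("empty config requires an artifactType field")
    if config_type == IMAGE_CONFIG:
        return None  # reserved for runnable container images
    # Without artifactType, the config media type identifies the artifact.
    return config_type
```

Under these rules, the image built by the Containerfile later in this thread would return `None`, i.e. it is classified as a runnable image.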

BTW, one thing I'm not sure is standardized at all but maybe should be is the concept of something like a "single layer container"

I have no preference between multiple layers and a single one. When working on the netboot specs proposal, we were using the ORAS tool, which offers two ways of storing multiple artifacts (files): a) separately, as multiple layers, or b) as a directory tarred into a single layer (https://oras.land/docs/how_to_guides/pushing_and_pulling#pushing-artifacts-with-multiple-files). Some layer annotations, such as title and digest, were created automatically by ORAS during the push, so we kept those untouched. I would be OK with removing some of those too.
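Files pushed by ORAS as separate layers carry the standard `org.opencontainers.image.title` annotation, so a client can recover file names from the manifest alone, without downloading anything. A sketch, assuming one file per layer; ORAS's directory-in-one-layer mode uses additional ORAS-specific annotations and is not handled here. `files_by_title` is a hypothetical name.

```python
TITLE = "org.opencontainers.image.title"


def files_by_title(manifest: dict) -> dict[str, str]:
    """Map each layer's file name (from its title annotation) to its blob digest."""
    files = {}
    for layer in manifest["layers"]:
        # Annotations may be absent or null (as in the skopeo output in this thread).
        title = (layer.get("annotations") or {}).get(TITLE)
        if title is None:
            raise ValueError(f"layer {layer['digest']} has no {TITLE} annotation")
        # Refuse names that could escape the download directory.
        if "/" in title or title in (".", ".."):
            raise ValueError(f"refusing unsafe file name: {title!r}")
        files[title] = layer["digest"]
    return files
```

A client would then fetch each digest and write it to the mapped name.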

The thing is stuff like org.pulpproject.netboot.os.arch would require special handling by clients, but...when you start having architecture-specific binaries I think one needs to consider just reusing standard OCI containers, but with special labels.

Yes, I agree. We went back and forth between having a dedicated annotation for the arch and incorporating that info into the image tag instead, e.g. https://quay.io/repository/fedora/fedora?tab=tags
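For comparison, the standard alternative to both an arch annotation and arch-specific tags is an OCI image index, where each manifest descriptor carries a `platform` object. A sketch of the client-side selection, assuming the index JSON has already been fetched. The function and the Fedora-to-OCI architecture mapping are illustrative (OCI uses Go-style names such as `amd64` and `arm64`), and `variant` matching is omitted.

```python
from typing import Optional

# Fedora architecture names on the left, OCI/Go names on the right.
FEDORA_TO_OCI_ARCH = {"x86_64": "amd64", "aarch64": "arm64", "ppc64le": "ppc64le", "s390x": "s390x"}


def select_manifest(index: dict, arch: str, os_name: str = "linux") -> Optional[dict]:
    """Return the first manifest descriptor in an image index matching arch/os."""
    oci_arch = FEDORA_TO_OCI_ARCH.get(arch, arch)
    for desc in index.get("manifests", []):
        platform = desc.get("platform") or {}
        if platform.get("architecture") == oci_arch and platform.get("os") == os_name:
            return desc
    return None
```

With an index, one tag can serve every architecture, and no netboot-specific annotation is needed for the arch.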

RE: podman/skopeo.

To my knowledge, podman 5 recently added the ability to build and push OCI artifacts. For the delivery pipeline it probably would not make any difference whether we use ORAS or podman. On the consumption side, however, neither podman nor skopeo can extract artifact files the way ORAS can (which I think is possible because those layers carry title and digest annotations): https://github.com/containers/podman/issues/21785, https://github.com/containers/podman/issues/21785#issuecomment-2079449282

lzap commented 3 weeks ago

So I created this and it looks pretty good:

FROM quay.io/fedora/fedora:40 as builder
RUN mkdir /b
WORKDIR /b

# Artifacts from kickstart repository.
RUN curl -LO https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/images/pxeboot/vmlinuz
RUN curl -LO https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/images/pxeboot/initrd.img
RUN curl -LO https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/images/install.img

# Artifacts from RPM repository.
RUN dnf -y install shim-x64 grub2-efi-x64 syslinux-tftpboot
RUN cp /tftpboot/pxelinux.0 . && cp /boot/efi/EFI/fedora/{shim,grubx64}.efi .

# Creation of a digest file.
RUN sha256sum * | tee SHA256SUM

# Entrypoint symlinks MUST NOT be in the digest file.
RUN ln -s shim.efi BOOT && ln -s grubx64.efi BOOTA && ln -s pxelinux.0 BOOTL

FROM scratch
# The first layer MUST be the digest file.
COPY --from=builder --chmod=444 /b/SHA256SUM /
# Bigger payload SHOULD be in separate layers.
COPY --from=builder --chmod=444 /b/vmlinuz /b/initrd.img /
COPY --from=builder --chmod=444 /b/install.img /
COPY --from=builder --chmod=444 /b/pxelinux.0 /b/shim.efi /b/grubx64.efi /
# Entrypoint symlinks MUST be the last layer.
COPY --from=builder --chmod=444 /b/BOOT /b/BOOTA /b/BOOTL /
LABEL org.pulpproject.netboot.version=1

It creates the following structure:

$ skopeo inspect containers-storage:localhost/nb:latest
{
    "Name": "localhost/nb",
    "Digest": "sha256:fda9b313c0d080174688b61e49e5c464bedb97e77b3ee3f04933c5dac24b48ff",
    "RepoTags": [],
    "Created": "2024-06-11T10:21:19.387875497Z",
    "DockerVersion": "",
    "Labels": {
        "io.buildah.version": "1.35.3",
        "org.pulpproject.netboot.version": "1"
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:252452892075fb4a429540febce7760e9423e370e10e0a83f8686e06000d53ab",
        "sha256:d226b44a56cfcac43cc1055df975349b614b572b086b5aee1bcd80d555b9de23",
        "sha256:e28e9914e930334cee08f4e6a4d39cb4fdd9c144347d0cf02ae71e1bd067e51b",
        "sha256:bd59b78273b2d51837fc70617db76ace0fb56b3c0a4f9213bd34f4f3c596ccd7",
        "sha256:87635095bbc3f94cb54ab8be42f93fdc01abf8c8d7819a4efa2cf16535c403b6"
    ],
    "LayersData": [
        {
            "MIMEType": "application/vnd.oci.image.layer.v1.tar",
            "Digest": "sha256:252452892075fb4a429540febce7760e9423e370e10e0a83f8686e06000d53ab",
            "Size": 2048,
            "Annotations": null
        },
        {
            "MIMEType": "application/vnd.oci.image.layer.v1.tar",
            "Digest": "sha256:d226b44a56cfcac43cc1055df975349b614b572b086b5aee1bcd80d555b9de23",
            "Size": 164366848,
            "Annotations": null
        },
        {
            "MIMEType": "application/vnd.oci.image.layer.v1.tar",
            "Digest": "sha256:e28e9914e930334cee08f4e6a4d39cb4fdd9c144347d0cf02ae71e1bd067e51b",
            "Size": 618010112,
            "Annotations": null
        },
        {
            "MIMEType": "application/vnd.oci.image.layer.v1.tar",
            "Digest": "sha256:bd59b78273b2d51837fc70617db76ace0fb56b3c0a4f9213bd34f4f3c596ccd7",
            "Size": 4967936,
            "Annotations": null
        },
        {
            "MIMEType": "application/vnd.oci.image.layer.v1.tar",
            "Digest": "sha256:87635095bbc3f94cb54ab8be42f93fdc01abf8c8d7819a4efa2cf16535c403b6",
            "Size": 4967936,
            "Annotations": null
        }
    ],
    "Env": [
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ]
}
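The layout rules written as comments in the Containerfile above (digest file first, entrypoints last, entrypoints absent from the digest file) could be enforced by a client with something like this sketch. It assumes the client has already listed each layer's tar members in manifest order and read SHA256SUM. `check_layout` is hypothetical, and it checks names only, not whether the entrypoints are actually symlinks.

```python
import posixpath

ENTRYPOINTS = {"BOOT", "BOOTA", "BOOTL"}


def check_layout(layer_members: list[list[str]], sha256sum_text: str) -> list[str]:
    """Return the layout-rule violations for a netboot image (empty list = OK).

    layer_members holds each layer's tar member names, in manifest order.
    """
    # Normalize names such as "./BOOT" to "BOOT".
    layers = [[posixpath.normpath(name) for name in members] for members in layer_members]
    if not layers:
        return ["image has no layers"]
    problems = []
    if layers[0] != ["SHA256SUM"]:
        problems.append("first layer must contain only SHA256SUM")
    if not layers[-1] or not set(layers[-1]) <= ENTRYPOINTS:
        problems.append("last layer must contain only entrypoint files")
    # sha256sum lines look like "<hex>  <name>" or "<hex> *<name>" (binary mode).
    listed = {
        line.split(maxsplit=1)[1].lstrip("*")
        for line in sha256sum_text.splitlines()
        if line.strip()
    }
    leaked = sorted(listed & ENTRYPOINTS)
    if leaked:
        problems.append(f"entrypoints listed in SHA256SUM: {leaked}")
    return problems
```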

Unfortunately, there is no tool that helps with extracting the payload: podman has an export feature, but it only works on containers, not images; skopeo can copy, but it has no export/extract feature. The only option is to run the container, but there is no executable to run, and it makes no sense to add libc/bash or whatever just to run "sleep". I could write a simple utility until we get something in podman or skopeo; in the meantime:

$ skopeo copy containers-storage:localhost/nb:latest dir:/tmp/test

$ for F in /tmp/test/*; do tar tvf $F 2>/dev/null; done
-r--r--r-- 0/0             459 2024-06-11 12:21 SHA256SUM
-r--r--r-- 0/0          949424 2024-06-11 12:21 BOOT
-r--r--r-- 0/0         3972416 2024-06-11 12:21 BOOTA
-r--r--r-- 0/0           42529 2024-06-11 12:21 BOOTL
-r--r--r-- 0/0         3972416 2024-06-11 12:21 grubx64.efi
-r--r--r-- 0/0           42529 2024-06-11 12:21 pxelinux.0
-r--r--r-- 0/0          949424 2024-06-11 12:21 shim.efi
-r--r--r-- 0/0       149397724 2024-06-11 12:00 initrd.img
-r--r--r-- 0/0        14966600 2024-06-11 11:59 vmlinuz
-r--r--r-- 0/0       618008576 2024-06-11 12:21 install.img

It appears that symlinks are dereferenced; I wonder whether it would work if I put them in the same layer as the link targets.
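One way to answer that for a given build is to look for symlink entries in the layer tarballs directly, e.g. in the blobs that `skopeo copy ... dir:/tmp/test` wrote. A minimal sketch with Python's standard `tarfile` module (`symlinks_in_layer` is a hypothetical name); an empty result for the last layer would confirm that the links were dereferenced at build time.

```python
import tarfile


def symlinks_in_layer(path) -> dict[str, str]:
    """Map symlink member names in a layer tarball to their link targets."""
    # "r:*" accepts both uncompressed and gzip-compressed layer blobs.
    with tarfile.open(path, "r:*") as tar:
        return {m.name: m.linkname for m in tar.getmembers() if m.issym()}
```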

There is also podman's mount command, but it is quite clunky, with different commands for root and rootless mode:

$ podman unshare podman image mount --format json localhost/nb:latest
[
 {
  "id": "c387bff016ec53e5ab0e1904370d8f259d311341a6cc2ea44b8357a7455a4452",
  "Names": [
   "sha256:fda9b313c0d080174688b61e49e5c464bedb97e77b3ee3f04933c5dac24b48ff"
  ],
  "Repositories": [
   "localhost/nb:latest"
  ],
  "mountpoint": "/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged"
 }
]

$ podman unshare find /home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/SHA256SUM
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/BOOTL
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/BOOT
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/BOOTA
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/shim.efi
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/grubx64.efi
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/pxelinux.0
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/install.img
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/initrd.img
/home/lzap/.local/share/containers/storage/overlay/44f81f95e966b6233f3d62897f1eee19e70c0e651a3f4dbee50ba923c9deb51d/merged/vmlinuz
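Once the files are reachable, at a mountpoint like the one above or in an extracted directory, a client can check them against the SHA256SUM file. A minimal sketch, assuming the flat layout from the Containerfile, where SHA256SUM lists plain file names in the usual `sha256sum` output format; `verify_sha256sum` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def verify_sha256sum(root: Path) -> list[str]:
    """Verify files against root/SHA256SUM; return names that are missing or mismatched."""
    failures = []
    for line in (root / "SHA256SUM").read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # '*' marks binary-mode entries
        target = root / name
        # The layout is flat, so anything with a path separator is rejected.
        if "/" in name or not target.is_file():
            failures.append(name)
            continue
        digest = hashlib.sha256()
        with target.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            failures.append(name)
    return failures
```

In rootless mode, this would have to run inside `podman unshare`, because the overlay mountpoint is only accessible there.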

Anyway, this looks good; it does the job. What do you think?

cgwalters commented 3 weeks ago

curl -LO https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/images/pxeboot/vmlinuz

I'd recommend adding -R to invocations of curl to canonicalize timestamps. This aids reproducible builds (https://reproducible-builds.org/). As is, your optimization of splitting each file into a separate layer would be defeated by timestamps unless one passes e.g. podman build --timestamp to the overall build.
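To illustrate why timestamps matter for per-file layers: a layer digest covers the tar headers, including each file's mtime, so identical bytes downloaded at a different time produce a different layer digest and defeat layer reuse. A minimal Python sketch using the standard `tarfile` module; pinning `mtime` here plays the role that `curl -R` or `podman build --timestamp` would play in the real build. `layer_digest` is a hypothetical name.

```python
import hashlib
import io
import tarfile


def layer_digest(files: dict[str, bytes], mtime: int) -> str:
    """Build an uncompressed tar layer from in-memory files and return its sha256.

    Header fields that could vary between builds (mtime, mode, member order)
    are pinned, so equal inputs always give an equal digest.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = 0o444
            info.mtime = mtime  # the only timestamp stored in the header
            tar.addfile(info, io.BytesIO(data))
    return "sha256:" + hashlib.sha256(buf.getvalue()).hexdigest()


files = {"vmlinuz": b"kernel"}
print(layer_digest(files, 0) == layer_digest(files, 0))  # True
print(layer_digest(files, 0) == layer_digest(files, 1))  # False
```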

RUN sha256sum * | tee SHA256SUM

Seems sane, that said it'd be good to verify the inputs against the treeinfo and perhaps the build process could do that and just reuse those sha256 checksums instead of recomputing them. (Not a big deal of course, just noting)
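A sketch of reusing the tree's checksums instead of recomputing them, assuming the productmd-style `.treeinfo` layout with a `[checksums]` section of `path = sha256:<hex>` entries; please check that assumption against an actual Fedora tree. The function name and the path-to-file-name mapping are hypothetical.

```python
import configparser


def sha256sum_from_treeinfo(treeinfo_text: str, wanted: dict[str, str]) -> str:
    """Render SHA256SUM lines from the [checksums] section of a .treeinfo file.

    `wanted` maps a tree-relative path (e.g. "images/pxeboot/vmlinuz") to the
    flat file name used inside the image.
    """
    # Only "=" separates keys from values; the values themselves contain ":".
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep path case as written
    parser.read_string(treeinfo_text)
    lines = []
    for tree_path, local_name in wanted.items():
        algo, _, hexdigest = parser["checksums"][tree_path].partition(":")
        if algo != "sha256":
            raise ValueError(f"{tree_path}: unsupported checksum type {algo!r}")
        lines.append(f"{hexdigest}  {local_name}\n")
    return "".join(lines)
```

The output uses the same `<hex>  <name>` format that `sha256sum` writes, so it could replace the `RUN sha256sum * | tee SHA256SUM` step for the files that come from the tree.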

So I created this and it looks pretty good:

Agreed. So...I think next steps here would be to:

lzap commented 3 weeks ago

I'd recommend adding -R to invocations of curl to canonicalize timestamps.

Good call, will do. Podman was somehow able to cache the commits by itself, which surprised me, but this cannot hurt.

Seems sane, that said it'd be good to verify the inputs against the treeinfo and perhaps the build process could do that and just reuse those sha256 checksums instead of recomputing them.

Hmmm, recalculating is quick, as podman caches the downloaded files and actually skips most of the steps. However, it makes sense to download the treeinfo and include it as one of the artifacts for record-keeping purposes; good idea.

...I think next steps here would be to:

Right. Any ideas about the client that would download and extract the files? Shall we file an RFE against skopeo or podman for a new "extract image" feature? Or is a shell/python script doing podman pull/mount fine?

lzap commented 3 weeks ago

Added at @ipanova's request:

$ skopeo inspect containers-storage:localhost/nb:latest --raw | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:c387bff016ec53e5ab0e1904370d8f259d311341a6cc2ea44b8357a7455a4452",
    "size": 1752
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:252452892075fb4a429540febce7760e9423e370e10e0a83f8686e06000d53ab",
      "size": 2048
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:d226b44a56cfcac43cc1055df975349b614b572b086b5aee1bcd80d555b9de23",
      "size": 164366848
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:e28e9914e930334cee08f4e6a4d39cb4fdd9c144347d0cf02ae71e1bd067e51b",
      "size": 618010112
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:bd59b78273b2d51837fc70617db76ace0fb56b3c0a4f9213bd34f4f3c596ccd7",
      "size": 4967936
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:87635095bbc3f94cb54ab8be42f93fdc01abf8c8d7819a4efa2cf16535c403b6",
      "size": 4967936
    }
  ],
  "annotations": {
    "org.opencontainers.image.base.digest": "",
    "org.opencontainers.image.base.name": ""
  }
}

And:

$ skopeo inspect containers-storage:localhost/nb:latest --raw --config | jq
{
  "created": "2024-06-11T10:21:19.387875497Z",
  "architecture": "amd64",
  "os": "linux",
  "config": {
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Labels": {
      "io.buildah.version": "1.35.3",
      "org.pulpproject.netboot.version": "1"
    }
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:252452892075fb4a429540febce7760e9423e370e10e0a83f8686e06000d53ab",
      "sha256:d226b44a56cfcac43cc1055df975349b614b572b086b5aee1bcd80d555b9de23",
      "sha256:e28e9914e930334cee08f4e6a4d39cb4fdd9c144347d0cf02ae71e1bd067e51b",
      "sha256:bd59b78273b2d51837fc70617db76ace0fb56b3c0a4f9213bd34f4f3c596ccd7",
      "sha256:87635095bbc3f94cb54ab8be42f93fdc01abf8c8d7819a4efa2cf16535c403b6"
    ]
  },
  "history": [
    {
      "created": "2024-06-11T10:21:14.169165993Z",
      "created_by": "/bin/sh -c #(nop) COPY file:91cb33fa65b25db212d3e339c5b6961de5aeaab0ab13720fcd1b1cfb028b6b6c in / "
    },
    {
      "created": "2024-06-11T10:21:14.719484309Z",
      "created_by": "/bin/sh -c #(nop) COPY multi:7279b1e022505144da0d404c2b75b11ffa1272bc33b744d71b71e4028fb1464e in / ",
      "comment": "FROM f67002a08a15"
    },
    {
      "created": "2024-06-11T10:21:16.720244088Z",
      "created_by": "/bin/sh -c #(nop) COPY file:0b214e66e1b552bb92a6a38584c92e698a34adf4140bf53ee85d81ed90ff2153 in / ",
      "comment": "FROM bdd1afa2e05a"
    },
    {
      "created": "2024-06-11T10:21:19.017951986Z",
      "created_by": "/bin/sh -c #(nop) COPY multi:6bbd68bf7ddeb61f1c1860362699a8fdb82159b08069574267fc52bd37daddbd in / ",
      "comment": "FROM 314f40a81916"
    },
    {
      "created": "2024-06-11T10:21:19.28298871Z",
      "created_by": "/bin/sh -c #(nop) COPY multi:cbaec5a2fff0cb4d79304b085f97778ba4c33a40abbc4e7e6eb9ff7e8d333843 in / ",
      "comment": "FROM 5454376b0d75"
    },
    {
      "created": "2024-06-11T10:21:19.387979723Z",
      "created_by": "/bin/sh -c #(nop) LABEL org.pulpproject.netboot.version=1",
      "comment": "FROM d06432cb64b7",
      "empty_layer": true
    }
  ]
}

ipanova commented 3 weeks ago

@cgwalters So... is your preference to just go with the standard container image manifest format and put the kickstart files there? In that case we should at least change the config.mediaType (per the spec). It is doable, it just feels slightly wrong, especially since OCI artifacts were created for exactly such cases. Mind that podman can annotate only the manifest, not layers. Also, neither podman nor skopeo can extract container image contents, and I am not sure whether this is on their roadmap. That is another reason to leverage the oras tool for OCI artifacts. Would you be more open to leveraging OCI artifacts if we put all the files in one layer?

ipanova commented 3 weeks ago

@cgwalters Don't get me wrong, I want to get this done and shipped, so if the majority will decide that using container image manifest format is the easiest thing to do, I will not stand much in the way and yield. Just please consider my arguments towards OCI artifacts usage.

cgwalters commented 3 weeks ago

On Wed, Jun 12, 2024, at 12:11 PM, Ina Panova wrote:

@cgwalters Don't get me wrong, I want to get this done and shipped, so if the majority will decide that using container image manifest format is the easiest thing to do, I will not stand much in the way and yield. Just please consider my arguments towards OCI artifacts usage.

To be clear, I am broadly OK with the spec as is, and no one needs my specific approval to move forward with this. We are just having a discussion 😀

My core feeling is OCI artifacts make sense when:

This use case matches just one of those two.

Note that the argument about extraction applies either way; with a custom OCI artifact type you need a custom build and a custom extractor, right? A custom extractor especially for multi-arch handling.

BTW one thing we’ve done in the past with somewhat similar cases (embedding RPMs in a container) is include a simple web server as the entry point. That adds another avenue for extraction or even direct serving, and addresses the “you can run it” problem.
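A minimal sketch of such an entrypoint using only Python's standard `http.server`, assuming an interpreter is available in the image (a `FROM scratch` image like the one in this thread has none, so a static binary would be the lighter option in practice). `make_server` is a hypothetical name; the handler serves files read-only and also generates directory listings.

```python
import functools
import http.server


def make_server(directory: str, port: int = 8080) -> http.server.ThreadingHTTPServer:
    """Create an HTTP server that serves the netboot files in `directory`."""
    # SimpleHTTPRequestHandler only implements GET/HEAD, so nothing can be uploaded.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("", port), handler)
```

As an entrypoint this would be something like `make_server("/").serve_forever()`, which would give `podman run` something meaningful to execute and let clients fetch the files over HTTP.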

But again… while I personally lean toward just making it a container, if you both feel otherwise I think that’s reasonable, and we can move forward with the spec mostly as is.

lzap commented 3 weeks ago

So I was able to finalize my POC; see the gist with the container files for both aa64 and x64, and let me know.

Overall, I like the idea of using podman for the build process: leveraging its multi-arch capabilities together with using dnf directly makes it much easier than walking the kickstart tree to figure out the latest versions of the shim/grub RPM packages. On the other hand, client tooling would need to be written from scratch, but this time I would probably write a shell/python script using the podman mount command instead.

Questions or observations:

If you ask me, I lean towards reworking the spec from the ground up. I know we spent quite some time figuring it out, and I wish we could have done this earlier, but we all know Summit kept us from meeting up and discussing this properly. I don’t mind rewriting it and prototyping a new client based entirely on the podman command.

lzap commented 2 weeks ago

So @ipanova and I agreed to pursue the Containerfile solution with normal image layers, and to write a custom Go tool that extracts files directly from a repo or from local storage, whichever is easier to implement.
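The tool discussed here would be Go, but the registry side of "extract directly from a repo" comes down to two plain HTTP GETs defined by the OCI distribution spec. A Python sketch that only builds the requests; the registry and repository names are placeholders, and authentication (e.g. the bearer-token flow many registries use) is left out.

```python
import urllib.request

# Accept both a multi-arch index and a single image manifest.
MANIFEST_TYPES = ", ".join([
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
])


def manifest_request(registry: str, repository: str, reference: str) -> urllib.request.Request:
    """Build GET /v2/<name>/manifests/<reference> (reference = tag or digest)."""
    url = f"https://{registry}/v2/{repository}/manifests/{reference}"
    return urllib.request.Request(url, headers={"Accept": MANIFEST_TYPES})


def blob_request(registry: str, repository: str, digest: str) -> urllib.request.Request:
    """Build GET /v2/<name>/blobs/<digest>; registries may redirect to object storage."""
    return urllib.request.Request(f"https://{registry}/v2/{repository}/blobs/{digest}")
```

Combined with the diff_id comparison and the SHA256SUM check sketched earlier in this thread, these two requests cover the network part of such a client.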

But before I start, I wanted to get @cgwalters' opinion on podman v5 artifacts, which can be created via podman manifest add --artifact. We could grab the files the same way (via a Containerfile), but copy them out of the container into a volume and then add them one by one to a manifest via podman v5+. Then we would again have a CLI tool to download and extract them. This option is even more future-proof, and it does not "abuse" image layers at all. What do you think about this solution? It is similar to what we have built, but without the oras/cosign libraries, so we would not carry the burden of maintaining another stack of libraries; the podman team would take care of all of it (except the download CLI tool, which should be just "100 lines" according to Miloslav).