tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0
8.48k stars 1.78k forks source link

Improve ImageID #6970

Open lcarva opened 1 year ago

lcarva commented 1 year ago

Feature request

A ContainerStatus struct contains the attributes Image and ImageID. Their description is somewhat vague:

// Image is the name of container image that the container is running.
// The container image may not match the image used in the PodSpec,
// as it may have been resolved by the runtime.
// More info: https://kubernetes.io/docs/concepts/containers/images.
Image [string](https://pkg.go.dev/builtin#string) `json:"image" protobuf:"bytes,6,opt,name=image"`
// ImageID is the image ID of the container's image. The image ID may not
// match the image ID of the image used in the PodSpec, as it may have been
// resolved by the runtime.
ImageID [string](https://pkg.go.dev/builtin#string) `json:"imageID" protobuf:"bytes,7,opt,name=imageID"`

Tekton reports the value of ImageID in StepState and SideCar.

In some cases the registry/repo reported in ContainerStatus.ImageID is different than the registry/repo reported in ContainerStatus.Image. OCPBUGS-8428 contains a reproducer. (It's an OpenShift issue, but the underlying behavior comes from CRI-O). TL;DR: If the image was previously pulled on the node by a different image reference, the registry/repo of that image reference may get reported in ImageID.

The value of ContainerStatus.Image contains the right registry/repo, but may contain a tag instead of a digest. The value of ContainerStatus.ImageID contains the right digest, but may contains the wrong registry/repo.

I propose Tekton computes the value of StepState.ImageID and SideCar.ImageID by leveraging the information on both ContainerStatus.ImageID and ContainerStatus.Image.

Use case

I want to write a policy that evaluates the SLSA Provenance generated by Tekton Chains to allow/disallow usage of certain registries in Task Steps. Today, this policy is nondeterministic as the registry/repo value is affected by the runtime state of the node.

vdemeester commented 1 year ago

cc @wlynch @imjasonh

lcarva commented 1 year ago

NOTE: A recent change in cri-o, https://github.com/cri-o/cri-o/pull/7149, changes the ImageID to not include an image reference at all (no registry, repo, nor digest). In some cases, we may not be able to tell the digest of the image that was used.

One approach I can think of is:

  1. First look at the value of ContainerStatus.Image. If it contains a digest, use this value. Done.
  2. Second look at ContainerStatus.ImageID. If it contains an image ref and a digest, take the digest and combine it with the image ref from the previous step. Done.
  3. Finally, we can't tell the digest. Use the value of ContainerStatus.Image as is. Done.

Another is to just use the value of ContainerStatus.Image as is now.

Eventually, when everyone is on the newer version of cri-o, both approaches are effectively the same. I do think the first approach is probably the safest one for now. We leave it up to cri-o to fix the broken use cases.

The next question is whether we should embed this logic in Tekton, or if Tekton should just provide both Image and ImageID and let consumers decide how to handle them.

The benefit of doing this in Tekton Pipelines is that it, obviously, makes it easier for consumers. For example, Tekton Chains doesn't have to fiddle with the values.

Any preferences?

mtrmac commented 1 year ago

(A random drive-by…)

I want to write a policy that evaluates the SLSA Provenance generated by Tekton Chains to allow/disallow usage of certain registries in Task Steps

Inspecting an already-pulled image after the fact seems to me to be the wrong place in principle. When talking about ContainerStatus, the container has already been created! Wouldn’t it be much better not to issue the request with the unwanted image in the first place?

lcarva commented 1 year ago

Inspecting an already-pulled image after the fact seems to me to be the wrong place in principle. When talking about ContainerStatus, the container has already been created! Wouldn’t it be much better not to issue the request with the unwanted image in the first place?

It depends on the use case :smile: Here, I want to verify how the image was built, its provenance. I want to verify that any images used in building the image come from a certain repository, for example.

mtrmac commented 1 year ago

The value of ContainerStatus.ImageID contains the right digest, but may contains the wrong registry/repo.

No; ImageID contains some digest for that image, but not necessarily a digest that exists in the repo of ContainerStatus.Image. A variant of the https://issues.redhat.com/browse/OCPBUGS-8428 reproducer which uses the same image with a different digest (e.g. use skopeo copy --format or skopeo copy --dest-compress-format to trigger an image modification) would show that not only the repo, but also the digest can differ from the one requested.

(And, due to a bug/missing feature in the stack, the repo@digest combination contained in ContainerStatus might be completely incorrect, combining a digest from one repo with a repo name of another repo, pointing at an image that just doesn’t exist.)

lcarva commented 1 year ago

I wasn't aware of that. Thanks for pointing that out!

It looks like we should avoid step 2 from my suggestion.

lcarva commented 1 year ago

AFAICT, there is just no way for Tekton to do report the actual used image consistently on its own. This would require invasive(?) changes to container runtimes, e.g. cri-o, and likely changes in k8s as well.

If we still want some support here, I think the change would be for Tekton to expose both Image and ImageID and let clients decide what to do with it. For Chains, both would also be added to the SLSA Provenance. If there's agreement, I can revive https://github.com/tektoncd/pipeline/pull/7049.

Alternatively, we can leave things as they are, and require SLSA Provenance verifiers to fetch and evaluate the Pipeline/Task definition directly. It's not ideal, but there are other use cases that may require this regardless.

mtrmac commented 1 year ago

(To leave breadcrumbs…)

AFAICT, there is just no way for Tekton to do report the actual used image consistently on its own. This would require invasive(?) changes to container runtimes, e.g. cri-o, and likely changes in k8s as well.

The problematic case is creating a container from an image that already exists on the system. In that case Kubelet does ImageStatus(user-specified-image-name) = deduplicated-image-ID, and CreateContainer(deduplicated-image-ID).

A recent Kubelet change has added the user-specified-image-name value to the CreateContainer call, but that’s not sufficient to resolve to a digest if user-specified-image-name contains a tag. Then we know which deduplicated image it is, we know which repo, but there may be more than one matching digest for the same deduplicated image, even within that one repo.

Now, of course, for some use cases that doesn’t matter at all; the image config, and the layers, are the same for all deduplicated images. OTOH every single image digest carries its own set of signatures, and possibly other provenance information.