Closed cgwalters closed 2 years ago
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen. If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen. If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
/reopen
@travier: Reopened this issue.
/lifecycle frozen
There's actually a much more basic thing to do, which is change our CI to test the build as a container. I dug into this a bit, but it's tricky today because (to quote @petr-muller ) "The concern is that that tests publishing images does not fit into ci-operator model".
Right now we build the OCI image as part of a test flow (i.e. a regular pod) not an OCP build because the requirements on kvm (for https://github.com/cgwalters/kvm-device-plugin ) don't get injected into the OCP build pod.
We could certainly fix that, need to dig into whether build objects let us label the pod, and once we have that we just need ci-operator to configure the build object.
That said, I still think there's another problem here, which is that OCP builds and ci-operator are not designed around generating new base images. We might be able to hack around that with a FROM scratch approach as part of a multi-stage build.
are not designed around generating new base images.
I'm curious about this bit because I do not understand it :) What is a "base image" in this context and why are OCP builds not designed for generating them? I'm trying to understand the problem domain.
A base image is e.g. registry.access.redhat.com/ubi8/ubi:latest
or ubuntu:latest
- an image without a parent.
But to restate, FROM scratch
may suffice for this. I'd need to do some investigation.
Upon seeing this, I went down a bit of a rabbit-hole to understand the problem and build a mental model. This update is mostly a braindump of my current understanding of this issue, with the hope that I've managed to make it more digestible for others. As I understand it, there are three main problems:
1. ci-operator-prowgen does not set the kvm resources on image build pods in response to changes to the resources object in the ci-operator config. This means we cannot use an images build.
2. ci-operator tests are not (typically) used to produce and push an OCI container image. Using the images configuration is the preferred way to do this because it pushes the built images to an ImageStream for later consumption in both the CI workflow (by tests) and as part of the promotion workflows that ci-operator offers.
3. Getting the output of a container build tool ($ podman build or $ buildah build) into a stage of the build is not possible (or I just don't know how to do it).
ci-operator-prowgen
There's actually a much more basic thing to do, which is change our CI to test the build as a container. I dug into this a bit, but it's tricky today because (to quote @petr-muller ) "The concern is that that tests publishing images does not fit into ci-operator model".
I interpret this to mean the following: If your build produces an OCI container image, it should (ideally) come from the images
portion of the ci-operator
config. The built image gets pushed to an ImageStream
and takes advantage of the promotion workflows in ci-operator
, as well as being available for tests
within the ci-operator
config. By comparison, the tests
are used to execute an arbitrary command inside a given image. While tests
can produce artifacts, they are not intended to produce an OCI container image, though that does not mean they cannot.
Right now we build the OCI image as part of a test flow (i.e. a regular pod) not an OCP build because the requirements on kvm (for https://github.com/cgwalters/kvm-device-plugin ) don't get injected into the OCP build pod.
This seems to be a limitation of ci-operator-prowgen
. If one modifies the resources
portion of the ci-operator
config:
resources:
  '*':
    limits:
      devices.kubevirt.io/kvm: "1"
    requests:
      cpu: 1000m
      devices.kubevirt.io/kvm: "1"
      memory: 1Gi
  build-test-qemu:
    requests:
      cpu: 1000m
      devices.kubevirt.io/kvm: "1"
      memory: 3Gi
and then runs $ make jobs, ci-operator-prowgen will apply those resources to each of the tests defined, but it will not apply them to the image builds:
# Irrelevant fields omitted for brevity
presubmits:
  openshift/os:
  - agent: kubernetes
    cluster: build01
    context: ci/prow/images
    decorate: true
    name: pull-ci-openshift-os-master-images
    rerun_command: /test images
    spec:
      containers:
      # Notice that the resources global config does not get copied
      - resources:
          requests:
            cpu: 10m
Furthermore, explicitly trying to target the image builds by generated step name (pull-ci-openshift-os-master-images
) or image reference (build-test-qemu-img
) in the config produces the same Prow config. We could modify the Prow config by hand, but that takes us off the paved road, which will cause us more pain down the line.
FROM scratch
But to restate, FROM scratch may suffice for this. I'd need to do some investigation.
If I understand this correctly, what @cgwalters is suggesting is to create a scratch image target in the images configuration, and then do something like $ skopeo copy oci-archive:<archive> docker://<scratch image ImageStream target> within the tests phase to overwrite the scratch image. The promotion workflow takes place as a post-submit, so it should, in theory, be able to handle that.
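As a rough sketch of that overwrite step (the archive path, namespace variable, and ImageStream reference here are all hypothetical, and the test pod may lack the RBAC to push at all):

```shell
# Hypothetical: overwrite the scratch image target with the cosa-built archive.
# $NAMESPACE would be the CI job's namespace; the archive path is illustrative.
skopeo copy \
  oci-archive:builds/latest/x86_64/coreos.ociarchive \
  docker://image-registry.openshift-image-registry.svc:5000/$NAMESPACE/pipeline:os-image
```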
kvm injected into image build pod
Assuming we either figured out how to inject the kvm requirements into the image build pod or decide they're no longer necessary, there is another potential blocker: We have an .oci-archive
that was generated from within a container build context. While podman
and buildah
both support being able to use an OCI archive as a FROM
target, to my knowledge, the OCI archive needs to exist on the host disk; it cannot be consumed from inside the build context. To illustrate:
# Pull a Fedora image and store it as an OCI archive in our CWD
$ skopeo copy docker://registry.fedoraproject.org/fedora oci-archive:fedora.oci
Getting image source signatures
Copying blob c6183d119aa8 done
Copying config 9a1f7a0516 done
Writing manifest to image destination
Storing signatures
# Build a simple container using the pre-fetched oci-archive
$ podman build -t test --file=<(echo """
FROM oci-archive:fedora.oci
RUN cat /etc/os-release""") .
STEP 1/2: FROM oci-archive:fedora.oci
Getting image source signatures
Copying blob c6183d119aa8 skipped: already exists
Copying config 9a1f7a0516 done
Writing manifest to image destination
Storing signatures
STEP 2/2: RUN cat /etc/os-release
# <snip for brevity>
COMMIT test
--> 976af5c4b1f
Successfully tagged localhost/test:latest
976af5c4b1f70e555ba1e6bed41131b0ec5090a1383e6d69e10972ba062df5d8
# However, if you inject the pre-fetched OCI archive into the container and try to use it as a FROM source, it will fail:
$ podman build -t test --file=<(echo """
FROM oci-archive:fedora.oci
COPY fedora.oci fedora-inside.oci
RUN stat fedora-inside.oci
FROM oci-archive:fedora-inside.oci
RUN cat /etc/os-release
""") .
[1/2] STEP 1/3: FROM oci-archive:fedora.oci
Getting image source signatures
Copying blob c6183d119aa8 skipped: already exists
Copying config 9a1f7a0516 done
Writing manifest to image destination
Storing signatures
[1/2] STEP 2/3: COPY fedora.oci fedora-inside.oci
--> Using cache 4e65927323d21f74dfd343fda09b09410789f3a1b29399a583a8c3d3f1e45b5f
--> 4e65927323d
[1/2] STEP 3/3: RUN stat fedora-inside.oci
--> Using cache 7004d594533cad7806a94ad75c43d5577c1beaa3d0f153fb3748762e8846d2cb
--> 7004d594533
[2/2] STEP 1/2: FROM oci-archive:fedora-inside.oci
Error: error creating build container: creating temp directory: open /home/zack/fedora-inside.oci: no such file or directory
There may be some advanced Containerfile directives I'm unaware of (e.g., RUN --mount=
), so I cannot completely rule this method out.
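For completeness, here is a hedged sketch of the kind of directive I have in mind: rather than using the archive as a FROM source, a later stage could bind-mount it from an earlier stage for a single RUN step. The stage names and paths are illustrative, and I have not verified that this solves the overall problem (it only makes the archive readable, not usable as a base image):

```dockerfile
FROM quay.io/skopeo/stable:latest AS fetcher
WORKDIR /work
RUN skopeo copy docker://registry.fedoraproject.org/fedora:latest oci-archive:/work/fedora.oci

FROM quay.io/skopeo/stable:latest
# Bind-mount the archive from the previous stage for this RUN step only;
# it is never copied into this stage's layers.
RUN --mount=type=bind,from=fetcher,source=/work/fedora.oci,target=/tmp/fedora.oci \
    skopeo inspect oci-archive:/tmp/fedora.oci
```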
Creating a FROM scratch
target and pushing to that target from the tests
portion may be our best path forward, even if it is a bit hacky. I also completely recognize that I may misunderstand or misinterpret parts of this and what's possible.
If I understand this correctly, what @cgwalters is suggesting is to create a scratch image target in the images configuration, and then do something like $ skopeo copy .oci-archive docker://
within the tests phase to overwrite the scratch image.
Hmm, interesting idea. I am not totally certain that would work though, because I think the test pods won't have RBAC set up to push to the registry.
I was more thinking of something like:
FROM quay.io/coreos-assembler/coreos-assembler as builder
RUN cosa build
RUN ...something to extract the rootfs (unpack the blob/sha256)
FROM scratch
COPY --from=builder /srv/cosa/rootfs.tar
But there are a few problems with this; the main one is that we lose all the metadata we inject at build time on the container image, such as the version. (We'd need to replicate all of that in the Dockerfile.) We also lose control over the serialization of the tar format, which will matter for ostree today.
kvm injected into image build pod
This could help but still has the core problem that we lose control over how we generate the image.
I think what we probably need to aim towards is more like a custom builder: https://docs.openshift.com/container-platform/4.9/cicd/builds/custom-builds-buildah.html
If we do that, we should be able to ensure our build pod has the required kvm annotations, and we own the task of pushing the image to the registry.
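A hedged sketch of what such a BuildConfig might look like (names are illustrative; per the OpenShift docs, the custom strategy injects OUTPUT_REGISTRY/OUTPUT_IMAGE env vars and a push secret, and the builder image is responsible for doing the build and push itself):

```yaml
kind: BuildConfig
apiVersion: build.openshift.io/v1
metadata:
  name: os-custom-build            # hypothetical name
spec:
  output:
    to:
      kind: ImageStreamTag
      name: os:latest
  strategy:
    type: Custom
    customStrategy:
      from:
        kind: DockerImage
        # A builder image we control: it would run cosa and push the result
        # itself using the injected OUTPUT_REGISTRY/OUTPUT_IMAGE variables.
        name: quay.io/coreos-assembler/coreos-assembler:latest
```

We would still need to ensure the resulting build pod carries the kvm device request, which is the other half of the problem.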
I took a look at raw_steps and this seems like the most promising way forward, insofar as being able to have more exact control over our build and test process. However, there appear to be a few blockers that I'm immediately aware of and don't (presently) have a way to work around:
- ci-operator exclusively uses the OpenShift Docker build strategy for image builds. It does not have support for the OpenShift custom build strategy. This means an OCI archive cannot be consumed via a FROM directive.
- project_directory_image_build_step (which wraps the Docker Build Strategy) only supports inputs from other containers, so this is moot.
In the example below, I use skopeo copy to download a container image and store it as an OCI archive on the local filesystem. This is intended to be a surrogate for the cosa build process, which creates two OCI container image archives as build artifacts. As mentioned earlier, FROM oci-archive:image.oci is a valid image transport option for Buildah. However, the OCI archive must be in the local directory; it cannot be passed from one container build stage to another. To illustrate:
FROM quay.io/skopeo/stable:latest
WORKDIR /work
RUN skopeo copy docker://registry.fedoraproject.org/fedora:latest oci-archive:/work/fedora.oci
# This does not work
FROM oci-archive:/work/fedora.oci
RUN cat /etc/os-release
Trying this inside an OpenShift Docker Build strategy seems like it would work:
---
kind: ImageStream
apiVersion: v1
metadata:
  name: built-images
---
kind: BuildConfig
apiVersion: build.openshift.io/v1
metadata:
  name: skopeo-pull
spec:
  output:
    to:
      kind: ImageStreamTag
      name: built-images:skopeo-pull
  source:
    dockerfile: |
      FROM quay.io/skopeo/stable:latest
      WORKDIR /src
      RUN skopeo copy docker://registry.fedoraproject.org/fedora:latest oci-archive:/src/fedora.oci
  strategy:
    dockerStrategy: {}
---
kind: BuildConfig
apiVersion: build.openshift.io/v1
metadata:
  name: skopeo-based
spec:
  output:
    to:
      kind: ImageStreamTag
      name: built-images:skopeo-based
  source:
    images:
    - from:
        kind: ImageStreamTag
        name: built-images:skopeo-pull
      paths:
      - destinationDir: image
        sourcePath: /src/.
    dockerfile: |
      FROM oci-archive:image/fedora.oci
      RUN cat /etc/os-release
  strategy:
    dockerStrategy: {}
However, either the OpenShift Builder or Buildah (not sure which) prepends a / to the image directory when it starts the build process:
$ oc start-build skopeo-based --follow
build.build.openshift.io/skopeo-based-6 started
Caching blobs under "/var/cache/blobs".
Trying to pull image-registry.openshift-image-registry.svc:5000/default/test-boot-in-cluster-image@sha256:fdc9433f081601823dd959271da3152c48ddce617d98277c66838194ff5e597f...
Getting image source signatures
Copying blob sha256:d845d72ee02bb0ef685b18cfd81f710f504d59bcab16400bcda7dedbab74f8dc
Copying blob sha256:4b49ec78acac3b96cf4d327d9aa81cebbf84b7ed67090d8c3c9224546349b947
Copying blob sha256:c6183d119aa8953fe2cb6351e8fb4aeeb770f86c1aef3d534f7e02f5e2861321
Copying blob sha256:f96511a93512217c0cb253a7e0f4f049ba12a33f9b84364189d0b3d1eec55d3b
Copying blob sha256:415d5a2b6f94adc8715963ca420254ccb866192b25d52ab49640532dee48dafa
Copying blob sha256:cd62ba9735660431f88ce91edd75c1715ca188ad2a0b52998ad4d4fbaa130e5e
Copying blob sha256:415d5a2b6f94adc8715963ca420254ccb866192b25d52ab49640532dee48dafa
Copying blob sha256:9711468d0445dc7c47b8e57eb8680d19c46a46c8c4325e035a06b4fad73be5ae
Copying blob sha256:d845d72ee02bb0ef685b18cfd81f710f504d59bcab16400bcda7dedbab74f8dc
Copying blob sha256:f96511a93512217c0cb253a7e0f4f049ba12a33f9b84364189d0b3d1eec55d3b
Copying blob sha256:cd62ba9735660431f88ce91edd75c1715ca188ad2a0b52998ad4d4fbaa130e5e
Copying blob sha256:c6183d119aa8953fe2cb6351e8fb4aeeb770f86c1aef3d534f7e02f5e2861321
Copying blob sha256:9711468d0445dc7c47b8e57eb8680d19c46a46c8c4325e035a06b4fad73be5ae
Copying blob sha256:4b49ec78acac3b96cf4d327d9aa81cebbf84b7ed67090d8c3c9224546349b947
Copying config sha256:51d39a7d2e8e9be61f25d3719b6f0234614027424c407224986487b43988305b
Writing manifest to image destination
Storing signatures
time="2022-02-11T18:01:42Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"
I0211 18:01:42.267830 1 defaults.go:102] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].
Caching blobs under "/var/cache/blobs".
Adding transient rw bind mount for /run/secrets/rhsm
STEP 1/4: FROM oci-archive:image/fedora.oci
error: build error: error creating build container: lstat /image: no such file or directory
I also tried to inject the default push credentials from the build pod into the build context:
---
kind: BuildConfig
apiVersion: build.openshift.io/v1
metadata:
name: skopeo-push
spec:
source:
dockerfile: |
ARG PUSHCREDS="/var/run/secrets/openshift.io/push/.dockercfg"
FROM quay.io/skopeo/stable:latest
WORKDIR /src
COPY "${PUSHCREDS}" .
RUN cat /src/.dockercfg
RUN skopeo copy --dest-authfile=/src/.dockercfg --dest-tls-verify=false docker://registry.fedoraproject.org/fedora:latest docker://image-registry.openshift-image-registry.svc:5000/default/built-images:skopeo-push
strategy:
dockerStrategy: {}
This seems like it worked, but it didn't. The creds file is not populated even though the COPY step suggests that it is. Incidentally, the need to pass the credential path via a build ARG is because, if it's hard-coded, the following happens:
$ oc start-build skopeo-push --follow
build.build.openshift.io/skopeo-push-1 started
time="2022-02-11T18:14:58Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"
I0211 18:14:58.090805 1 defaults.go:102] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].
Caching blobs under "/var/cache/blobs".
Pulling image quay.io/skopeo/stable:latest ...
Trying to pull quay.io/skopeo/stable:latest...
Getting image source signatures
Copying blob sha256:415d5a2b6f94adc8715963ca420254ccb866192b25d52ab49640532dee48dafa
Copying blob sha256:c6183d119aa8953fe2cb6351e8fb4aeeb770f86c1aef3d534f7e02f5e2861321
Copying blob sha256:4b49ec78acac3b96cf4d327d9aa81cebbf84b7ed67090d8c3c9224546349b947
Copying blob sha256:cd62ba9735660431f88ce91edd75c1715ca188ad2a0b52998ad4d4fbaa130e5e
Copying blob sha256:f96511a93512217c0cb253a7e0f4f049ba12a33f9b84364189d0b3d1eec55d3b
Copying blob sha256:d845d72ee02bb0ef685b18cfd81f710f504d59bcab16400bcda7dedbab74f8dc
Copying config sha256:ee52ee4bc9a56351bbc284dd3ff8755c6bb771261184cad18fa27903bc758e59
Writing manifest to image destination
Storing signatures
Adding transient rw bind mount for /run/secrets/rhsm
STEP 1/6: FROM quay.io/skopeo/stable:latest
STEP 2/6: WORKDIR /src
--> 605165ed4b1
STEP 3/6: COPY /var/run/secrets/openshift.io/push .
error: build error: error building at STEP "COPY /var/run/secrets/openshift.io/push .": checking on sources under "/tmp/build/inputs": copier: stat: "/var/run/secrets/openshift.io/push": no such file or directory
cosa
There is one more workaround that I think might work, but it is a bit hacky. Assuming we can get cosa build
to execute inside a container build context (which we probably can; using raw_steps
and adjusting the resources
field will allow us to inject kvm
into the image build pod!), we could have cosa
drop the necessary files on the filesystem within the container build and generate a Dockerfile
with all the labels and everything else set. We can then use that container as the input into a container build to inject the Dockerfile and the files into the build context.
This is somewhat similar to what openshift-ansible
is doing in their configs.
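As a rough sketch of that handoff (all names and paths hypothetical), the first build would leave both the artifacts and the cosa-generated Dockerfile on its filesystem, and a second build would consume them as an image source:

```yaml
# Hypothetical second-stage BuildConfig: pull the artifacts and the
# cosa-generated Dockerfile out of the first build's image, then build
# from that extracted context.
kind: BuildConfig
apiVersion: build.openshift.io/v1
metadata:
  name: os-from-cosa-output
spec:
  output:
    to:
      kind: ImageStreamTag
      name: os:latest
  source:
    images:
    - from:
        kind: ImageStreamTag
        name: cosa-build-output:latest   # image produced by the cosa build step
      paths:
      - destinationDir: .
        sourcePath: /srv/cosa/output/.   # illustrative path holding Dockerfile + rootfs
  strategy:
    dockerStrategy: {}
```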
It does not have support for the OpenShift custom build strategy
We could fix that right?
Assuming we can get cosa build to execute inside a container build context (which we probably can; using raw_steps and adjusting the resources field will allow us to inject kvm into the image build pod!), we could have cosa drop the necessary files on the filesystem within the container build and generate a Dockerfile with all the labels and everything else set.
That may be possible, but there's a major downside here that we lose all the control over the formatting of the tarball that ostree is doing in the export path. As well as needing to carefully replicate all the metadata we inject.
It feels to me like custom builder is the best way to go.
Also, any Dockerfile approach would defeat https://github.com/ostreedev/ostree-rs-ext/issues/69
We could fix that right?
I don't see why not since the limitation itself is part of ci-operator
. It looks like there was some previous discussion and effort around this. See:
However, I can think of a couple of caveats, namely that OpenShift uses the built-in niceties the default Docker build strategies provide, such as resolving of ImageStreams, adding labels with commit and version info, and useful env vars to the built image (source).
Because of that, we may need to do something like use a custom image build, then pass the built image to a default Docker build step that does something like FROM <custom built image ref>
so that it gets the labels. However, we should be mindful that this could overwrite the labels cosa
sets, assuming there are any name collisions between them.
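A hypothetical spot check for such collisions (image names are made up; assumes jq is available) would be to intersect the label keys of the two images:

```shell
# Compare label keys between the cosa-built image and the wrapped image.
skopeo inspect docker://quay.io/example/cosa-built:latest \
  | jq -r '.Labels | keys[]' | sort > cosa-labels.txt
skopeo inspect docker://quay.io/example/wrapped:latest \
  | jq -r '.Labels | keys[]' | sort > wrapped-labels.txt
comm -12 cosa-labels.txt wrapped-labels.txt   # keys present in both images
```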
Given the above, which is not likely to change any time soon, there is one path forward that I can see:
It wouldn't be much different in principle than what our Jenkins job is currently doing. We could create another Imagestream in the rhcos-devel
namespace and, after setting up some secrets, we could push the OCI images built by this repo into that Imagestream. I know it wouldn't buy us the container-native testing mechanism we're hoping for, but it could provide a stop-gap so we can get nightly builds running on these images.
I briefly envisioned a way we could push the image to the rhcos-devel namespace and then import the images into the OpenShift CI-native testing mechanism by making use of the base_images: mechanism coupled with an images: build target. However, an issue we'll run into is that we'd need to use a mutable tag (e.g., :latest) so the config in openshift/release can remain static. On the surface, this doesn't seem like a problem, but if multiple PRs are opened to this repository simultaneously, the images could stomp on one another.
Another thing to consider is that building an image (images:
) and testing it (tests:
) can take place across multiple OpenShift clusters.
On the surface, this doesn't seem like a problem, but if multiple PRs are opened to this repository simultaneously, the images could stomp on one another.
Hmm I think we need to clearly separate PRs from any "periodics". It'd be bad if PRs could affect anything global.
Since with Prow, each PR gets its own CI namespace, it'd be really nice if we can just reuse that.
Hmm I think we need to clearly separate PRs from any "periodics". It'd be bad if PRs could affect anything global.
That should be pretty straightforward by using a different mutable tag for periodics. I assume the convention is to create another YAML file called openshift-os-master__periodic.yaml
and set it up with a cron
statement?
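If that convention holds, a sketch of such a file might look like the following (the test name, schedule, and command are made up, and I have not verified this against the ci-operator config schema):

```yaml
# Hypothetical openshift-os-master__periodic.yaml fragment
tests:
- as: periodic-build-test-qemu
  cron: "0 6 * * *"            # made-up schedule
  commands: make build-test-qemu
  container:
    from: src
```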
Since with Prow, each PR gets its own CI namespace, it'd be really nice if we can just reuse that.
It really would. To be more accurate, each CI job runs in its own namespace. At present, we have three jobs:
The images
job is a standard OpenShift image build. The build-test-qemu
and validate
jobs come from tests
. They all execute in their own namespace with their own Imagestreams. What I don't (yet) know is whether one can push to the namespace Imagestream inside one of the test jobs; we can't do it from image builds because we can't access the creds. I foresee two issues there:
The service account which it runs as might not have permission to push to the namespace Imagestream.
The short version is that tests run as the default service account, which does not have system:image-pusher permissions by default. images run as the builder service account, which does have system:image-pusher permissions, but because we're using the Docker build strategy, we can't get the creds for it.
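In principle (a hypothetical command; whether ci-operator tolerates extra RBAC in the namespaces it manages is a separate question), granting push rights to the test pod's service account would look something like:

```shell
# Grant the default service account system:image-pusher in the CI namespace.
# ci-op-xxxxx is a placeholder for the job's generated namespace.
oc policy add-role-to-user system:image-pusher \
  -z default --namespace=ci-op-xxxxx
```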
So in short, it looks like we'll have to push to the rhcos-devel
namespace instead of the imagestream in the CI namespace.
It really would. To be more accurate, each CI job runs in its own namespace.
No, they don't. Jobs with same inputs (but different targets: like individual jobs for a single revision of a PR) execute in the same namespace and reuse built artifacts, like images. Check out these three runs:
All run in ci-op-13y3vvz2
namespace and share built artifacts.
Ah! I might've been looking at different runs or re-runs of one of those jobs. This is good to know, thanks!
Jobs with same inputs (but different targets: like individual jobs for a single revision of a PR) execute in the same namespace and reuse built artifacts, like images.
Right. And I think this is part of the problem with doing things outside of Prow's knowledge, because it can cause things to incorrectly leak across jobs, right?
But the larger point about isolating PRs from global state I think stands.
Right. And I think this is part of the problem with doing things outside of Prow's knowledge, because it can cause things to incorrectly leak across jobs, right?
Yeah. ci-operator has some assumptions about what is happening in its temporary namespace, because it assumes it is the only thing that manages resources in that namespace. I think any solution here should be targeted towards implementing some kind of special build like mentioned in https://github.com/openshift/os/issues/600#issuecomment-1036704108.
It does not necessarily need to be backed by an OCP Build resource, but ci-operator would need to learn how to manage such shared state between jobs (with Builds it is achieved simply by one job -- the first one to arrive -- creating the Build resource, and the remaining ones just discover it already exists and wait for its result). We would need a similar guarantee for whatever the solution is here.
I think any solution here should be targeted towards implemeting some kind of special build
Completely agreed and I say this from the perspective of someone who previously worked on a centralized CI team: Any of the workarounds I've been looking at (except for the FROM oci-archive:image.oci
one) would cause us more pain in terms of maintenance, etc. as well as a general deviation from the paved road ci-operator
provides. I assume CoreOS are the only ones wanting more control over a build like this, correct?
@cgwalters In the meantime, I think we should:
- Push to the rhcos-devel namespace.
I assume CoreOS are the only ones wanting more control over a build like this, correct?
Yes
This was done!
https://github.com/openshift/os/pull/593 merged, but we will just have an .ociarchive in S3. Let's push the container somewhere. We have two options:
1) Internal pipeline
2) A Prow periodic that builds git master and pushes the result
I think I lean towards 2.