Closed Prashanth684 closed 6 months ago
/test okd-scos-images
Thank you for your PR. I understand this is WiP, but when "consolidating" this, one thing to keep in mind should be that access to external repos (e.g. git clone
by make update-tuned-submodule
) will not work for NTO builds in locked down environments (ART pipeline).
@Prashanth684: all tests passed!
Full PR test history. Your PR dashboard.
Thank you for your PR. I understand this is WiP, but when "consolidating" this, one thing to keep in mind should be that access to external repos (e.g.
git clone
bymake update-tuned-submodule
) will not work for NTO builds in locked down environments (ART pipeline).
Thanks @jmencak for looking over this. The context for this change is https://issues.redhat.com/browse/OKD-210, where we are looking to build all OKD content on top of centos stream. As part of that change, ART had raised this PR which adds OKD configs based on centos stream. The source of truth for these configs is the ocp-build-data repo. In that repo, the dockerfile for NTO is specified as Dockerfile.rhel9, which is why we thought it would be better to consolidate, so that the autogeneration works for future releases. That said - I might need to understand a bit more about what is different for OKD. IIUC, is it:
Thanks @jmencak for looking over this. The context for this change is https://issues.redhat.com/browse/OKD-210, where we are looking to build all OKD content on top of centos stream. As part of that change, ART had raised this PR which adds OKD configs based on centos stream. The source of truth for these configs is the ocp-build-data repo. In that repo, the dockerfile for NTO is specified as Dockerfile.rhel9, which is why we thought it would be better to consolidate, so that the autogeneration works for future releases.
Thank you for additional context @Prashanth684 . The Dockerfile
s in this repository do need some cleanup. At a minimum, in my view:
Dockerfile.rhel8
should be safeily removed nowDockerfile.rhel<number>
and just go for Dockerfile.rhel
. RHEL versioning should be done using by git inside the content of the file. I understand that there are repos that reference Dockerfile.rhel9
. For that, we could just symlink Dockerfile.rhel9 -> Dockerfile.rhel
and after the PR merges go through the repos and s/Dockerfile.rhel9/Dockerfile.rhel/
. This is the pattern many core operators use. We could also use something else, such as s/Dockerfile.rhe9/Dockerfile.ocp/
.That said - I might need to understand a bit more about what is different for OKD. IIUC, is it:
- patch the plugin bootloader file
The patch should go. There shouldn't be any patches to TuneD
in this repo. This is a "heritage" before we started using TuneD
Fast Datapath (FDP) RPM package delivery. In my view, the Dockerfile
gives a chance to developers to build the project completely together with custom TuneD
changes and build all the TuneD
rpm packages. RHEL/CentOS stream does not ship all of them! TuneD FDP releases, on the other hand, do. As you probably cannot use TuneD FDP packages in CentOS stream, you have to build them. The botom line, feel free to remove that patch in this PR. If not, we will remove it in the future.
- different tuned settings for rhcos vs centos?
We don't do any of those at the moment to my knowledge. If different tuning is needed, this could be handled at TuneD
profile level adjusting their content.
Based on this, do you see any issues if we consolidate?
The proposed change complicates the Dockerfile.rhel9
a lot and makes it hard to read. Therefore, I'm not in favour of this change at the moment. I'm not sure I fully understand the motivation, i.e. why OKD cannot just use the uptream Dockerfile
.
Also is the submodule update currently being used, i.e is it built with a different submodule today for origin vs ocp?
Do you mean the [submodule "assets/tuned/tuned"]
? If so, OCP does not use the submodule at all. I'm not familiar with origin. Origin uses it if it builds NTO via using Dockerfile
.
@jmencak -
The proposed change complicates the Dockerfile.rhel9 a lot and makes it hard to read. Therefore, I'm not in favour of this change at the moment. I'm not sure I fully understand the motivation, i.e. why OKD cannot just use the uptream Dockerfile.
In order to support the entire organization building OKD artifacts, the OKD team must rely on automation to accomplish most of the work. The automation being designed is based on OCP ART metadata (e.g. https://github.com/openshift-eng/ocp-build-data/blob/openshift-4.16/images/cluster-node-tuning-operator.yml#L23-L26 ). We can use this metadata to generate, and keep in sync, the hundreds of configurations necessary to produce OKD with the lowest possible impact to OCP teams.
The cluster-node-tuning operator appears to be one of the few where this process will not work. Dockerfile
contains 3 FROM
stages, while Dockerfile.rhel9
(and ART metadata) defines two. We lack the necessary metadata to programmatically create valid OKD build configurations.
The simplest way to address this and support the OKD team is to unify the Dockerfiles. Once that is done, the same automation that supports the rest of the organization's OKD artifacts will keep your Dockerfile.rhel9 (or future unified variant) building and promoting compliant OKD images.
The next simplest approach is to ensure that the number of layers (and intention of those layers) is consistent between Dockerfile
and Dockerfile.rhel9
. This will permit the organizational automation to work, but exposes us to drift in the future (e.g. will the team remember to update one when they update the other).
The final option is to exclude NTO from the automation. This transfers the full responsibility of building OKD images and keeping them compliant to the NTO team (e.g. keeping up with centos based golang builders when they are available).
As I'm sure you can appreciate, anything other than full automation creates a tax for someone and risk for OKD. I hope that helps explain the motivation.
Thank you for additional context Justin.
The next simplest approach is to ensure that the number of layers (and intention of those layers) is consistent between
Dockerfile
andDockerfile.rhel9
This is where again, I'm a bit lost. Is this about the difference in the number stages (FROM
) or layers (RUN
) or both?
/cc @yanirq /cc @MarSik /cc @ffromani to share their views. My current concerns with this change:
Dockerfile
which is hard to read.FROM statements:
fwiw, the number of stages in a Dockerfile doesn't impact the final size of the image -- only the content of the final stage matters.
If it is the content in the Dockerfile, extracting this to a simple shell script (e.g. ADD base_os_prep.sh
, RUN base_os_prep.sh
) could encapsulate the differences more clearly.
The patch should go. There shouldn't be any patches to
TuneD
in this repo. This is a "heritage" before we started usingTuneD
Fast Datapath (FDP) RPM package delivery. In my view, theDockerfile
gives a chance to developers to build the project completely together with customTuneD
changes and build all theTuneD
rpm packages. RHEL/CentOS stream does not ship all of them! TuneD FDP releases, on the other hand, do. As you probably cannot use TuneD FDP packages in CentOS stream, you have to build them. The botom line, feel free to remove that patch in this PR. If not, we will remove it in the future.
Based on this comment @jmencak , it looks like the patch can be removed in Dockerfile
, but the FROM
statement still has to stay to build the tuned rpms for centos? (because FDP is not available for centos)
FROM statements:
* https://github.com/openshift/cluster-node-tuning-operator/blob/5a9355ec2c462d83812184c83deb209aa5762801/Dockerfile.rhel9 has 2 * https://github.com/openshift/cluster-node-tuning-operator/blob/5a9355ec2c462d83812184c83deb209aa5762801/Dockerfile has 3
So the number of stages (FROM
) causes the problem for the automation, not (RUN
) layers. Thank you for confirming this.
fwiw, the number of stages in a Dockerfile doesn't impact the final size of the image -- only the content of the final stage matters.
Yes, if you build the packages you need during a build stage, you'll have a smaller final image. That's why we're doing the 3 stage build. Every extra RUN
(layer) is usually bad news for the resulting image size. And there are a few of them in this PR. This PR also doesn't clean the dependencies needed for the build, so you'll end up with a huge CS9 image.
This is a quick experiment I did:
Size of NTO images prior this PR:
473MB: cs9
639MB: rhel9
Size of NTO images with this PR:
1.36GB: cs9-new
659MB: rhel9-new
So a huge increase in size for the CS9 NTO image, not so big for RHEL9, but there is still some.
I believe we need to limit the number of RUN
statements in the dockerfile. podman history
and podman inspect
should help. Also, at least removing the build deps for CS9 build could help reduce the resulting images size. I'm actually out sick (yesterday/today), so please bear with me if I don't respond quickly enough.
The patch should go. There shouldn't be any patches to
TuneD
in this repo. This is a "heritage" before we started usingTuneD
Fast Datapath (FDP) RPM package delivery. In my view, theDockerfile
gives a chance to developers to build the project completely together with customTuneD
changes and build all theTuneD
rpm packages. RHEL/CentOS stream does not ship all of them! TuneD FDP releases, on the other hand, do. As you probably cannot use TuneD FDP packages in CentOS stream, you have to build them. The botom line, feel free to remove that patch in this PR. If not, we will remove it in the future.Based on this comment @jmencak , it looks like the patch can be removed in
Dockerfile
, but theFROM
statement still has to stay to build the tuned rpms for centos? (because FDP is not available for centos)
Yes. Actually, I wonder if the way forward is to make the Dockerfile.rhel9
a 3 stage one. That would help reducing the size of the NTO OKD image and might even make the Containerfile more readable?
@jmencak - https://github.com/openshift/cluster-node-tuning-operator/pull/1058 gets the OKD image size down to 495MB. dnf build-dep tuned.spec -y
installs whatever it wants, so dnf --setopt=protected_packages= history -y undo 0
is used to reverse it once the RPM build is complete.
This alternative does away with most of them, but regarding 'RUN', CI (and therefore OKD builds) will always squash.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Prashanth684 Once this PR has been reviewed and has the lgtm label, please assign marsik for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.