shipwright-io / build

Shipwright - a framework for building container images on Kubernetes
https://shipwright.io
Apache License 2.0

Support creation of lean application "runtime" images #183

Closed · sbose78 closed 4 years ago

sbose78 commented 4 years ago

Combining https://github.com/redhat-developer/build/issues/40 and https://github.com/redhat-developer/build/issues/39.

Goal

As a developer, I want to separate build tools used to build the application from runtimes, so that I can create lean application images.

A Build would require three image designations in this case: the builder image, the runtime base image, and the output image.

Problem

S2I images on OpenShift contain both build tools and runtime dependencies, which produces bulky images. The build tools are only needed while building the application and image, not when the image is deployed, yet they add unnecessary overhead to the image size.

Why is this important?

To enable developers to build lean runtime images for their applications that contain only the application and its minimal runtime dependencies.

Solution requirements

The solution needs to be generic, so that it is usable by any strategy if the user provides the required information.

zhangtbj commented 4 years ago

Hi Shoubhik,

We also have this requirement: for the buildpacks build strategy, we would like to use a Red Hat UBI base image to build the output image. I am not sure if this requirement can be addressed in this issue.

I think that, as with the builder image, we should not ask the end user to set the builder image and base runtime image explicitly. Instead, we should have the capability to choose among different base images on the end user's behalf when building the output image.

Please let me know if I misunderstand something.

Thanks!

sbose78 commented 4 years ago

Yes, a non-mandatory `spec.runtime` section should be added. The controller must generate a TaskRun step on the fly to COPY X in image A to Y in image B.
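For illustration, here is a minimal sketch of what a Build with such a section could look like; the `runtime` field names and values are assumptions for discussion, not a settled API:

```yaml
# Sketch only: the runtime block below is a hypothetical shape,
# not the final API.
apiVersion: build.dev/v1alpha1
kind: Build
metadata:
  name: nodejs-lean
spec:
  source:
    url: https://github.com/example/nodejs-app
  strategy:
    name: buildpacks-v3
    kind: ClusterBuildStrategy
  output:
    image: quay.io/example/nodejs-app:latest
  runtime:
    # base image for the lean "runtime" image (image B)
    base:
      image: registry.access.redhat.com/ubi8/ubi-minimal
    # paths to COPY from the built image (image A) into image B
    paths:
      - src: /workspace/app
        dest: /opt/app
```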

qu1queee commented 4 years ago

@sbose78 From the problem you are trying to solve, is the issue that bulky images = slower builds? Just to get an idea of what to expect from the solution.

Some general questions:

otaviof commented 4 years ago

I would like to share this proof-of-concept implementation with you; please take a look. The idea is to be able to "reduce" an image's footprint by copying over parts of a given image to compose a new target image.

Trying to illustrate this process in a single image: (diagram: reduce-diagram)

This application therefore works as a build-operator helper tool, and receives parameters like all the other applications employed in BuildStrategy steps.

In the context of build-operator, we would have a BuildStrategy step generated by the operator, filling in the parameters the helper application needs. Also, the Build CR would need extra attributes to be passed along to the helper tool.
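As a rough sketch of what that could look like (the helper image, flag names, and parameter placeholders here are hypothetical, not the actual proof-of-concept):

```yaml
# Hypothetical BuildStrategy step invoking the helper; the image name,
# flags, and parameter placeholders are illustrative only.
- name: runtime-reduce
  image: quay.io/example/build-helper:latest
  args:
    - --source-image=$(build.output.image)
    - --runtime-base-image=$(build.runtime.baseImage)
    - --path=/workspace/app:/opt/app
```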

sbose78 commented 4 years ago

@qu1queee ,

(1) The user needs to specify the base runtime image in a `spec.runtime` block. So yes, this is a new feature.

(2) Yes, this is effectively a multi-stage build where the first stage isn't necessarily a Dockerfile build.

This is equivalent to:

```
COPY --from=previously-built-image:latest /home/path/to/build/war /opt/war
```

In Build v1 this is called chained builds: https://docs.openshift.com/container-platform/4.3/builds/advanced-build-operations.html#builds-chaining-builds_advanced-build-operations

sbose78 commented 4 years ago

I would like to start small & simple here:

Irrespective of which strategy is being used, here's my suggestion of what an initial PR should look like:

* Our controller should generate a simple `Dockerfile.runtime` on-the-fly which only copies files from the 'output' into the new to-be-built image.
* Our controller should then add a `TaskRun` step on-the-fly to do a `buildah bud` or a `kaniko` step to build using the `Dockerfile.runtime`.

At the moment, we don't need copying over of labels; we could address that as part of https://github.com/redhat-developer/build/issues/50.

I wouldn't introduce a new CLI at the beginning, because it has upstream & downstream delivery/maintenance overhead in addition to API versioning costs (alpha, beta, GA).

otaviof commented 4 years ago

> I would like to start small & simple here:
>
> Irrespective of which strategy is being used, here's my suggestion of what an initial PR should look like:
>
> * Our controller should generate a simple `Dockerfile.runtime` on-the-fly which only copies files from the 'output' into the new to-be-built image.
>
>   ```
>   FROM ubi-rhel8-runtime
>   ...
>   COPY --from=EXISTINGIMAGE:TAG /path/in/source/image /path/in/destination/image
>   ...
>   ```

To explain this from a different perspective: we would like the controller to orchestrate how the build is executed, and among the build steps we would like to intervene in order to create a leaner "runtime" image.

And I think you have a good point here. Generating a Dockerfile.runtime from the controller directly might be the shortest path to a leaner image.

> * Our controller should then add a `TaskRun` step on-the-fly to do a `buildah bud` or a `kaniko` step to build using the `Dockerfile.runtime`.

For this step to work, it will depend on the strategy in use. With buildpacks, images can be exported to a local Docker daemon, as per the exporter (CNB) parameter:

```
$ docker run heroku/buildpacks:18 /cnb/lifecycle/exporter --help
[...]
  -daemon
        export to docker daemon
[...]
```

So the exporter command could materialize the image in a local Docker instance. However, we won't have one available, as we already know. That leaves us with a question: how do we expose images that are being built to buildah or kaniko?

In the current scenario, that would only be possible after uploading the image to a container registry first; then buildah/kaniko would be able to reach it.

To note: on strategies where the image is built directly from a Dockerfile, the suggested approach would work. My concern now lies in how to cover all strategies.

> At the moment, we don't need copying over of labels; we could address that as part of #50.

Okay, let's use #50 to dive into the labels use case. 👍

> I wouldn't introduce a new CLI at the beginning, because it has upstream & downstream delivery/maintenance overhead in addition to API versioning costs (alpha, beta, GA).

We don't necessarily need to introduce a new CLI. If we decide to adopt a helper application to deal with use cases like image format conversion and reading the cache of supported builders, we can use the same repository as the operator itself. Consequently, it would go GA at the same time as the operator.

That said, I might be rushing toward a solution before we have explored the use case as a whole. So I'll keep this suggestion for a later time, if I may. :-)

sbose78 commented 4 years ago

> For this step to work, it will depend on the strategy in use. With buildpacks, images can be exported to a local Docker daemon, as per the exporter (CNB) parameter:
>
> So the exporter command could materialize the image in a local Docker instance. However, we won't have one available, as we already know. That leaves us with a question: how do we expose images that are being built to buildah or kaniko?

I was kinda thinking that we could let the image push to the registry happen as is, the way it happens today, and then:

* we pull it back ( `COPY --from=quay.io/sbose78/myoutputimage...` )
* rebuild it on top of the runtime input base
* and then push it back again to the same registry/tag

This should make things strategy-agnostic? What do you think?
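A sketch of the extra TaskRun step the controller might append for that flow; the step name, builder image, and `$(...)` placeholders are assumptions:

```yaml
# Hypothetical step appended by the controller: rebuild the already-pushed
# output image on top of the runtime base, then push it back to the same tag.
- name: build-runtime-image
  image: quay.io/buildah/stable
  workingDir: /workspace
  script: |
    buildah bud -f Dockerfile.runtime -t $(build.output.image) .
    buildah push $(build.output.image) docker://$(build.output.image)
```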

sbose78 commented 4 years ago

> We don't necessarily need to introduce a new CLI. If we decide to adopt a helper application to deal with use cases like image format conversion and reading the cache of supported builders, we can use the same repository as the operator itself. Consequently, it would go GA at the same time as the operator.

+1, I've given this some thought too. Don't get me wrong, I'm not ruling out helper applications :) I just want to see how far we can get without introducing one. The successes and failures might lead us to a better design for the helper app in the future, if/when it's needed.

otaviof commented 4 years ago

> I was kinda thinking that we could let the image push to the registry happen as is, the way it happens today, and then:
>
> * we pull it back ( `COPY --from=quay.io/sbose78/myoutputimage...` )
> * rebuild it on top of the runtime input base
> * and then push it back again to the same registry/tag
>
> This should make things strategy-agnostic? What do you think?

Let's give it a try. I'm going to open a draft PR with this idea :-)

qu1queee commented 4 years ago

I've been following this discussion for a while. Looking forward to an initial, simplistic PR as a first step.

sbose78 commented 4 years ago

FYI, in BuildConfig it was expressed as:

```yaml
apiVersion: v1
kind: BuildConfig
metadata:
  name: image-build
spec:
  output:
    to:
      kind: ImageStreamTag
      name: image-build:latest
  source:
    dockerfile: |-
      FROM jee-runtime:latest
      COPY ROOT.war /deployments/ROOT.war
    images:
    - from:
        kind: ImageStreamTag
        name: artifact-image:latest
      paths:
      - sourcePath: /wildfly/standalone/deployments/ROOT.war
        destinationDir: "."
  strategy:
    dockerStrategy:
      from:
        kind: ImageStreamTag
        name: jee-runtime:latest
  triggers:
  - imageChange: {}
    type: ImageChange
```
otaviof commented 4 years ago

Please consider this gist; it contains the generated Dockerfile.runtime inline, and it does work. So we can use the container registry as an intermediary point, uploading and re-uploading the `build.output.image`.

I'm changing the operator API resources and improving the generation of Dockerfile.runtime and the other resources you see in the example gist, for a PR later on.

otaviof commented 4 years ago

Now that PR #263 has been merged, could we close this issue as well?