quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

Consider creating images with prepopulated m2 #6736

Open geoand opened 4 years ago

geoand commented 4 years ago

When using s2i, Tekton, or more generally any process that builds Docker images using Maven or Gradle, it usually takes a very long time to populate an empty .m2 repository the first time the image is built. It would be nice to provide a base image that, along with the build tools, contains a pre-populated .m2 repository.
The idea is that such an image would be updated for each Quarkus version, making the building of final application images much faster.

For reference, Halkyon already does something similar: https://github.com/halkyonio/container-images/tree/master/maven-offline-repo

geoand commented 4 years ago

cc @cescoffier @maxandersen @tqvarnst

gsmet commented 4 years ago

It's a good idea. We need to be extra sure the content is safe, so we should enable all the checks we can (checksums, for instance; signatures would be even better, but I'm not sure whether Maven checks them automatically).

geoand commented 4 years ago

Great point @gsmet

maxandersen commented 4 years ago

Maven has checksum verification built in; you have to enable it, though, using -C or --strict-checksums.
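For illustration, the step that prepopulates the repository could fail fast on corrupted or tampered downloads by passing that flag. This is a minimal sketch that reuses the go-offline-maven-plugin invocation proposed later in this thread; the `maven:3.9-eclipse-temurin-17` base image is an assumption, not a decision from the discussion. Note that `-C` only verifies checksums; it does not check signatures.

```dockerfile
# Assumed base image providing Maven and a JDK (not decided in this thread)
FROM maven:3.9-eclipse-temurin-17

COPY pom.xml pom.xml

# -C / --strict-checksums: fail the build when a downloaded artifact's
# checksum does not match, instead of Maven's default of only warning
RUN mvn -B -C de.qaware.maven:go-offline-maven-plugin:1.2.1:resolve-dependencies \
        -f pom.xml -Dmaven.repo.local=/tmp/artefacts
```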

That said, I really don't like the idea of prepopulating images with .m2 content. Caching these artifacts should not be the job of the image, and we would need two different images: one using Maven Central and another using repository.redhat.com. And what about users who work under a policy of only using Maven dependencies served from their own proxy servers? This kind of image would completely circumvent or blindside such efforts.

Ultimately, the right mechanism for this is a dependency cache close to the build cluster, but we've been waiting years for that to become more easily available.

My understanding is that Tekton is planning to have (if it doesn't already) tasks similar to what GitHub caching does, so you get a unified approach to caching just the content your build actually needs. I would say we look into that first.

Having prepopulated Docker images does have benefits for some use cases, but wouldn't it be better to simply document how to enable that for users/customers who are okay with their image being 500 MB+ bigger by default, just to resolve possibly 100 MB of dependencies more quickly during the build?

gsmet commented 4 years ago

On Thu, Jan 23, 2020 at 10:19 AM Max Rydahl Andersen <notifications@github.com> wrote:

maven has checksum integration builtin - you have to enable it though using -C or --strict-checksums.

Yeah, I know about that one. I was talking about signature checking support; I don't know whether they have something for that.

cmoulliard commented 4 years ago

500 MB+ bigger by default to just resolve possible 100MB's of dependencies quicker during the build ?

Without such a Maven offline repo, every Maven build executed within the pod will trigger a new download of the Maven artifacts. Even if creating an image that packages the Maven repo is not the perfect approach, it at least speeds up demoing our technology on k8s/OpenShift.

maxandersen commented 4 years ago

I fully get that, @cmoulliard. My point is that, right now, the resulting images keep that state in them, and given how Maven dependency management works and is used, our "injecting" these bits into the images breaks expectations.

If the suggestion is to have some additional images that add this extra layer on top of the leaner images... sure, but it wouldn't be the recommended approach for real usage, would it?

geoand commented 4 years ago

Yeah, the idea is to have additional images with this, not a replacement for anything else.

maxandersen commented 4 years ago

Then I've got no problem. Waste MBs and CPU time away to speed up the demos :)

cmoulliard commented 4 years ago

Questions:

maxandersen commented 4 years ago

Parent images: good question. I'm still struggling to get an overview of what the proper base images are, so I can't help there (yet).

If there are common conventions for it, then sure, but to keep it simple I would assume the Maven repo would be wherever the image's default user sees ~/.m2?

cmoulliard commented 4 years ago

the image sees ~/.m2 to be simple ?

We can adopt this convention.
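Under that convention, the "additional image" could be a lean builder image plus one extra layer that copies the resolved repository into the default user's ~/.m2. This is a minimal sketch under stated assumptions: `quarkus-offline-repo:latest` is a hypothetical name for the image built from the offline-repo Dockerfile discussed in this thread (which resolves into /tmp/artefacts), and the `maven:3.9-eclipse-temurin-17` base image, whose default user is root, is an assumed choice.

```dockerfile
# Lean builder image (assumed); its default user is root, so ~/.m2 is /root/.m2
FROM maven:3.9-eclipse-temurin-17

# Extra layer: copy the repository prepopulated in /tmp/artefacts by the
# hypothetical offline-repo image into the location Maven uses by default
COPY --from=quarkus-offline-repo:latest /tmp/artefacts /root/.m2/repository
```

Keeping the repository in a separate layer on top of the unchanged builder image matches the point raised above: the lean image stays the recommended default, and the prepopulated variant is an opt-in for demos.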

Remark: This point should be discussed with the OpenShift S2I team (and also Runtimes), as a different path was defined for the Java S2I image: https://github.com/fabric8io-images/s2i/blob/master/java/images/centos-java11/s2i/s2i-setup#L4

cmoulliard commented 4 years ago

parent images - good question; I'm still battling getting any overview what the proper base images are so can't help there (yet).

As the layer added to contain the Maven artifacts/GAVs will not be supported as a product by Red Hat, we could create such a layer from the UBI image... Remark: to be discussed with Runtimes as well ;-)

cmoulliard commented 4 years ago

@maxandersen Do you agree with using the following Dockerfile to build the Quarkus offline repo?

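```dockerfile
# Base image still to be decided
FROM TBDEFINED

USER root

COPY pom.xml pom.xml

# Resolve all dependencies and plugins declared in pom.xml into /tmp/artefacts
RUN mvn de.qaware.maven:go-offline-maven-plugin:1.2.1:resolve-dependencies -f pom.xml -Dmaven.repo.local=/tmp/artefacts

RUN rm pom.xml
```
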
maxandersen commented 4 years ago

Looks fine. I assume the pom.xml is a pom listing all the "root" dependencies?

Sanne commented 4 years ago

N.B. Gradle doesn't use ~/.m2 by default; it has a separate cache with a more advanced structure.

A Gradle build script can choose to consume the Maven default by using mavenLocal(), but there are drawbacks associated with it, so I would rather not have our users forced to do so.

maxandersen commented 4 years ago

@Sanne none of this is for our users... it's for having images that build faster, for demo purposes only.

Besides the Gradle mismatch, there are a bunch of other issues, like ignoring users' Maven proxies and artifact policies, etc. (see https://github.com/quarkusio/quarkus/issues/6736#issuecomment-577594878)

cmoulliard commented 4 years ago

pom listing all "root" dependencies?

This is a pom file created from an example, containing the core extensions/modules and plugins needed to build/run the application. Example: https://github.com/halkyonio/container-images/blob/master/maven-offline-repo/pom.xml

jorsol commented 3 years ago

More than providing an image with a prepopulated m2, my use case is more "normal"... CI already has dependency cache features, so using de.qaware.maven:go-offline-maven-plugin:1.2.8:resolve-dependencies works great, EXCEPT that it doesn't resolve the dynamic dependencies that quarkus-maven-plugin loads, so the next run can't be executed in offline mode.

So, one question would be: how can this be achieved? Would it make sense to have a kind of resolve-dependencies mojo in quarkus-maven-plugin?
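Until something like that exists, one possible workaround (an assumption, not a feature of quarkus-maven-plugin) is to run a real build once, so that the dependencies the plugin loads at build time are downloaded as well, and then run subsequent builds offline with `-o`. In Docker terms, the warm-up can run with only the pom present, so its layer stays cached until pom.xml changes. The base image is an assumed choice. Whether a sources-less `mvn package` succeeds has not been verified for every project. The warm-up may also miss artifacts needed only by other goals (for example, running tests or dev mode).

```dockerfile
# Assumed base image providing Maven and a JDK
FROM maven:3.9-eclipse-temurin-17
WORKDIR /build

# Warm-up: a full package run with only the pom present downloads regular
# dependencies and plugins, plus whatever quarkus-maven-plugin resolves
# while building; this layer is rebuilt only when pom.xml changes
COPY pom.xml .
RUN mvn -B package -DskipTests

# Real build: sources change often, but everything the same goals need is
# now in the local repository, so Maven can run offline (-o)
COPY src ./src
RUN mvn -B -o package -DskipTests
```

In CI, the same idea amounts to running the warm-up command once against a cold cache and using `-o` for later runs. The two invocations must use the same goals and flags, otherwise the offline run may need artifacts the warm-up never fetched.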