paketo-buildpacks / java

A Cloud Native Buildpack with an order definition suitable for Java applications
Apache License 2.0

Building image downloads JDK at every run #86

Closed uqix closed 3 years ago

uqix commented 3 years ago

It would be nice to have some caching mechanism!

from https://github.com/spring-projects/spring-boot/issues/24262

uqix commented 3 years ago

It seems that spring-boot:build-image -Dspring-boot.build-image.pullPolicy=IF_NOT_PRESENT solves this problem.

uqix commented 3 years ago

The problem still exists: the dependencies are downloaded on every run.

[INFO]     [creator]         Downloading from https://github.com/bell-sw/Liberica/releases/download/8u275+1/bellsoft-jre8u275+1-linux-amd64.tar.gz
[INFO]     [creator]         Downloading from https://github.com/cloudfoundry/jvmkill/releases/download/v1.16.0.RELEASE/jvmkill-1.16.0-RELEASE.so

uqix commented 3 years ago

See https://github.com/spring-projects/spring-boot/issues/24556

uqix commented 3 years ago

FYI, a dependency-mapping binding solved my problem:

https://paketo.io/docs/buildpacks/configuration/#bindings

https://paketo.io/docs/buildpacks/configuration/#dependency-mappings

snicoll commented 3 years ago

@nebhale can we please reconsider reopening this issue? We've got several reports around the same theme.

dsyer commented 3 years ago

Better to ask @ekcasey? I think she had some ideas.

Azbesciak commented 1 year ago

I have the same issue when building a Spring Boot 3.0 native image. I added --pullPolicy=IF_NOT_PRESENT; it sometimes works, but mostly does not, even without any Gradle cleaning.

uqix commented 1 year ago

At first we solved this problem with dependency mappings, which were cumbersome to upgrade, so we set up a caching proxy server to speed up the downloads instead.

dmikusa commented 1 year ago

There is a caching mechanism in the buildpacks. If you build an application and then rebuild it, the second build should not download things like the JVM; those should be cached. The cache is stored in a Docker volume if you're building with the Spring Boot build tools or pack.

That said, the cache does not extend across applications. It is scoped by the application image name, so if you change the image name (a common cause is appending a build number or version number to it), it's like building from scratch and there is no cache. Caching also does not start until you've had a successful build, so if a build fails and you rebuild, the rebuild will need to download everything again.

Having said this, if you're seeing behavior to the contrary, please open a new issue and we can look into it.
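
One way to keep the image name stable is to build under a fixed tag and apply the version tag afterwards. The following is a minimal sketch of that idea, not something confirmed in this thread: it assumes the cache key includes the tag, as the comment above suggests, and the image name, version, and builder in the usage section are illustrative placeholders. It only assembles the pack build and docker tag commands; it does not run them.

```python
import shlex


def stable_build_commands(image, version, builder, path="."):
    """Return (build, tag) commands that keep the built image name constant.

    The build always targets <image>:latest so the name does not change
    between runs; the version is applied afterwards with `docker tag`.
    """
    stable_ref = f"{image}:latest"
    versioned_ref = f"{image}:{version}"
    build = ["pack", "build", stable_ref, "--path", path, "--builder", builder]
    tag = ["docker", "tag", stable_ref, versioned_ref]
    return build, tag


if __name__ == "__main__":
    # Placeholder values, chosen only for illustration.
    build, tag = stable_build_commands(
        "registry.example.com/demo/app",
        "1.4.2",
        "paketobuildpacks/builder-jammy-base",
    )
    print(shlex.join(build))
    print(shlex.join(tag))
```

The same approach should apply to the Spring Boot plugins: keep the configured image name fixed and tag the result in a later step.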


A note on further caching: dependency mappings or a proxy are a good way to cache things on a broader scale. Dependency mappings are a pain, so much so that I made a tool to make managing them easier. With the tool, you can run bt dm <buildpack>, e.g. bt dm paketo-buildpacks/bellsoft-liberica, and it'll download the binaries and make dependency mapping bindings for you.

Azbesciak commented 1 year ago

@dmikusa I supposed so. I need to check, but honestly the image name should be the same. I have the git tag in the name, but I noticed that it downloads each time, even when the tag does not change. Anyway, would it be possible to run the build inside a Docker builder image? I would use that for GitLab CI, and in that case it would also be smarter to use the builder as an environment instead of a delegate.

BTW, that Spring task uses Docker anyway, so why not just check whether the given builder image is already present in the local repository? Or use a Gradle/Docker repository for that? Sorry if I am oversimplifying, but I was under the impression that the dependency caching problem was already solved.

Azbesciak commented 1 year ago

@dmikusa What do you think about it? [images]

I suppose it should not look like this. I build only one image at a time.

In general, after building 10-20 times I need to restart my laptop; otherwise Vmmem uses 24 GB of RAM, and even shutting Docker down does not help.

Regarding code changes, I only changed my source code and did not change the image name, at least on my side. The builder was still downloaded. I changed Spring beans because the application was failing at runtime, so maybe that affected the AOT hints for GraalVM; I have no idea.

dsyer commented 1 year ago

It should not look like that, but it doesn't seem relevant to the current issue (#86).

On the subject of JDK downloads, it is indeed depressing that we are still having this discussion in 2023. AFAIK the buildpack spec has to change to enable them to cache JDK binaries between all builds (not just successful builds of the same image). Why that hasn't happened, or why some other acceptable workaround has not emerged, is a complete mystery. The local proxy with a cache for the download is the best workaround BTW, but it's fiddly to set up and not even possible for some users.

For me, personally, build times improved when the downloads started getting faster independently (either the origin of the JDK changed, or something in the internet shifted and opened up a faster path for me). But that doesn't really help the large number of users stranded on a relatively slow connection, or in a geographical location far from the one the buildpack authors happen to be in.

Azbesciak commented 1 year ago

Guys, is anything planned for this issue? Maybe at least Docker images with an embedded JDK? Then Docker would manage the caching.

Azbesciak commented 1 year ago

Anything :)?

dmikusa commented 1 year ago

Sorry, I'm not sure what you're proposing, @Azbesciak.

The current level of caching is not ideal and, as has been mentioned over and over above, we want to do better but are limited in what we can do by the Cloud Native Buildpacks spec and lifecycle. It would be more appropriate to ask questions about this under https://github.com/buildpacks or on the CNB Slack channel.

We have done the best we can for now at the Paketo-level.

  1. Downloads are cached and should only happen once per application image (i.e. it's scoped to the app image name). If you change your app image name, it will re-download dependencies :( If you're not seeing this, there could be problems with your setup, and you should open a Discussion thread here.

  2. You can pre-fetch downloads and host them on your local machine, then map them into the builds. This lets you use file:// URLs that point to your pre-fetched downloads when you perform builds, and it works across image builds. There is a tool to help pre-fetch dependencies and create the required binding files. With the tool, you can run bt dm <buildpack>, e.g. bt dm paketo-buildpacks/bellsoft-liberica, and it'll download the binaries and make dependency mapping bindings for you. Then all you need to do is volume mount the location where bt stores dependencies into the build container, or use the shell integration that bt provides.
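
For readers who want to see what such a binding looks like without the bt tool, here is a minimal sketch. It relies on the layout described in the Paketo configuration docs linked earlier in the thread: a binding directory with a type file containing dependency-mapping, plus one file per dependency whose name is the archive's SHA-256 digest and whose content is the replacement URI. The function names and the container path /deps are hypothetical, and this is not how bt itself is implemented.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_dependency_mapping(binding_dir, archive, container_dir):
    """Add one pre-fetched archive to a dependency-mapping binding.

    binding_dir   -- host directory that will be mounted as the binding
    archive       -- path to the already downloaded archive on the host
    container_dir -- directory where the archive is visible inside the
                     build container (hypothetical, e.g. "/deps")
    """
    binding = Path(binding_dir)
    binding.mkdir(parents=True, exist_ok=True)
    # The type file marks this directory as a dependency-mapping binding.
    (binding / "type").write_text("dependency-mapping")

    archive = Path(archive)
    checksum = sha256_of(archive)
    uri = f"file://{container_dir.rstrip('/')}/{archive.name}"
    # File name is the digest of the original download; content is the new URI.
    (binding / checksum).write_text(uri)
    return checksum, uri
```

Both the binding directory and the directory holding the archives then need to be mounted into the build container, for example with pack's --volume flag, with the binding mounted under /platform/bindings. The digest must match the one the buildpack expects for that dependency, so the archive has to be the exact file the buildpack would otherwise download.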

There is future work to revamp how we manage dependencies in Paketo, but its purpose is not to address caching issues. Again, the caching limitations need to be addressed in the CNB spec and lifecycle. We've really done as much as we can without changes upstream.

dsyer commented 1 year ago

@dmikusa thanks for the detailed update. Would it be possible to link directly to some open issues in the CNB space? Or highlight which parts of the spec need to change? All we see as users is frustrating downloads of stuff that seems eminently cacheable. There seems to be no way to break this cycle of finger pointing and actually make progress.

dmikusa commented 1 year ago

@dsyer This is probably the closest we got to improving this situation, here, but the RFC there has not been implemented because of concerns raised after it was approved. See the note at the top.

I don't think that we need the full RFC 0073 to solve the problems we're having with downloads. Another lighter-weight approach was pitched here. It has some issues yet to be worked out. First, writable caches are hard because you can have multiple buildpacks running at the same time and you also have to worry about security (cache poisoning). Second, things get more complicated from the angle of platforms like kpack that are multi-tenant. I think this RFC could still be moved forward, but it would need some support at the CNB level. It's stalled because the individual who was helping push it forward left the project.

If someone has the time and motivation to push this forward, I'd suggest getting involved on the CNB Slack and if possible joining one of their working group meetings. https://github.com/buildpacks/community#meetings I don't think there are any project-level objections to fixing this issue but the effort needs a champion to push it forward.

Azbesciak commented 11 months ago

I cannot make it work, nor can I configure Squid to cache HTTPS requests (I know there are a couple of tutorials, but none of them works for me when I run Squid in Docker). Would it be possible to fetch a Docker image instead of downloading the tar.gz? I am talking about this:

BellSoft Liberica NIK 17.0.7: Contributing to layer
[creator]         Downloading from https://download.bell-sw.com/vm/22.3.2/bellsoft-liberica-vm-core-openjdk17.0.7+7-22.3.2+1-linux-amd64.tar.gz
[creator]         Verifying checksum
[creator]         Expanding to /layers/paketo-buildpacks_bellsoft-liberica/native-image-svm

There is an explicit Docker image for it, I suppose, right?

bellsoft/liberica-native-image-kit-container:jdk-17-nik-22.3.2-musl

dmikusa commented 11 months ago

@Azbesciak please create a new discussion thread here and explain what you're doing in detail. Thanks.

https://github.com/orgs/paketo-buildpacks/discussions/categories/java-team