paketo-buildpacks / bellsoft-liberica

A Cloud Native Buildpack that provides the Bellsoft Liberica implementations of JREs and JDKs
Apache License 2.0

Add Support for CRaC enabled JDK Distributions #500

Open bitgully opened 8 months ago

bitgully commented 8 months ago

Description of Enhancement

The BP_JVM_TYPE variable only lets us choose between "JDK" and "JRE" at the moment. It would be nice to have a new variable or an additional value available for choosing a CRaC enabled JDK/JRE distribution. The buildpack should then download and use the CRaC enabled distribution instead of the standard version.

Possible Solution

Add a new boolean variable "BP_CRAC_ENABLED". Or add allowed values for existing BP_JVM_TYPE variable. Something like "JDK-CRAC" or "JRE-CRAC".
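For illustration, a build could then be invoked roughly like this (purely hypothetical; neither value exists in the buildpack today):

  pack build my-app --env BP_JVM_TYPE=JDK-CRAC
  # or, with a separate boolean flag:
  pack build my-app --env BP_JVM_TYPE=JDK --env BP_CRAC_ENABLED=true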

Motivation

We could keep using the existing buildpack whilst migrating to CRaC for snappy container start-ups.

dmikusa commented 8 months ago

Thanks for raising this up.

Presently, you could make this work by packaging up your own copy of the Bellsoft Liberica buildpack. You would first need to adjust the sha256 and uri to point to the CRAC-enabled JVM you want, then package the buildpack. Lastly, you can consume your buildpack with the instructions in that link: instead of specifying an alternative Paketo JVM buildpack, point to your own image, e.g. docker.io/foo/my-buildpack.
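Roughly, that flow could look like the sketch below; the image names are placeholders and you would still need to follow the repo's own packaging instructions.

  git clone https://github.com/paketo-buildpacks/bellsoft-liberica
  cd bellsoft-liberica
  # edit buildpack.toml: point the JDK/JRE dependency's uri and sha256
  # at the CRaC-enabled Liberica archive you want

  # build per the repo's instructions, then publish the packaged buildpack
  pack buildpack package docker.io/foo/my-buildpack --publish

  # consume it in place of the stock JVM buildpack
  pack build my-app --buildpack docker.io/foo/my-buildpack --buildpack paketo-buildpacks/java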

Long term, I think we're waiting to see how folks want to use CRAC with buildpacks. We can certainly bundle and install a CRAC-enabled JVM, but there's more work that needs to happen for CRAC to be useful. In particular, you need to start and run the app to generate the checkpoint. How long the app runs and what it does while the checkpoint is being generated are unclear, and buildpacks would have limits if they were to attempt to automatically generate a checkpoint. Going further, that checkpoint needs to live somewhere. Buildpacks could put it into the image, but it's unclear if that's what people would want/expect.

Since you're the one asking for this feature, we'd like to hear your thoughts on how you plan to use CRAC and what you'd expect buildpacks to do. That will help us create better support for this functionality in buildpacks. Thanks!

bitgully commented 8 months ago

Thanks for the hint about packaging a custom buildpack. It would just be more sustainable to have this support built in.

I would like to use buildpacks as part of a more comprehensive pipeline (e.g. Tekton) for deployments to Kubernetes clusters. This pipeline consists of a sequence of steps.

For example:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM)
  3. Deploy newly built image as Kubernetes pod and warm-up
  4. Set CRaC checkpoint
  5. Add the checkpoint file in an additional layer to the previously built image

With this workflow, only the deployment in the build environment would suffer the slow first start-up. All future deployments (e.g. to staging or production environments) would benefit from the CRaC enabled image.

dmikusa commented 8 months ago

The trouble is that you can't do step 5.) there. Your app image is generated in step 2.) when you build with buildpacks. Once the image is written, you can't change it. OCI images are immutable (guaranteed via hashes).

You could build a new one with that information included, but it's a rebuild of the image that produces a new image.

So something like:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM), but no checkpoint info
  3. Deploy newly built image as Kubernetes pod and warm-up
  4. Set CRaC checkpoint & save the checkpoint info
  5. Re-run buildpacks providing the checkpoint info. Buildpacks could then include that into the produced image.
  6. Run your image w/CRaC info in K8s

The second build could be pretty quick because of all the caching that we do. It's extra steps though.
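A very rough sketch of that two-pass flow, assuming some hypothetical mechanism (here a --volume mount and the proposed BP_JVM_TYPE value) for handing the checkpoint to the second build:

  # pass 1: build without checkpoint info (names are examples)
  pack build registry.example.com/my-app:plain --env BP_JVM_TYPE=JDK-CRAC

  # deploy, warm up, trigger the checkpoint, then copy it out of the pod
  kubectl cp my-app-pod:/checkpoint ./checkpoint

  # pass 2: rebuild, handing the checkpoint dir to the buildpack (no such support exists today)
  pack build registry.example.com/my-app:crac --env BP_JVM_TYPE=JDK-CRAC \
    --volume "$(pwd)/checkpoint:/platform/checkpoint"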

Another possibility is:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM), but no checkpoint info
  3. Deploy newly built image as Kubernetes pod and warm-up
  4. Set CRaC checkpoint & save the checkpoint info
  5. Run your image w/out CRaC info in K8s but volume mount (or config map) the checkpoint info into the container where it can be used by the application.

This is one step less, but it requires a volume mount, and those can be a problem/non-starter for some users. The other option might be a config map, but I suspect the checkpoint info would be too large for that.
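To illustrate the restore side of this option locally (glossing over the CNB launcher; the image name and path are placeholders, but -XX:CRaCRestoreFrom is the standard CRaC flag):

  docker run -v "$(pwd)/checkpoint:/checkpoint" registry.example.com/my-app:plain \
    java -XX:CRaCRestoreFrom=/checkpoint
  # in K8s the same idea would be a volume/volumeMount (or config map) at /checkpoint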

and another is:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM), but no checkpoint info. Part of what the buildpack does would be to start your application (maybe for X seconds?), then stop the process, saving the checkpoint info. The checkpoint info is then included with the image the first time through.
  3. Run your image w/CRaC info in K8s

This has the fewest steps and is the most automated, but it is very difficult for buildpacks to start up an arbitrary app successfully. It might require resources that aren't available, like a service (DB, message queue, etc.). We might be able to get a little farther if we constrain the types of apps supported, e.g. only Spring apps. In that case, we might be able to start the app more reliably, but even then you could still have issues with required services.
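For reference, such a training step inside the build would boil down to something like this (a sketch only; app.jar and the warm-up time are placeholders, and creating the checkpoint needs CRIU privileges the build container may not have):

  java -XX:CRaCCheckpointTo=/checkpoint -jar app.jar &
  APP_PID=$!
  sleep 30                        # "maybe for X seconds" of warm-up
  jcmd "$APP_PID" JDK.checkpoint  # dumps the process state and stops the JVM
  # /checkpoint would then have to be written into a layer of the produced image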

and yet another possibility is:

  1. Git clone of application source code
  2. Build image using buildpacks (with CRaC enabled JVM), but no checkpoint info. Install a helper tool that will run before the app to fetch the checkpoint info from somewhere (maybe an HTTP server?).
  3. Deploy newly built image as Kubernetes pod and warm-up
  4. Set CRaC checkpoint & save the checkpoint info
  5. Run your image w/CRaC info in K8s. Before the app starts up, the helper tool runs and fetches the checkpoint info. The info is then available for the app to start ultra-quick.

I can see some advantages to this approach, but it has the drawback of needing work done before the app starts, which takes time. Ultimately, CRaC is about starting the app super fast, so that extra work negates its benefits.
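A sketch of what that helper might do before the app starts (the download URL is purely hypothetical):

  mkdir -p /checkpoint
  curl -fsSL https://checkpoints.example.com/my-app/1.0.0.tar.gz | tar -xz -C /checkpoint
  exec java -XX:CRaCRestoreFrom=/checkpoint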

Anyway, I appreciate your thoughts and feedback. If anyone else comes across this thread, please add your feedback too.

bitgully commented 8 months ago

The two possibilities mentioned in between (second and third) seem a little hard to implement. Possibility 2 would require the creation (and deletion) of an additional Persistent Volume/ConfigMap for each version of the app in every environment. As far as possibility 3 is concerned: I'm afraid it won't be possible for the buildpack to start all containers as part of the build process in enterprise environments, because they often depend on a number of other resources (ConfigMaps, DBs, Leader/Follower instances...) that might only be available in the namespace where they get deployed afterwards (e.g. by a Helm chart that provides these artifacts).

But I would like to pick up on your first and your last proposed possibilities. Let's name them "Heavy" (=first) and "Light" (=last) for now. The heavyweight option includes the checkpoint data in the app's image itself and the lightweight option fetches the checkpoint info separately at startup.

Heavy

Pros:

Cons:

These cons are not present in the lightweight version. But "Light" comes with other challenges already mentioned.

Light

Pros:

Cons:

The cons of the "Light" version could be dealt with as described below.

Checkpoint Data Storage

Since the checkpoint data must be fetched on demand during app deployment, it needs to be published somewhere alongside its corresponding image version.

Option A: We could leverage the OCI "generic artifacts" capability that container registries already support. The build pipeline could use ORAS to push the checkpoint file next to its image in the same location via oras push.
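A rough sketch of Option A (registry, tag, and media type are only examples):

  tar -czf checkpoint.tar.gz -C ./checkpoint .
  oras push registry.example.com/my-app:1.0.0-checkpoint checkpoint.tar.gz:application/gzip

  # later, during deployment
  oras pull registry.example.com/my-app:1.0.0-checkpoint -o ./restore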

Option B: Alternatively, we could simply create two images for every application. One standard app image and one checkpoint image (same name but with "-checkpoint" postfix).

Fast App Start of "Light" Option on Kubernetes

The concern about the extra work needed for fetching the checkpoint data on-the-fly at startup might be addressed like this: It is assumed that many containers will be deployed on orchestration platforms like Kubernetes. Since containers inside a pod can share volumes, we could inject a sidecar container in addition to the regular app's container. The sole purpose of this tiny container is to fetch the checkpoint file from the same registry location as the container image and save it to the shared filesystem (an ephemeral "emptyDir"). This way, the app's image and the checkpoint file are pulled simultaneously (using serializeImagePulls=false), so the startup delay should not be any longer than it already is today. The normal app container can then boot and simply restore the checkpoint from the pod's local filesystem.

The sidecar container that fetches the checkpoint data should write an additional file (e.g. "checkpoint-fetch-completed") once there was either no checkpoint available or its download has completed. The presence of this "checkpoint-fetch-completed" file can be checked at the app container's startup (e.g. by overriding the default entrypoint with a small bash command) to prevent it from booting while the checkpoint is still being downloaded.
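A minimal sketch of the two containers' commands (image reference, paths, and names are placeholders; /shared is the emptyDir mount):

  # sidecar container: fetch the checkpoint into the shared emptyDir, then signal completion
  oras pull registry.example.com/my-app:1.0.0-checkpoint -o /shared
  mkdir -p /shared/checkpoint && tar -xzf /shared/checkpoint.tar.gz -C /shared/checkpoint
  touch /shared/checkpoint-fetch-completed

  # app container: overridden entrypoint waits for the signal before starting
  until [ -f /shared/checkpoint-fetch-completed ]; do sleep 1; done
  exec java -XX:CRaCRestoreFrom=/shared/checkpoint
  # (a real entrypoint would fall back to a normal start when no checkpoint was published)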

Conclusion

The disadvantages of storing app image and checkpoint data separately can be handled by existing Kubernetes functionalities. Even though this requires extra configuration, it leads to faster builds and allows for a more flexible update/removal of checkpoints or CRaC if need be.

I suppose both the heavyweight and the lightweight solutions come with some trade-offs, but either could work.

frederikz commented 3 months ago

The spring-boot buildpack has added CDS support, where a training run is already performed to generate the CDS archive. This seems to me to be almost the same as supporting CRaC with a training run.

As with the CDS training run, you as a developer have to make sure that your spring-boot application can be started in a training run without external resources.

As suggested, I gave it a try and built my own bellsoft-liberica and spring-boot buildpacks. In the training run, the checkpoint files couldn't be generated due to missing privileges. When I start a training run by hand, I pass --privileged to docker, which is missing here. I don't know if something similar is possible when a buildpack is executed.

dmikusa commented 3 months ago

When I start a training run by hand I would pass --privileged to docker which is missing here.

What specifically do you set when you do this with a Dockerfile?

frederikz commented 3 months ago

Nothing special in the Dockerfile, only when executing the Docker image. Quoting from https://bell-sw.com/blog/how-to-use-crac-with-spring-boot-apps-in-a-docker-container/:

docker run -d --privileged -v $(pwd)/storage/:/storage/ -w /storage --name petclinic-app-container bellsoft/liberica-runtime-container:jdk-21-crac-slim-glibc java -Xmx512m -XX:CRaCCheckpointTo=/storage/checkpoint-spring-petclinic -jar spring-petclinic-3.2.0-SNAPSHOT.jar

Please note the --privileged option, which is necessary for the correct CRaC and the underlying criu executable behavior.

dmikusa commented 3 months ago

Ok, that's what I was suspecting. I don't think that'll work with buildpacks because there is a lot of effort to run things as non-privileged users (we don't even run as root, let alone with a privileged flag). I'll ask around though and see what I can find out.

bitgully commented 3 months ago

From what I can remember, adding only the PTRACE capability might also be sufficient: docker run --cap-add=SYS_PTRACE. But I haven't verified this myself yet (the plan was to use it on RHEL with SELinux).

The thing with CRaC is that you need to put in manual effort to exclude the individual parts of memory that shouldn't be projected into the checkpoint file (e.g. stage-specific properties/variables/URLs/passwords). Since I couldn't yet come up with a reasonable generic solution to avoid this additional effort in each application, I didn't pursue the CRaC option any further.

But last January, Azul Systems told me they were working on providing their CRaC-enabled version as part of the Paketo buildpack. Not sure if this was pursued any further, though.