[GR-48103] Native Image Layers

christianwimmer commented 1 year ago

TL;DR

Allow users to create native images that depend on one or more base images. Such application images are much faster to build than standalone images, providing an improved development experience. Moreover, base images can be shared not just across applications but potentially also across different operating system processes, which can reduce the overall memory footprint.

Motivation

Until now, users can only build standalone native images, which are single executables (or shared libraries) containing all machine code and an object heap for a given application. While standalone images can benefit from closed-world optimizations, the build step always needs to analyze and compile the full application, including its dependencies, the JDK, and the Substrate VM runtime system.

Layered native images will work similarly to image layers in the container world: Certain components of an application, for example a microservices framework and its dependencies, can be turned into one or more base images. On top of such base images, application images can be built. Such images are much smaller and faster to build than standalone images because only the application code needs to be recompiled. Everything that is part of the base images can be reused and does not need recompilation.

Although base images cannot benefit from closed-world optimizations anymore and will thus be larger in size, they provide another benefit: Since multiple applications can use the same base images, base images can be shared, at least partly, across operating system processes, which can reduce the overall memory footprint when deploying multiple application images on the same machine.

Goals

Reduce image build time for the application (not for the layered images themselves, or the total time necessary to build all layer dependencies): Allow incremental building of native images where a base image for the core dependencies of the application is pre-built, so that when the application code changes, only the application code needs to be considered.
Reduce the combined RSS memory size when many different applications that share common base frameworks and libraries run on the same machine.
Provide a memory-efficient and secure integration into GraalOS, but also work outside of GraalOS (at first on Linux, eventually also on macOS and Windows).
Guarantee the same programming model restrictions as standalone images. We want to be able to "upgrade" a layered application image to a standalone image in GraalOS when that application is frequently used.

Non-Goals

Layered images are not general-purpose shared libraries that can be used from other languages (such as C).
No interchangeability of images: We do not allow building different base images and mixing and matching such images at run time. When building an application, the base images it depends on are fixed at build time. When a base image is rebuilt, all applications that depend on the base image need to be rebuilt too. This includes minor bug-fix releases of a framework or the JDK.
No overall build time improvements: We accept that the sum of the image build time of the base images and the application image will be higher than the current overall build time.
No overall image size improvements: The size of the base images plus the application image will be higher (probably 2.5x to 5x) than the current image size.
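The "No interchangeability" rule ties an application image to the exact base images it was built against. The proposal does not say how this is enforced; the sketch below assumes one possible approach, in which the application build records a SHA-256 fingerprint of each base image and startup is refused when the fingerprint no longer matches. The class and method names (`BaseImageCheck`, `fingerprint`, `checkCompatible`) are hypothetical and not part of GraalVM.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical sketch (not a GraalVM API): pin an application image to the
// exact base image it was built against. Requires Java 17+ for HexFormat.
public class BaseImageCheck {

    // Fingerprint of a base image, assumed to be recorded when the application image is built.
    static String fingerprint(byte[] baseImageBytes) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(sha256.digest(baseImageBytes));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Assumed to run at startup: any rebuild of the base image, even a minor
    // bug-fix release, changes the fingerprint and is rejected.
    static void checkCompatible(String recordedFingerprint, byte[] actualBaseImage) {
        String actual = fingerprint(actualBaseImage);
        if (!actual.equals(recordedFingerprint)) {
            throw new IllegalStateException(
                "base image changed since the application was built; rebuild the application image");
        }
    }

    public static void main(String[] args) {
        byte[] builtAgainst = "framework-base 1.0.0".getBytes(StandardCharsets.UTF_8);
        byte[] rebuilt = "framework-base 1.0.1".getBytes(StandardCharsets.UTF_8);
        String recorded = fingerprint(builtAgainst);

        checkCompatible(recorded, builtAgainst); // same base image: accepted
        try {
            checkCompatible(recorded, rebuilt);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A content hash is used here only because it needs no coordination between builds; a build ID embedded in the base image would serve the same purpose.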

Success Metrics

Reduce the binary size of a Spring/Micronaut/Quarkus application image to at most a few MBytes.
Reduce the total RSS of all applications when many applications are running because machine code and image heap of base images can be shared between different applications.
Keep the peak performance acceptable, at most ~20% reduced peak performance (compared to a standalone native image). Some loss of peak performance will be inevitable because closed-world optimizations are no longer possible.

Technical Details

Terminology

The current approach of building a single image that contains the application, libraries, the JDK, and the runtime system is called a "standalone image".

In the new approach, an application is split up into multiple "layered images", with a linear chain of dependencies. The layered image that one particular image is based on is called its "base image". Due to the linear chain, a base image can have a base image itself. An image that contains (at least) the java.base module of the JDK and the Substrate VM runtime system cannot depend on a base image. The layered images that depend on one particular image are called "extension images". Extension images can have extension images themselves. A leaf image for an actual application, i.e., an image that can actually be executed, is called an "application image". Slightly different rules (like doing closed-world optimizations) apply to application images because they are no longer extensible.
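The terminology above can be captured in a small data model. This is a hypothetical sketch, not GraalVM code: `LayerChain`, `Layer`, and `chain` are invented names, and modules stand in for the full contents of an image. It encodes the stated rules (one base image per image, no base image for an image containing java.base, application images are not extensible) plus one labeled assumption: the root of the chain must contain java.base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the layered-image terminology (not a GraalVM API).
public class LayerChain {

    // A layered image has at most one base image, so images form a linear chain.
    record Layer(String name, Layer base, Set<String> modules, boolean isApplication) {
        Layer {
            // Rule from the proposal: an image containing java.base cannot have a base image.
            if (modules.contains("java.base") && base != null) {
                throw new IllegalArgumentException(name + ": an image with java.base cannot have a base image");
            }
            // Assumption: the root of the chain must provide java.base.
            if (base == null && !modules.contains("java.base")) {
                throw new IllegalArgumentException(name + ": the root image must contain java.base");
            }
            // Application images are leaves and are no longer extensible.
            if (base != null && base.isApplication()) {
                throw new IllegalArgumentException(name + ": cannot extend application image " + base.name());
            }
        }
    }

    // Lists the chain from the root image down to the given image.
    static List<String> chain(Layer image) {
        List<String> names = new ArrayList<>();
        for (Layer l = image; l != null; l = l.base()) {
            names.add(0, l.name());
        }
        return names;
    }

    public static void main(String[] args) {
        Layer jdk = new Layer("jdk", null, Set.of("java.base"), false);
        Layer framework = new Layer("framework", jdk, Set.of("com.example.framework"), false);
        Layer app = new Layer("my-app", framework, Set.of("com.example.app"), true);
        System.out.println(chain(app)); // [jdk, framework, my-app]
    }
}
```

Here "framework" is an extension image of "jdk" and the base image of "my-app"; trying to build another image on top of "my-app" throws.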

Workflow

Input when building a layered image:

The classpath of the library/framework (or empty when building a layered image that only contains JDK classes).
A list of packages/classes that should be available in the layered image. All transitive dependencies from those packages/classes are included in the layered image too.
A single base image that the layered image depends on (the base image can itself depend on another base image, forming a linear chain of base image dependencies).
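The second and third inputs interact: the new layer contains the transitive dependencies of the requested classes, minus whatever the base image chain already provides. The sketch below shows that computation on a simplified class-level dependency graph. It is an illustration under assumptions, not GraalVM's implementation: `LayerContents` and `contents` are hypothetical names, packages are not modeled, and it assumes each base image already contains its own dependencies, so traversal can stop at base-image classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not GraalVM's implementation): which classes a new layer contains.
public class LayerContents {

    // deps: class -> classes it references (a simplified, class-level dependency graph)
    // roots: the classes requested for the new layer
    // inBaseChain: classes already provided by the base image or its own base images
    static Set<String> contents(Map<String, List<String>> deps, List<String> roots, Set<String> inBaseChain) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String cls = work.pop();
            // Base-image classes are reused as-is (assumed closed under their own
            // dependencies), so skip them, and skip anything already visited.
            if (inBaseChain.contains(cls) || !result.add(cls)) {
                continue;
            }
            for (String dep : deps.getOrDefault(cls, List.of())) {
                work.push(dep);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
            "app.Main", List.of("app.Service", "java.lang.String"),
            "app.Service", List.of("lib.Json", "java.util.List"),
            "lib.Json", List.of("java.lang.String"));
        Set<String> base = Set.of("java.lang.String", "java.util.List");
        System.out.println(contents(deps, List.of("app.Main"), base)); // [app.Main, app.Service, lib.Json]
    }
}
```

Only the three non-JDK classes need to be compiled into the new layer; the JDK classes are reused from the base image.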

Output when building a layered image that is not an application image:

The actual layered native image, containing the machine code and the image heap. This native image is not a valid shared library according to the normal OS linking and ABI, i.e., it cannot be used by anything other than native images that are based on it.
A bundle file that contains additional information necessary to build extension images. In addition to the .jar files (for consistency checking), the bundle contains information about the image heap and the Graal IR graphs of the classes included in the layered image.

The application image itself that depends on other layered images needs to be a valid application that can be loaded by the OS.

Programming model

We want to be able to "upgrade" a layered application image to a standalone image in GraalOS when that application is frequently used. This means that a layered application image must behave exactly the same way as a standalone image, i.e., have the same restrictions regarding reflection and resource usage. We cannot just make everything available for reflection, or include all resources, in base images. If over-registration happens in the base image (to have more sharing between different applications), then we must filter at run time so that the application still adheres to the full restrictions that a standalone image has.

This also applies to features configured via options. For example, we might over-configure a base image to be prepared for all URL protocols or all locales, but options such as --enable-url-protocols=http,https,ftp,jar and -H:+IncludeAllLocales can only be provided by the final application build, so the over-configured protocols and locales from the base image must not be visible to the application.
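The run-time filtering described above can be sketched for the URL protocol example: the base image is prepared for more protocols than any single application enables, and a lookup only succeeds for protocols the final application build enabled. This is a hypothetical illustration (`ProtocolFilter` and `isAvailable` are invented names); it does not describe how Native Image actually implements URL handlers, and the same idea would apply to locales and reflection metadata.

```java
import java.util.Set;

// Hypothetical sketch (not a GraalVM API): hide over-configured base-image
// features that the final application build did not enable.
public class ProtocolFilter {

    private final Set<String> compiledIn;           // protocols present in some layer (base images may over-configure)
    private final Set<String> enabledByApplication; // e.g. taken from the application's --enable-url-protocols option

    ProtocolFilter(Set<String> compiledIn, Set<String> enabledByApplication) {
        this.compiledIn = compiledIn;
        this.enabledByApplication = enabledByApplication;
    }

    // A protocol is usable only if the application build enabled it, as it would
    // be in a standalone image built with the same options.
    boolean isAvailable(String protocol) {
        return enabledByApplication.contains(protocol) && compiledIn.contains(protocol);
    }

    public static void main(String[] args) {
        ProtocolFilter filter = new ProtocolFilter(
            Set.of("http", "https", "ftp", "jar"), // base image prepared for many protocols
            Set.of("http", "https"));              // the application enabled only these
        System.out.println(filter.isAvailable("https")); // true
        System.out.println(filter.isAvailable("ftp"));   // false: compiled in, but not enabled
    }
}
```

The extra protocols still cost space in the shared base image, but they never change the behavior the application observes.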

Some major changes to the programming model of Native Image are necessary: we must no longer rely on reachability information during the image build. All places where reachability information is exposed to the user must be changed:

Currently, conditional configuration based on the reachability of classes is supported in reflection and resource configuration files. This needs to be deprecated and replaced with conditional configuration based on "class has been reached at run time". The information about which code has already been executed at run time is stable between layered images and standalone images, i.e., even if layered images include more elements than necessary at image build time, the run-time behavior is still the same as that of a standalone image.

We also need to deprecate (or at least discourage using) the parts of the Feature mechanism where users can write code that runs at image build time and reacts to classes, methods, or fields becoming reachable.
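The "class has been reached at run time" condition can be illustrated with a small registry: an entry becomes visible only once its condition class has been reached during the current run. The answer depends on what the program executed, not on what the build included, which is why an over-registered base image behaves like a standalone image. This is a hypothetical sketch (`ReachedConditions`, `markReached`, `isRegistered` are invented names); it is not Native Image's metadata format or runtime.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not a GraalVM API): metadata guarded by a
// "class has been reached at run time" condition.
public class ReachedConditions {

    // Classes the running program has reached so far (e.g. recorded when a class is initialized).
    private final Set<String> reached = ConcurrentHashMap.newKeySet();

    // Registered element -> condition class. A base image may contain extra entries;
    // they stay invisible until their condition class is reached.
    private final Map<String, String> conditionalEntries;

    ReachedConditions(Map<String, String> conditionalEntries) {
        this.conditionalEntries = conditionalEntries;
    }

    void markReached(String className) {
        reached.add(className);
    }

    boolean isRegistered(String element) {
        String condition = conditionalEntries.get(element);
        return condition != null && reached.contains(condition);
    }

    public static void main(String[] args) {
        ReachedConditions metadata = new ReachedConditions(
            Map.of("com.example.JsonCodec", "com.example.JsonSupport"));
        System.out.println(metadata.isRegistered("com.example.JsonCodec")); // false
        metadata.markReached("com.example.JsonSupport");
        System.out.println(metadata.isRegistered("com.example.JsonCodec")); // true
    }
}
```

By contrast, a build-time reachability condition would give different answers for a layered build (where the base image includes more) and a standalone build.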

SergejIsbrecht commented 1 year ago

@christianwimmer

could you elaborate on following statement?

We also need to deprecate (or at least discourage using) the parts of the Feature mechanism where users can write code that runs at image build time and reacts to classes, methods, or fields becoming reachable.

For us Features are vital to register classes for reflection and vice versa. Deprecating/removing them would probably kill our GraalVM Native Image use case, because it is impossible to auto-generate reflection configs for our application: not all code paths are reachable, even at run time, but they are vital for us because they might become reachable under certain conditions. Therefore I would appreciate not changing anything in this regard, at least for standalone images.

christianwimmer commented 1 year ago

For us Features are vital to register classes for reflection and vice versa.

Changes are only anticipated for code that registers elements for reflection based on reachability information. In the Feature interface, that would be callbacks like registerReachabilityHandler or registerSubtypeReachabilityHandler.

zakkak commented 1 year ago

@christianwimmer this is a very powerful feature for building standalone images since it allows users to conditionally include extra code only when necessary. At the moment Quarkus uses this in 4 features, which is not a lot but still useful. We would thus suggest discouraging the use instead of deprecating it.

GraalVM could print a warning when such callbacks are being used in a layered build, and possibly treat the classes passed to these callbacks as reachable in layered builds (this however might not be desirable as it might result in different behavior between layered and standalone builds).

SergejIsbrecht commented 1 year ago

@christianwimmer,

Changes are only anticipated for code that registers elements for reflection based on reachability information. In the Feature interface, that would be callbacks like registerReachabilityHandler or registerSubtypeReachabilityHandler.

Would this apply to layered images only, or would it also apply to standalone images?

xlight05 commented 10 months ago

This could be a game-changer for languages like Ballerina. huge +1 =)

NCLnclNCL commented 10 months ago

This could be a game-changer for languages like Ballerina. huge +1 =)

Good

ahachete commented 9 months ago

This is fantastic. Some time ago I was asking about this in GraalVM's Slack.

Having more than one GraalVM native image on a system means potentially large amounts of "binary code" being duplicated, and has led to designs where two applications are made into the same "program", just "behaving differently" depending on how it is called, to avoid having larger binaries.

This will be truly welcome. Thanks!

CharlieReitzel commented 8 months ago

We're currently evaluating GraalVM to reduce memory requirements and startup times. Our application is a family of a couple dozen REST services built using Spring Boot, plus a bunch of Jars shared among the services. Altogether, they add up to about 90MB of jars at runtime, including all 3rd party dependencies, plus another 110MB for the JRE.

The native image for our smallest service comes in over 60MB. Multiply by a few dozen executables, and we're talking about a noticeable bump in our container image size. We're expecting some increase, but don't want to spend a lot of time explaining why it grew soooooo much while we're justifying the change as a performance improvement.

Specifically, we're trying to reduce resource requirements (CPU and memory). Our 1st startup is, in fact, a CPU bottleneck for all those Spring Boot apps and we're betting Spring AOT processing can knock that down for us. I think our use case would benefit a great deal from "Layered Native Images", for example by packaging all 3rd party dependencies into a "base layer".

After looking at the options for writing shared libraries, I see they are oriented entirely toward integration with C code. And that makes sense. Would it make sense to think about this proposal as defining a native linkage specification (ABI?) for Java classes and methods?

+1

VaginAY commented 4 months ago

I believe that the indivisibility (monolithic nature) of the final image obtained when working with GraalVM is one of its key disadvantages, especially for large and complex applications.

I hope that this feature will not get lost in the background of other important improvements.

Thank you very much for your work!
