oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org

FR: Add option to explicitly exclude packages/classes from native images #3225

Open zakkak opened 3 years ago

zakkak commented 3 years ago

Feature request

Add the option to explicitly exclude packages/classes from native images.

Is your feature request related to a problem? Please describe.

With the recent release of GraalVM 21.0 we came across https://github.com/quarkusio/quarkus/issues/14972. The root cause is that code which is reachable according to the static analysis gets baked into the native image even if the application never uses it, and in some cases this also forces the image to be linked against extra libraries.

What we observed is that when using JAX-B the native-image brings in a number of classes from javax.imageio, java.awt, sun.java2d, etc. even though the application may not actually use them. The cause seems to be the class initialization of RuntimeBuiltinLeafInfoImpl which essentially enables JAX-B to process images, something that an application may have no interest in.

Enabling users to exclude specific packages/classes that they know a priori that their application doesn't rely on can have the following benefits:

  1. Reduced image size
  2. Faster compilation times
  3. Fewer library dependencies

Describe the solution you'd like.

An option to explicitly exclude specific packages/classes from the native image, something like --exclude=java.awt.**.

The exclusion should ensure that:

  1. No code or instances of the classes/packages matching the exclusion pattern end up in the native image.
  2. Since the classes/packages are excluded, they don't affect the dependencies of the native image (e.g. the image must not require linking against liblcms if sun.java2d.cmm.lcms is excluded).
  3. In case the code reaches excluded parts at runtime a proper exception/error is thrown.
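
To make the pattern syntax concrete, here is a minimal sketch of how a pattern such as java.awt.** might be matched against fully qualified class names. The semantics are an assumption, since the request does not define them: `**` is taken to match any number of package segments and `*` to match within a single segment. `ExclusionPattern` is a hypothetical helper, not part of native-image.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating --exclude patterns; not part of native-image. */
public final class ExclusionPattern {
    private final Pattern regex;

    public ExclusionPattern(String glob) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            if (glob.startsWith("**", i)) {
                sb.append(".*");        // assumed: spans package separators
                i += 2;
            } else if (glob.charAt(i) == '*') {
                sb.append("[^.]*");     // assumed: stays within one segment
                i++;
            } else {
                // Every other character (including '.') is matched literally.
                sb.append(Pattern.quote(String.valueOf(glob.charAt(i))));
                i++;
            }
        }
        this.regex = Pattern.compile(sb.toString());
    }

    /** Returns true if the fully qualified class name matches the whole pattern. */
    public boolean matches(String className) {
        return regex.matcher(className).matches();
    }

    public static void main(String[] args) {
        ExclusionPattern p = new ExclusionPattern("java.awt.**");
        for (String name : List.of("java.awt.Toolkit", "java.awt.image.BufferedImage", "java.io.File")) {
            System.out.println(name + " excluded: " + p.matches(name));
        }
    }
}
```

Under these assumed semantics the first two names match and java.io.File does not.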

Describe who do you think will benefit the most.

GraalVM users, and developers of libraries and frameworks which depend on GraalVM.

Describe alternatives you've considered.

The current way to work around such issues is to substitute the methods and fields that cause the undesirable classes to be reached. Unfortunately though in the case of JAX-B this seems impossible (happy to be corrected) since the method that needs to be substituted is the class initializer itself, and even if there was a way to substitute it, it would require duplicating a vast amount of code.

Express whether you'd like to help contributing this feature

I am willing to contribute and I would appreciate some guidance.

zakkak commented 3 years ago

A possible implementation would Substitute all reachable classes matching the pattern passed to --exclude and it would also Substitute all reachable methods of those classes to throw an exception.

For instance, the following program:

import javax.imageio.ImageIO;
import java.io.File;
import java.io.IOException;

public class AWT {
    public static void main(String[] args) {
        try {
            ImageIO.read(new File("myfile.jpeg"));
        } catch (IOException e) {
            //
        }
    }
} 

when compiled with --exclude=javax.imageio.ImageIO should generate the equivalent to the native image generated by compiling the following code:

import javax.imageio.ImageIO;
import java.io.File;
import java.io.IOException;
import java.awt.image.BufferedImage;

import com.oracle.svm.core.annotate.*;

public class AWT {
    public static void main(String[] args) {
        try {
            ImageIO.read(new File("myfile.jpeg"));
        } catch (IOException e) {
            //
        }
    }
}

@Substitute //
@TargetClass(className = "javax.imageio.ImageIO")
final class Target_javax_imageio_ImageIO {

    @Substitute
    public static BufferedImage read(File input) throws IOException {
        throw new UnsupportedOperationException("javax.imageio.ImageIO has been excluded from this native image; please recompile without excluding it");
    }
}

borkdude commented 3 years ago

Coincidentally, I'm also seeing java.awt classes being pulled into my image even though I don't expect them there, even more so after upgrading to JDK 17.

borkdude commented 3 years ago

@zakkak Adding -H:ServiceLoaderFeatureExcludeServices=java.net.ContentHandlerFactory gets rid of including java.awt.Toolkit in the image for me. Thanks to @olpaw.

ppalaga commented 1 year ago

It would be great if the proposed class/package exclusions could be effective during the reachability analysis. Unfortunately substitutions won't help with that.

I am currently adding native support for cxf-rt-features-metrics. It fails during the analysis due to the fact that org.apache.cxf.metrics.codahale.CodahaleMetricsProvider requires a class (com/codahale/metrics/jmx/ObjectNameFactory) from an optional dependency. Given that I do not want to support that use case at all, it would be great to be able to remove org.apache.cxf.metrics.codahale.CodahaleMetricsProvider before the analysis.

fniephaus commented 1 month ago

We discussed an alternative solution some time ago. It would be great if someone (@zakkak?) could confirm whether it would be sufficient:

The idea is to add a --remove-modules option, which is basically a convenience flag for using --limit-modules: Instead of having to specify all modules explicitly, the NI driver would essentially determine the list of all modules, remove the modules specified with --remove-modules, and pass the remaining modules via --limit-modules to the builder.

I was able to remove AWT by manually limiting the set of modules to all but java.desktop. Note that with the module removed, I would expect an error at run time if the corresponding code path is taken.
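
The driver-side computation described above could look roughly like the following sketch. It uses only the standard `java.lang.module.ModuleFinder` API to list the system modules. The class name `RemoveModules` and its methods are hypothetical and are not actual native-image code, and the sketch only considers system modules, not modules on the application module path.

```java
import java.lang.module.ModuleFinder;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical sketch of the proposed --remove-modules logic; not native-image code. */
public final class RemoveModules {

    /** Names of all system modules except those listed in {@code removed}, sorted. */
    public static Set<String> remainingModules(Set<String> removed) {
        return ModuleFinder.ofSystem().findAll().stream()
                .map(ref -> ref.descriptor().name())
                .filter(name -> !removed.contains(name))
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Builds the --limit-modules argument that would be passed on to the builder. */
    public static String limitModulesArg(Set<String> removed) {
        return "--limit-modules=" + String.join(",", remainingModules(removed));
    }

    public static void main(String[] args) {
        System.out.println(limitModulesArg(Set.of("java.desktop")));
    }
}
```

One caveat the sketch ignores: `--limit-modules` keeps the transitive closure of the listed modules, so any remaining module that requires java.desktop would presumably pull it back in. A real implementation would likely have to drop such dependents as well.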

ppalaga commented 1 month ago

@fniephaus modules would be too coarse-grained for the situation I sketched above.

vjovanov commented 1 month ago

This proposal is certainly tempting, however, I have three concerns:

  1. If we make it so easy to cut code out, the community will never start thinking about image size. The correct solution here would be to add a build-time flag to JAX-B that excludes image IO and prints a proper error in that case. Did anyone try to open a ticket on these libraries to fix the root cause?
  2. If we cut out Image IO, then anyone using a framework who wants to do image IO cannot do that. We impose our opinion on the user, which is not ideal.
  3. If we cut out code generically, we cannot provide a proper error message. This will degrade usability of the project as people will hit unexpected errors in the code with no proper documentation.

To address all of these concerns, the only solution I see is to make the process of removing code more precise, and as such slightly harder. We can require the user to say which calls should be removed from which class or method and for what reason.

This could be done via a non-API flag: I would refrain from making this an API flag until we see how it is used in practice.

Would this work for everybody? Do you see alternative solutions that will not affect user experience?

ppalaga commented 1 month ago

> require the user to say which calls should be removed from which class or method and for what reason.

Thanks, that sounds like a good solution to me.

@vjovanov regarding the concerns you expressed above, I think the main goal is to give means to users who know what they are doing, be it framework developers (such as Quarkus, where such removals can be made configurable and well documented) or users of plain GraalVM NI. JAX-B is rather hard to change, because it implements a standard. Putting a hard-coded exclusion of calls/classes into JAX-B (not sure you meant exactly that) would not work for everybody. We support the use case of images in SOAP calls in Quarkus CXF, which requires java.awt.Image to be available. So the exclusion of the image stuff would have to be controllable by the end user (when she is sure her calls do not touch any images) or by quarkus-cxf based on some runtime introspection of the SOAP services available in the application.

vjovanov commented 1 month ago

> @vjovanov for the concerns you expressed above, I think the main goal is to give means to users who know what they do.

I agree, but unfortunately if we leave a simple mechanism it will inevitably leak towards end users. Even the example given on this ticket (JAX-B) will in almost all cases be used by an end user who does not know how to fix this.

> JAX-B is rather hard to change, bc. it implements some standard. Putting a hard-coded exclusion of calls/classes into JAX-B (not sure you meant exactly that) would not work for everybody.

If it is guarded with a flag, then it should work for everybody as long as the flag is not set. We can then set the flag to change the spec a bit and ignore Image IO. The effect is the same as with the proposal that we are discussing here: the spec will be changed for some users.

Inspired by your comment, I think we always need to allow the undo operation from the command line: if a user really needs image IO in JAX-B, it is unrealistic to ask them to go to the framework and modify it. I feel we need a way to disable any of these operations, and a clear description of how to do it should be written in the error message itself.
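
As a rough illustration of the two requirements above (each removal names its target and its reason, and the run-time error explains how to undo it), a removal entry could be modeled as in the following sketch. Everything in it is hypothetical: the record `CallRemoval`, its fields, and the example values are illustrative only and do not correspond to any existing native-image option or API.

```java
/** Hypothetical description of one removed call; all names are illustrative only. */
public record CallRemoval(String inClass, String inMethod, String removedCall,
                          String reason, String undoHint) {

    public CallRemoval {
        // Assumed rule: a removal without a stated reason is rejected.
        if (reason == null || reason.isBlank()) {
            throw new IllegalArgumentException("a removal must state its reason");
        }
    }

    /** Message for the error raised if the removed call is reached at run time. */
    public String errorMessage() {
        return "Call to " + removedCall + " in " + inClass + "." + inMethod
                + " was removed at build time (" + reason + "). To restore it: " + undoHint;
    }

    public static void main(String[] args) {
        CallRemoval removal = new CallRemoval(
                "com.example.Foo", "bar", "java.awt.Toolkit.getDefaultToolkit()",
                "application does not process images",
                "rebuild without this removal entry");
        System.out.println(removal.errorMessage());
    }
}
```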

fniephaus commented 1 month ago

@ppalaga Quarkus uses our internal Substitution API all over the place. I don't see why they can't just cut out Image IO for the user.

Being able to cut out arbitrary parts in an application is a very sharp tool, and I'm not sure it's something we should allow end-users to do or control.

The JAX-B issue is well known, and it's unfortunate no one has stepped up to fix the root cause.

zakkak commented 4 weeks ago

> We can require the user to say which calls should be removed from which class or method and for what reason.

Would that offer any new functionality over the existing @Substitute/@Delete annotations? I can only imagine it as a more flexible mechanism, i.e. one that doesn't require closely monitoring the code being substituted for upstream changes.

How do you imagine it working? Would we be able to just say "remove println calls from the foo method"?

> @ppalaga Quarkus uses our internal Substitution API all over the place. I don't see why they can't just cut out Image IO for the user.

The issue is that AFAIK we can't substitute static initializers, and in cases like https://github.com/eclipse-ee4j/jaxb-ri/blob/2.3.3-b01-RI-RELEASE/jaxb-ri/runtime/impl/src/main/java/com/sun/xml/bind/v2/model/impl/RuntimeBuiltinLeafInfoImpl.java#L355-L443 even if we did it would be impossible to maintain the substitution in the long run due to the size and complexity of the initializer. That was the motivation behind this feature request.

> The correct solution here would be to add a build-time flag to JAX-B that excludes image IO and prints a proper error in that case. Did anyone try to open a ticket on these libraries to fix the root cause?

I agree, but we don't always have the resources to pursue such a change, and even if we do, we still need a work-around in the meantime or in case the upstream project is not willing to accept the change.

Skater901 commented 3 days ago

> The idea is to add a --remove-modules option, which is basically a convenience flag for using --limit-modules: Instead of having to specify all modules explicitly, the NI driver would essentially determine the list of all modules, remove the modules specified with --remove-modules, and pass the remaining modules via --limit-modules to the builder.

This would work perfectly for me. I have a Dropwizard application that I'm compiling to a native binary, and I've just discovered that java.xml and java.desktop are being included in the native image. java.xml makes sense, but I don't need it since I'm only processing JSON. But java.desktop makes no sense; why does my web server need GUI code??

Being able to just exclude those two modules would be a really nice feature. Or, alternatively, if it's possible to figure out which dependencies are using those modules, I could exclude/submit PRs to the dependencies to get rid of those modules. (I don't know of any way to check what Java modules a library is using, so if anybody knows how I can do that, please let me know!)