oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org
Other
20.21k stars 1.62k forks source link

[native-image] misleading error message in native image on macOS when having many (jar)files open at the same time #1804

Closed borkdude closed 7 months ago

borkdude commented 4 years ago

Updated issue using pure Java (after discovering the root cause)

When opening a lot of JarFiles without closing them, a seemingly unrelated exception is thrown in a native binary on macOS.

Repro using pure Java. This will open every jar file present in your ~/.m2 folder.

LockRepro.java:

import java.io.File;
import java.util.ArrayList;
import java.util.jar.JarFile;

public class LockRepro {

    private static ArrayList<JarFile> jarFiles = new ArrayList<JarFile>();

    private static void listFiles(File folder) {

        File[] files = folder.listFiles();

        for (File file : files) {
            if (file.isFile() && file.getPath().endsWith(".jar")) {
                System.out.println(file.getName());
                try {
                    JarFile j = new JarFile(file);
                    jarFiles.add(j);
                }
                catch (Exception e) {
                    e.printStackTrace();
                }
            }
            else if (file.isDirectory()) {
                listFiles(file);
            }
        }
    }

    public static void main(String[] args) {

        File m2Repo = new File(System.getProperty("user.home"), ".m2");
        listFiles(m2Repo);
        System.out.println("Amount of jarFiles created: " + jarFiles.size());
    }
}

Compile with javac LockRepro.java followed by native-image LockRepro. Then run with ./lockrepro. What I'm seeing on my Mac after printing some 253 jar file names:

Exception in thread "main" java.lang.NullPointerException
    at LockRepro.listFiles(LockRepro.java:10)
    at LockRepro.listFiles(LockRepro.java:24)
    at LockRepro.listFiles(LockRepro.java:24)
    at LockRepro.listFiles(LockRepro.java:24)
    at LockRepro.listFiles(LockRepro.java:24)
    at LockRepro.listFiles(LockRepro.java:24)
    at LockRepro.main(LockRepro.java:32)

Note that java LockRepro still works and doesn't throw this exception. It prints Amount of jarFiles created: 1661 at the end on my machine.

I would have expected the message in the exception using the native to be something like 'too many files open'.

Original issue using Clojure

Clj-kondo, a linter for Clojure, is built as a native binary using GraalVM.

The binary shows a problem with RandomAccessFile when it lints a lot of sources, while the Java version doesn't have this problem. Problem is reproducible using GraalVM 19.2.1 on macOS 10.14.6 / macOS 10.15 (19A602) / 10.15.1 (19B88). Linux users do not seem to be able to reproduce it.

The problem is that java.io.FileNotFoundException is thrown for no apparent reason.

UPDATE: meanwhile I've been able to find the cause for this problem. It happens when you have many JarFile objects open at a single time and you don't close them.

Instructions to trigger the problem:

Build the following .jar file to a native (unzip first):

clj-kondo-2019.10.26-standalone.jar.zip

I do this using the following reflection.json file contains this:

[
    {
        "name": "java.lang.Class",
        "allDeclaredConstructors": true,
        "allPublicConstructors": true,
        "allDeclaredMethods": true,
        "allPublicMethods": true
    }
]

and compile with:

native-image \
  -jar clj-kondo-2019.10.26-standalone.jar \
  -H:Name=clj-kondo \
  -H:+ReportExceptionStackTraces \
  -J-Dclojure.spec.skip-macros=true \
  -J-Dclojure.compiler.direct-linking=true \
  "-H:IncludeResources=clj_kondo/impl/cache/built_in/.*" \
  -H:ReflectionConfigurationFiles=reflection.json \
  --initialize-at-build-time  \
  -H:Log=registerResource: \
  --verbose \
  --no-fallback \
  --no-server \
  "-J-Xmx3g"

(A complete overview of how the uberjar is built and from that, the native image, can be viewed here: https://github.com/borkdude/clj-kondo/blob/9de6b455a8ccd6df7923cb480b621c42572908a1/script/compile#L1)

Before we execute clj-kondo we will populate our local maven repo with a bunch of Clojure libraries that we will lint. We need leiningen to do this and we will use this project.clj file (unzip first):

project.clj.zip

mkdir -p /tmp/project
cp clj-kondo /tmp/project
cp project.clj /tmp/project
cd /tmp/project
lein classpath # this will trigger a download of the dependencies, if it doesn't download them all at once, you might need to run it more than once

Now execute the binary clj-kondo as follows:

mkdir -p /tmp/project/.clj-kondo # a RandomAccesFile will be created in this directory
cd /tmp/project
./clj-kondo --lint "$(lein classpath)" > /dev/null

The output of clj-kondo is not important so we write it to /dev/null. The exit code can be ignored as well. The most important is that it will throw an exception that we do not see in the JVM version:

$ clj-kondo --lint "$(lein classpath)"  > /dev/null
java.io.FileNotFoundException: /private/tmp/project/.clj-kondo/.cache/2019.10.26/lock
    at com.oracle.svm.core.posix.PosixUtils.fileOpen(PosixUtils.java:235)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:709)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at clj_kondo.impl.cache$sync_cache.invokeStatic(cache.clj:123)
    at clj_kondo.core$run_BANG_.invokeStatic(core.clj:97)
    at clj_kondo.main$main.invokeStatic(main.clj:85)
    at clj_kondo.main$main.doInvoke(main.clj:78)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invokeStatic(core.clj:665)
    at clj_kondo.main$_main$fn__4965.invoke(main.clj:104)
    at clj_kondo.main$_main.invokeStatic(main.clj:104)
    at clj_kondo.main$_main.doInvoke(main.clj:102)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clj_kondo.main.main(Unknown Source)

Compare this to linting with the standalone jar (clj-kondo-2019.10.26-standalone.jar.zip):

$ java -jar clj-kondo-2019.10.26-standalone.jar --lint "$(lein classpath)"  > /dev/null

That works without an exception.

Note that the error in the native only exhibits when the linted classpath is very large, not with a smaller classpath.

The error was reported in this issue in the clj-kondo repo.

borkdude commented 4 years ago

Maybe related: https://github.com/oracle/graal/issues/1581

fernando-valdez commented 7 months ago

Closing this issue as it has been fixed in past releases. If you can see that the problem persists in the latest snapshot, please feel free to reopen this ticket.