kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0

Setting JEP native library path for java >10 #688

Open kermitt2 opened 3 years ago

kermitt2 commented 3 years ago

I am opening a specific issue for this, although it was already mentioned in #603.

The JEP instance loads the native JEP library itself, but for this to happen the path to this native library must be in java.library.path. So far this was done dynamically, by appending the JEP native path to the ClassLoader's usr_paths field, and only if some Deep Learning model was required by the config. Unfortunately, from Java 11 usr_paths can no longer be accessed as a Field via the ClassLoader, for security reasons.
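For readers unfamiliar with the runtime approach described above, here is a minimal sketch of the widely circulated reflection hack on `ClassLoader.usr_paths`. It is an illustration under assumptions, not GROBID's actual `LibraryLoader` code: the class and method names are hypothetical, and instead of throwing it returns `false` wherever the JDK refuses access.

```java
import java.lang.reflect.Field;
import java.util.Arrays;

/**
 * Sketch of the classic hack that appends a directory to the JVM's private
 * native search path (ClassLoader.usr_paths). Hypothetical helper, not GROBID code.
 */
final class UsrPathsHack {

    /** Tries to append dir to ClassLoader.usr_paths; returns false on any failure. */
    static boolean tryAppendUsrPath(String dir) {
        try {
            // Newer JDKs hide this field from reflection or no longer have it,
            // in which case getDeclaredField throws NoSuchFieldException.
            Field field = ClassLoader.class.getDeclaredField("usr_paths");
            // On JDK 9-15 this usually succeeds with an "illegal reflective access" warning.
            field.setAccessible(true);
            String[] paths = (String[]) field.get(null);
            if (paths == null) {
                return false; // search path not initialized yet
            }
            if (Arrays.asList(paths).contains(dir)) {
                return true; // already present
            }
            String[] extended = Arrays.copyOf(paths, paths.length + 1);
            extended[paths.length] = dir;
            field.set(null, extended);
            return true;
        } catch (NoSuchFieldException e) {
            return false; // field filtered or removed on this JDK
        } catch (IllegalAccessException | RuntimeException e) {
            return false; // e.g. InaccessibleObjectException under strong encapsulation
        }
    }

    public static void main(String[] args) {
        System.out.println("Java " + System.getProperty("java.version"));
        System.out.println("usr_paths hack succeeded: "
                + tryAppendUsrPath(System.getProperty("java.io.tmpdir")));
    }
}
```

On a current JDK this prints `false`, which is the dead end discussed in the rest of the thread.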

A possible solution is to use MethodHandles as explained here, but it does not seem to work for Java < 9, and it stops working again with Java 15 because usr_paths can no longer be accessed via the ClassLoader at all (Caused by: java.lang.NoSuchFieldError: usr_paths).

So adding new paths for native libraries at runtime in Java seems to be a dead end.

The alternative would be to add the JEP native path when launching the JVM via Gradle, with something like this:

    systemProperty "java.library.path", file("grobid-home/lib/<os-arch>/").absolutePath

If JEP is not used, the library won't be loaded and this path won't be used anyway, so it's not a problem to add the path every time at launch.
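The reason an unused path is harmless is that the JVM only searches its library path when code actually asks for a library, e.g. through `System.loadLibrary`. Below is a minimal sketch of such a call with a diagnostic fallback; `NativeLoadCheck` and `tryLoad` are hypothetical names (not GROBID's `LibraryLoader`), and `"jep"` is assumed as the library name.

```java
/**
 * Sketch: attempt System.loadLibrary and, on failure, report the search path
 * the JVM was started with. Hypothetical helper, not GROBID code.
 */
final class NativeLoadCheck {

    static boolean tryLoad(String libName) {
        try {
            // Looks for the platform file name, e.g. "libjep.so" on Linux
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Could not load " + System.mapLibraryName(libName)
                    + " from java.library.path=" + System.getProperty("java.library.path"));
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("jep loaded: " + tryLoad("jep"));
    }
}
```

Because the JVM caches its native search path, calling `System.setProperty("java.library.path", ...)` at runtime generally has no effect on later `loadLibrary` calls, which is why a launch-time setting is needed.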

For this, we need to identify the right OS info from Gradle (see here for a possible example), and normally also the JDK version and the locally installed Python version...
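A sketch of the OS-detection part, written in Java since a Gradle build script can read the same `os.name` and `os.arch` system properties. Only the directory names `lin-64` and `mac-64` come from this thread; any other platform returns `null` here because its directory name is not given, and the JDK and Python versions are deliberately ignored. `NativeDirResolver` is a hypothetical name.

```java
import java.util.Locale;

/**
 * Sketch: map os.name / os.arch to a grobid-home/lib sub-directory.
 * Only "lin-64" and "mac-64" are taken from the thread; everything else is null.
 */
final class NativeDirResolver {

    static String nativeDir(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        // Linux JDKs report "amd64", macOS Intel JDKs report "x86_64"
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        if (!x64) {
            return null; // e.g. aarch64 is not covered by this sketch
        }
        if (os.startsWith("linux")) {
            return "lin-64";
        }
        if (os.startsWith("mac") || os.startsWith("darwin")) {
            return "mac-64";
        }
        return null;
    }

    public static void main(String[] args) {
        String dir = nativeDir(System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println(dir == null ? "unsupported platform" : "grobid-home/lib/" + dir);
    }
}
```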

Note: this does not affect the Docker image, which will always work fine. In the Docker image, the right JEP is in the system library path because it is fully installed at system level, and the embedded native JEP library is not used at all.

kermitt2 commented 2 years ago

Focusing only on the Linux 64-bit settings, so far I haven't managed to get anything working via Gradle.

It's possible to add the system property to tasks like this:

task(jatsEval, dependsOn: 'classes', type: JavaExec, group: 'modelevaluation') {
    main = 'org.grobid.trainer.evaluation.EndToEndEvaluation'
    classpath = sourceSets.main.runtimeClasspath
    args 'nlm', getArg('p2t', '.'), getArg('run', '0'), getArg('fileRatio', '1.0')
    jvmArgs '-Xmx3072m'
    systemProperty "java.library.path", file("../grobid-home/lib/lin-64/jep").absolutePath
}

But this overrides the default java.library.path, and JEP then fails to work because the other library paths are discarded. I haven't found a way to extend the path instead.

The following also fails for the same reason:

jvmArgs = ['-Xmx3072m', '-Djava.library.path=:./grobid-home/lib/lin-64/jep']
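To see what the launched JVM actually received (for example, whether the default entries were discarded), a small diagnostic can print java.library.path and look for a library's platform file name in each entry. This is a sketch assuming the JEP native file follows the `System.mapLibraryName("jep")` convention (e.g. `libjep.so` on Linux); `LibraryPathProbe` is a hypothetical name.

```java
import java.io.File;

/**
 * Sketch: report which java.library.path entry (if any) contains a native library.
 * Uses only standard JDK APIs; "jep" in main() is an assumed library name.
 */
final class LibraryPathProbe {

    /** Returns the directory holding the platform file for libName, or null. */
    static File findOnLibraryPath(String libName, String libraryPath) {
        if (libraryPath == null || libraryPath.isEmpty()) {
            return null;
        }
        // "jep" -> "libjep.so" (Linux), "libjep.dylib" (macOS), "jep.dll" (Windows)
        String fileName = System.mapLibraryName(libName);
        for (String entry : libraryPath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // empty entries (e.g. from a leading ':') are skipped by this sketch
            }
            if (new File(entry, fileName).isFile()) {
                return new File(entry);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + path);
        File dir = findOnLibraryPath("jep", path);
        System.out.println(dir == null ? "jep not found" : "jep found in " + dir);
    }
}
```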
lfoppiano commented 2 years ago

For the Linux 64-bit settings, did you modify anything in the code? Is there perhaps a branch?

About the Gradle part, I could not test it on my macOS machine, but how about something like this?

systemProperty "java.library.path","${System.getProperty('java.library.path')}:${file("../grobid-home/lib/lin-64/jep").absolutePath}"
kermitt2 commented 2 years ago

The change would be in Gradle: pass an extended java.library.path at launch instead of modifying it dynamically, so as to replace the System.loadLibrary(DELFT_NATIVE_LIB_NAME); call in LibraryLoader.java. I am pretty sure the situation is exactly the same for Linux and macOS (same security constraint), so the Gradle solution would be the same.

kermitt2 commented 2 years ago

I will test it, thanks!

lfoppiano commented 2 years ago

OK, I'm not sure how helpful this is, but I managed to get it working with JDK 14 on macOS from the command line (not Gradle).

After several attempts at setting java.library.path, setting it via -D does not seem to work. I think the only way is via environment variables (whose name depends on the OS 😭):

    On Windows: Add the path to the library to the PATH environment variable.
    On Linux: Add the path to the library to the LD_LIBRARY_PATH environment variable.
    On Mac: Add the path to the library to the DYLD_LIBRARY_PATH environment variable.
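A sketch of how a launcher could apply the variables listed above without a shell script: pick the OS-specific variable and prepend a directory to it in the environment of a child process. The class name, the jar name, and the library directory in `main()` are placeholders, not real GROBID paths.

```java
import java.io.File;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Sketch: set the OS-specific loader variable for a child process.
 * Hypothetical helper; paths in main() are placeholders.
 */
final class LoaderEnv {

    static String loaderVariable(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) return "PATH";
        if (os.startsWith("mac")) return "DYLD_LIBRARY_PATH";
        return "LD_LIBRARY_PATH"; // Linux and other Unix-like systems
    }

    static ProcessBuilder withLibraryDir(ProcessBuilder pb, String libDir) {
        String var = loaderVariable(System.getProperty("os.name"));
        Map<String, String> env = pb.environment(); // starts as a copy of this process's environment
        String current = env.get(var);
        env.put(var, current == null || current.isEmpty()
                ? libDir
                : libDir + File.pathSeparator + current);
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = withLibraryDir(
                new ProcessBuilder(List.of("java", "-jar", "my-app.jar")), "/path/to/native/libs");
        System.out.println(pb.environment());
        // pb.inheritIO().start() would launch the child with the extended variable
    }
}
```

This still requires an extra launcher process, which is the drawback raised later in the thread.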

I'm using a conda environment called jep, and the variable looks something like this:

DYLD_LIBRARY_PATH=.:./grobid-home/lib/mac-64/lib:/Users/lfoppiano/opt/anaconda3/envs/jep/lib/python3.8/site-packages/jep:/Users/lfoppiano/opt/anaconda3/envs/jep/lib

That said, I have no idea how to make it usable.

lfoppiano commented 2 years ago

Today I stumbled upon a thread on Stack Overflow where one of the comments seems to offer an updated solution for JDK > 12. I haven't tried it, though.

kermitt2 commented 2 years ago

Thanks for the experiments! So you observed the same issue with Gradle. I think the goal is to have it working with Gradle and/or dynamically. The LD_LIBRARY_PATH solution is what I am trying to avoid, because we would need a launch script or something equivalent, which would be very painful for third-party usage and super disappointing when using a JVM.

I got the java.library.path setting working with Gradle (with the excerpt above), but it overwrites the library path and I could not extend it (so the JEP library was available, but nothing else was, and it then failed...). But maybe it won't work with every JVM version and platform? It looks to me like a Gradle problem, so I will probably ask for help on the Gradle forum.

> Today I stumbled upon a thread on Stack Overflow where one of the comments seems to offer an updated solution for JDK > 12. I haven't tried it, though.

It could work! It requires even more parameters to be set in Gradle, and it involves both dynamic loading and settings at launch. We would then also need to check compatibility with older JVM versions.

If we could find a solution via the -D command-line property, or something equivalent, I think we could expect something more stable across existing and future JVMs, but maybe that is simply not usable.

lfoppiano commented 2 years ago

> I got the java.library.path setting working with Gradle (with the excerpt above), but it overwrites the library path and I could not extend it (so the JEP library was available, but nothing else was, and it then failed...). But maybe it won't work with every JVM version and platform? It looks to me like a Gradle problem, so I will probably ask for help on the Gradle forum.

Did you try the example I posted earlier, which should extend the list of paths? Does it not work?

kermitt2 commented 2 years ago

> Did you try the example (https://github.com/kermitt2/grobid/issues/688#issuecomment-1140774115) that should extend the list of paths? Does it not work?

Sorry, I've tried it and it works! I just had to also include the path to grobid-home/lib/lin-64 for the other libraries.

systemProperty "java.library.path","${System.getProperty('java.library.path')}:${file("../grobid-home/lib/lin-64/jep").absolutePath}:${file("../grobid-home/lib/lin-64").absolutePath}"

🙌 ✨.

This is great! Now we should only need to make it OS-specific and apply it to the run task.
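Making the extension OS-specific also concerns the separator: the snippets above hard-code `:`, which is correct on Linux and macOS, while Windows uses `;`. A sketch in Java using `java.io.File.pathSeparator`, which Groovy imports by default and so is also available in a Gradle build script. `LibraryPathBuilder` is a hypothetical name; relative directories are resolved here against the working directory, whereas Gradle's `file()` resolves them against the project directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: build an extended java.library.path value (existing entries first,
 * then extra directories) with the platform separator instead of a hard-coded ':'.
 */
final class LibraryPathBuilder {

    static String extend(String existing, String... extraDirs) {
        List<String> parts = new ArrayList<>();
        if (existing != null) {
            for (String p : existing.split(File.pathSeparator)) {
                // Drop empty entries and duplicates while keeping the original order
                if (!p.isEmpty() && !parts.contains(p)) parts.add(p);
            }
        }
        for (String dir : extraDirs) {
            String abs = new File(dir).getAbsolutePath();
            if (!parts.contains(abs)) parts.add(abs);
        }
        return String.join(File.pathSeparator, parts);
    }

    public static void main(String[] args) {
        // Mirrors the lin-64 directories from the working Gradle example above
        String value = extend(System.getProperty("java.library.path"),
                "grobid-home/lib/lin-64/jep", "grobid-home/lib/lin-64");
        System.out.println("-Djava.library.path=" + value);
    }
}
```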

kermitt2 commented 2 years ago

I started a PR for this: https://github.com/kermitt2/grobid/pull/921