ucam-department-of-psychiatry / crate

Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.
GNU General Public License v3.0

Unable to build GATE interface #149

Closed: ethanhkim closed this issue 1 month ago

ethanhkim commented 3 months ago

Hello,

I am trying to build the GATE interface through crate_nlp_build_gate_java_interface and I am consistently running into this error message:

/path/to/venv/crate/lib64/python3.9/site-packages/crate_anon/nlp_manager/CrateGatePipeline.java:245: error: constructor ConsoleAppender in class ConsoleAppender cannot be applied to given types;
        ConsoleAppender log_appender = new ConsoleAppender(log_layout, "System.err");
                                       ^
  required: no arguments
  found: PatternLayout,String
  reason: actual and formal argument lists differ in length
/path/to/venv/crate/lib64/python3.9/site-packages/crate_anon/nlp_manager/CrateGatePipeline.java:247: error: incompatible types: ConsoleAppender cannot be converted to Appender
        rootlog.addAppender(log_appender);
                            ^
Note: Some messages have been simplified; recompile with -Xdiags:verbose to get full output
2 errors
Traceback (most recent call last):
  File "/path/to/venv/crate/bin/crate_nlp_build_gate_java_interface", line 8, in <module>
    sys.exit(main())
  File "/path/to/venv/crate/lib64/python3.9/site-packages/crate_anon/nlp_manager/build_gate_java_interface.py", line 153, in main
    subprocess.check_call(cmdargs)
  File "/usr/lib64/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['javac', '-Xlint:unchecked', '-classpath', '/path/to/venv/crate/lib64/python3.9/site-packages/crate_anon/nlp_manager/compiled_nlp_classes:/usr/local/GATE_Developer_9.0/bin/gate.jar:/usr/local/GATE_Developer_9.0/lib/*', '-d', '/path/to/venv/crate/lib64/python3.9/site-packages/crate_anon/nlp_manager/compiled_nlp_classes', '/path/to/venv/crate/lib64/python3.9/site-packages/crate_anon/nlp_manager/CrateGatePipeline.java']'

Currently running Python 3.9.18, crate-anon 0.20.3, Java 1.8.0.402, GATE v9.0 on AlmaLinux Release 9.3. Any help would be greatly appreciated!

RudolfCardinal commented 3 months ago

I think this is due to a change in log4j since this code was originally written (similar to https://stackoverflow.com/questions/66086690). We'll get on it. Apologies; there is likely to be a short delay.

RudolfCardinal commented 3 months ago

We should also double-check that Java compilation is tested in the workflow (including for Docker), though I thought it was already. Perhaps we have pinned a Java version from before this change.

ethanhkim commented 3 months ago

Thanks for the quick response! Really appreciate it.

martinburchell commented 3 months ago

Our Docker image (based on python:3.8-slim-buster, i.e. Debian 10) uses GATE 8.6.1. GATE 8.6.1 ships with log4j-1.2.17.jar. The Dockerfile runs crate_nlp_build_gate_java_interface, and we are testing this in at least one of our workflows.

GATE 9.0.1 appears to be the latest version (albeit from March 2021), and looking at the bundled libraries, log4j has been replaced with log4j-over-slf4j. My hunch is that with log4j gone from the GATE lib directory, the build script is trying to use whatever version it can find on the system, which for @ethanhkim is a 2.x version with the API change.
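One way to test this hunch directly: the jars javac searches are the ones on the -classpath of the failing command in the traceback, so each of them can be checked for org/apache/log4j/ConsoleAppender.class. This would show whether the class javac compiled against comes from log4j-over-slf4j, a log4j jar, or something else. The helpers expand_classpath() and jars_providing() below are hypothetical and not part of CRATE; this is a minimal sketch assuming a POSIX system and lowercase .jar file names.

```python
# Hypothetical diagnostic (not part of CRATE): which classpath entries
# contain a given Java class?
import glob
import os
import zipfile


def expand_classpath(classpath):
    """Split a classpath string and expand 'dir/*' wildcard entries.

    Sketch only: matches lowercase '.jar' files and sorts them, whereas
    Java does not guarantee the order in which wildcard jars are searched.
    """
    entries = []
    for entry in classpath.split(os.pathsep):
        if entry.endswith("*"):
            entries.extend(sorted(glob.glob(os.path.join(entry[:-1], "*.jar"))))
        elif entry:
            entries.append(entry)
    return entries


def jars_providing(classpath, class_name):
    """Return the classpath entries that contain class_name, in classpath order."""
    member = class_name.replace(".", "/") + ".class"
    hits = []
    for entry in expand_classpath(classpath):
        if os.path.isdir(entry):
            # A directory entry (e.g. compiled_nlp_classes) holds loose .class files.
            if os.path.isfile(os.path.join(entry, member)):
                hits.append(entry)
        elif zipfile.is_zipfile(entry):
            # Jars are zip archives; member names always use '/' separators.
            with zipfile.ZipFile(entry) as jar:
                if member in jar.namelist():
                    hits.append(entry)
    return hits


if __name__ == "__main__":
    # GATE part of the classpath from the failing javac command.
    gate_cp = os.pathsep.join([
        "/usr/local/GATE_Developer_9.0/bin/gate.jar",
        "/usr/local/GATE_Developer_9.0/lib/*",
    ])
    for hit in jars_providing(gate_cp, "org.apache.log4j.ConsoleAppender"):
        print(hit)
```

Each printed path is a candidate source of the ConsoleAppender that javac saw; if there are several, the earliest one on the classpath is the one that would be used.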

Short term fix is to use GATE 8.6.1 or CRATE running under Docker https://crateanon.readthedocs.io/en/latest/installation/docker.html.

In the longer term, we shouldn't be using log4j 1.x as it isn't supported any more.

We could drop support for all but the latest release of GATE (9.0.1). This version has 46 vulnerabilities, compared with 115 in 8.6.1.

We could make CrateGatePipeline.java work with both versions of GATE by making sure the correct version of whatever dependencies we use are pulled in (possibly with Maven or similar).

ethanhkim commented 3 months ago

Thanks for the detailed response! I'll give installing GATE 8.6.1 a go and see if the issue resolves in the short term.

RudolfCardinal commented 3 months ago

I guess it'd be good to separate whatever GATE wants from what we want, e.g. by pinning a log4j (or similar module) version. Our code imports modules from core Java, log4j, and GATE. I'm not sure whether these can all co-exist happily, e.g. if we specified a URL to the class loader (old example at https://stackoverflow.com/questions/6105124/). This post (https://boyl.es/post/two-versions-same-library/) suggests that Java doesn't support loading two versions of one class (e.g. if we loaded one logger but GATE wants another), but can Maven get round this problem (same link)?
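The simplest form of pinning needs no extra tooling: for a plain classpath, javac and the JVM's application class loader take each class from the first entry that contains it. A minimal sketch, using a hypothetical build_javac_cmd() modelled on the javac call shown in the traceback (it is not CRATE's build_gate_java_interface.py), with a hypothetical pinned jar path:

```python
# Hypothetical sketch (not CRATE's build script): the javac command from the
# traceback, with optional "pinned" jars placed ahead of GATE's own libraries.
import os


def build_javac_cmd(java_file, out_dir, gate_home, pinned_jars=()):
    """Return a javac argument list whose classpath searches pinned_jars first.

    For a plain classpath, each class is taken from the first entry that
    contains it, so a pinned log4j jar would shadow any copy in GATE's lib/.
    """
    classpath = os.pathsep.join([
        out_dir,                                    # previously compiled classes
        *pinned_jars,                               # e.g. a specific log4j version
        os.path.join(gate_home, "bin", "gate.jar"),
        os.path.join(gate_home, "lib", "*"),        # javac wildcard: jars in lib/
    ])
    return ["javac", "-Xlint:unchecked", "-classpath", classpath,
            "-d", out_dir, java_file]


if __name__ == "__main__":
    cmd = build_javac_cmd(
        "CrateGatePipeline.java",
        "compiled_nlp_classes",
        "/usr/local/GATE_Developer_9.0",
        pinned_jars=["/opt/jars/log4j-1.2.17.jar"],  # hypothetical location
    )
    print(" ".join(cmd))
```

Because the classpath is flat, GATE would also be handed the pinned classes at runtime. This chooses one version for everyone rather than letting two versions coexist, which is the limitation described in the linked post.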

martinburchell commented 2 months ago

I think we can fix this by moving the log4j configuration to a file, which should remove the incompatible code. With GATE 9.x, it appears that calls to log4j will be routed through SLF4J to Logback, so if we provide an equivalent Logback configuration, it should all work. I'm trying this out on the later-gate-dev branch.
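A sketch of what the two configuration files might contain. The goal is that the Java code no longer constructs a ConsoleAppender itself: log4j 1.x (GATE 8.x) would read log4j.properties, and Logback (GATE 9.x, via SLF4J) would read logback.xml. Both would send output to System.err, as the current code does. The Python helper write_logging_configs() is hypothetical and not part of CRATE. The INFO level and the log pattern are assumptions; the layout used in CrateGatePipeline.java may differ.

```python
# Hypothetical sketch (not part of CRATE): write logging configuration files
# so that CrateGatePipeline.java no longer has to build a ConsoleAppender itself.
import os

# log4j 1.x (GATE 8.x). Level and conversion pattern are assumptions.
LOG4J_PROPERTIES = """\
log4j.rootLogger=INFO, stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.Target=System.err
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=%d{HH:mm:ss.SSS} %-5p %c - %m%n
"""

# Logback (GATE 9.x, where log4j calls go via SLF4J). Equivalent assumed pattern.
LOGBACK_XML = """\
<configuration>
  <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
    <target>System.err</target>
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDERR" />
  </root>
</configuration>
"""


def write_logging_configs(out_dir):
    """Write both files into out_dir and return {file name: path}."""
    paths = {}
    for name, text in (("log4j.properties", LOG4J_PROPERTIES),
                       ("logback.xml", LOGBACK_XML)):
        path = os.path.join(out_dir, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths[name] = path
    return paths
```

Both libraries look for these default file names on the classpath, so a directory already on the runtime classpath could hold both. Alternatively, the location can be given explicitly with -Dlog4j.configuration=file:/path/to/log4j.properties (log4j 1.x) or -Dlogback.configurationFile=/path/to/logback.xml (Logback).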