ucam-department-of-psychiatry / crate

Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.
GNU General Public License v3.0

Fix the GATE Interface to work with both GATE 8.6.1 and GATE 9.0.1 #150

Closed martinburchell closed 1 month ago

martinburchell commented 2 months ago

Fixes #149

Unlike GATE 8.6.1, GATE 9.0.1 no longer ships with log4j. Instead it uses Logback plus the compatibility layer log4j-over-slf4j, which isn't a complete replacement for log4j 1.x, hence the errors reported in #149. So I've moved the default logger configuration to XML files (log4j.xml for 8.6.1 and logback.xml for 9.0.1). Alternative configurations can be supplied, as long as they are on the classpath for CrateGatePipeline.java.

The compatibility layer does not support setting the logging levels programmatically; attempts to do so are silently ignored. So the levels are now set in the XML configuration as well.

There's also a new script to generate the GATE auto install XML script and a GitHub action to test the interface works with both GATE versions.

The default version of GATE built in the Docker image is now 9.0.1 and is configurable.

I've tested all of the GATE demo scripts apart from KCL KConnect, which reported lots of missing files. Possibly user error.

RudolfCardinal commented 2 months ago

Looks good; thank you! I am struggling slightly to test locally.

... and I get:

=> => naming to docker.io/library/crate:0.20.3    0.0s
Creating /home/rudolf/tmp/crate_docker_tmp/config/crateweb_local_settings.py
Editing /home/rudolf/tmp/crate_docker_tmp/config/crateweb_local_settings.py
Creating /home/rudolf/tmp/crate_docker_tmp/config/crate_anon_config.ini
Editing /home/rudolf/tmp/crate_docker_tmp/config/crate_anon_config.ini
Traceback (most recent call last):
  File "/home/rudolf/Documents/code/crate/installer/./installer.py", line 1909, in <module>
    main()
  File "/home/rudolf/Documents/code/crate/installer/./installer.py", line 1887, in main
    installer.install()
  File "/home/rudolf/Documents/code/crate/installer/./installer.py", line 468, in install
    self.create_or_update_crate_database()
  File "/home/rudolf/Documents/code/crate/installer/./installer.py", line 1223, in create_or_update_crate_database
    self.run_crate_command("crate_django_manage migrate")
  File "/home/rudolf/Documents/code/crate/installer/./installer.py", line 543, in run_crate_command
    return self.docker.compose.run(
  File "/home/rudolf/dev/venvs/crate/lib/python3.10/site-packages/python_on_whales/components/compose/cli_wrapper.py", line 739, in run
    result = run(full_cmd, tty=tty)
  File "/home/rudolf/dev/venvs/crate/lib/python3.10/site-packages/python_on_whales/utils.py", line 194, in run
    raise DockerException(
python_on_whales.exceptions.DockerException: The docker command executed was `/usr/bin/docker compose --file docker-compose.yaml --file docker-compose-crate-db.yaml --file docker-compose-research-db.yaml --file docker-compose-secret-db.yaml --file docker-compose-source-db.yaml run --no-TTY --rm crate_workers crate_django_manage migrate`.
It returned with code 1
The content of stdout is 'Loading local settings from: /crate/cfg/crateweb_local_settings.py
'
The content of stderr is ' Container crate_crate_db  Running
 Container crate_source_db  Running
 Container crate_rabbitmq  Running
 Container crate_research_db  Running
 Container crate_secret_db  Running
wait-for-it: waiting for rabbitmq:5672 without a timeout
wait-for-it: rabbitmq:5672 is available after 0 seconds
wait-for-it: waiting for crate_db:3306 without a timeout
wait-for-it: crate_db:3306 is available after 0 seconds
wait-for-it: waiting for research_db:3306 without a timeout
wait-for-it: research_db:3306 is available after 0 seconds
wait-for-it: waiting for secret_db:3306 without a timeout
wait-for-it: secret_db:3306 is available after 0 seconds
wait-for-it: waiting for source_db:3306 without a timeout
wait-for-it: source_db:3306 is available after 0 seconds
Traceback (most recent call last):
  File "/crate/venv/bin/crate_django_manage", line 5, in <module>
    from crate_anon.crateweb.manage import main
  File "/crate/venv/lib/python3.8/site-packages/crate_anon/crateweb/manage.py", line 90, in <module>
    django.setup()
  File "/crate/venv/lib/python3.8/site-packages/django/__init__.py", line 19, in setup
    configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
  File "/crate/venv/lib/python3.8/site-packages/django/conf/__init__.py", line 82, in __getattr__
    self._setup(name)
  File "/crate/venv/lib/python3.8/site-packages/django/conf/__init__.py", line 69, in _setup
    self._wrapped = Settings(settings_module)
  File "/crate/venv/lib/python3.8/site-packages/django/conf/__init__.py", line 170, in __init__
    mod = importlib.import_module(self.SETTINGS_MODULE)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/crate/venv/lib/python3.8/site-packages/crate_anon/crateweb/config/settings.py", line 571, in <module>
    _local_module = _loader.load_module()
  File "/crate/cfg/crateweb_local_settings.py", line 59
    CRATE_HTTPS = @@crate_https@@  # True: require HTTPS and disallow plain HTTP
                  ^
SyntaxError: invalid syntax
'

This is probably just me; if building properly via the test scripts then that's the main thing! But is there some sort of substitution failure? That's two versions of an error where the "@@..." stuff seems to have come through unmodified (@@crate_https@@ from print_crateweb_demo_config.py?), but possibly Docker is just mounting some area with old/duff config files of mine, despite attempts to avoid that.

martinburchell commented 2 months ago

> Looks good; thank you! I am struggling slightly to test locally.
>
> * Java compilation works fine.
>
> * Attempted with current GATE (8.6.1) via `crate_run_gate_annie_demo` but this gives `Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make field private final java.util.Comparator java.util.TreeMap.comparator accessible: module java.base does not "opens java.util" to unnamed module...`. So onto Docker;

This sounds like a Java version incompatibility. Try:

sudo update-alternatives --config java
sudo update-alternatives --config javac

Both GATE versions claim to support Java 8 or above.
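
To confirm which Java the shell actually picks up after switching alternatives, the version string from `java -version` can be reduced to a major version number. The sketch below is illustrative only: `java_major_version` is a hypothetical helper, not part of CRATE or GATE. As far as I know, `InaccessibleObjectException` on `java.base` internals is typical of Java 16 and later, where strong encapsulation is enforced by default, so a result of 16 or more would be consistent with the error quoted above.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output.

    Handles both the legacy scheme ("1.8.0_292" means Java 8) and the
    modern scheme ("17.0.2" means Java 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("No Java version string found")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        # Legacy "1.x" numbering: the second component is the major version.
        return int(match.group(2))
    return first
```

Note that `java -version` writes to stderr, so that is the stream to capture, for example with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`.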

> This is probably just me; if building properly via the test scripts then that's the main thing! But is there some sort of substitution failure? That's two versions of an error where the "@@..." stuff seems to have come through unmodified (@@crate_https@@ from print_crateweb_demo_config.py?), but possibly Docker is just mounting some area with old/duff config files of mine, despite attempts to avoid that.

The installer won't try to create crateweb_local_settings.py if it already exists so my hunch is that it was already present in your config directory with placeholders from a previous failed run. I may of course be completely wrong!
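
One way to spot a leftover file like this before running the installer is to scan the config directory for unsubstituted placeholders. This is a minimal sketch, not something the installer does: `find_placeholders` is a hypothetical helper, and it assumes every placeholder has the `@@name@@` form seen in the traceback above.

```python
import re
from typing import List, Tuple

# Assumption: placeholders always look like @@name@@, as in @@crate_https@@.
PLACEHOLDER_RE = re.compile(r"@@\w+@@")


def find_placeholders(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, placeholder) for every @@...@@ still in text."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER_RE.finditer(line):
            found.append((lineno, match.group(0)))
    return found
```

Running this over the contents of crateweb_local_settings.py and crate_anon_config.ini and getting any hits at all would suggest the file came from an interrupted run and should be deleted so that the installer recreates it.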

RudolfCardinal commented 2 months ago

Thanks; yes, you were exactly right about the leftover config. It installs fine. Question en passant: scripts like enter_crate_container.sh, stop_crate.sh, etc. call installer.py, which (via Installer.docker() and Installer.should_create_crate_db_container()) insists that CRATE_INSTALLER_CREATE_CRATE_DB_CONTAINER should be set to 0 or 1 (and will probably insist on another one next). Reasonable to allow these to be unset for the start/stop/enter/run_crate_command scripts?

martinburchell commented 2 months ago

> Question en passant: scripts like enter_crate_container.sh, stop_crate.sh, etc. call installer.py, which (via Installer.docker() and Installer.should_create_crate_db_container()) insists that CRATE_INSTALLER_CREATE_CRATE_DB_CONTAINER should be set to 0 or 1 (and will probably insist on another one next). Reasonable to allow these to be unset for the start/stop/enter/run_crate_command scripts?

The script needs to work out which docker-compose-xxx files to include when running commands, which is:

This ends up as a call to docker compose that, at its longest, would begin with:

$ docker compose -f docker-compose.yaml -f docker-compose-crate-db.yaml -f docker-compose-research-db.yaml -f docker-compose-secret-db.yaml -f docker-compose-source-db.yaml

So if we allowed the environment variables to be unset, we'd need to assume some defaults.
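
To make the trade-off concrete, here is a rough sketch of what "assuming defaults" could look like. It is not the installer's code: `compose_file_args` is a hypothetical function, and apart from CRATE_INSTALLER_CREATE_CRATE_DB_CONTAINER the variable names are invented placeholders. The file names are the ones from the command quoted above.

```python
from typing import List, Mapping

# Maps an environment variable to the compose file it switches on.
# Only the first variable name comes from this thread; the EXAMPLE_*
# names are hypothetical stand-ins.
OPTIONAL_COMPOSE_FILES = {
    "CRATE_INSTALLER_CREATE_CRATE_DB_CONTAINER": "docker-compose-crate-db.yaml",
    "EXAMPLE_CREATE_RESEARCH_DB_CONTAINER": "docker-compose-research-db.yaml",
    "EXAMPLE_CREATE_SECRET_DB_CONTAINER": "docker-compose-secret-db.yaml",
    "EXAMPLE_CREATE_SOURCE_DB_CONTAINER": "docker-compose-source-db.yaml",
}


def compose_file_args(env: Mapping[str, str], default: str = "1") -> List[str]:
    """Build the -f arguments for docker compose from 0/1 variables.

    An unset variable falls back to `default`, which is the assumption
    under discussion; any value other than "0" or "1" is rejected.
    """
    args = ["-f", "docker-compose.yaml"]
    for var, filename in OPTIONAL_COMPOSE_FILES.items():
        value = env.get(var, default)
        if value not in ("0", "1"):
            raise ValueError(f"{var} must be 0 or 1, not {value!r}")
        if value == "1":
            args += ["-f", filename]
    return args
```

The risk is visible in the `default` parameter: whichever value is chosen, it will be wrong for installations that made the opposite choice, and docker compose would then address a different set of services from the ones that were created.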

The installer should dump all of the CRATE_DOCKER and CRATE_INSTALLER environment variables into a file that can be sourced for subsequent runs. I normally just source that file before executing any more installer commands.
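
For illustration, dumping the variables in a form a shell can source might look like the sketch below. This is an assumption about the general approach, not a copy of what the installer writes: `export_lines` is a hypothetical function, and the only thing taken from this thread is the pair of prefixes.

```python
import shlex
from typing import Mapping

# Prefixes named in this thread; everything else here is illustrative.
PREFIXES = ("CRATE_DOCKER", "CRATE_INSTALLER")


def export_lines(env: Mapping[str, str]) -> str:
    """Render matching variables as shell `export` lines, sorted by name.

    Values are passed through shlex.quote so that spaces and shell
    metacharacters survive being sourced.
    """
    lines = [
        f"export {name}={shlex.quote(value)}"
        for name, value in sorted(env.items())
        if name.startswith(PREFIXES)
    ]
    return "\n".join(lines) + "\n"
```

Writing `export_lines(os.environ)` to a file and sourcing that file in a later shell session would restore the same settings before any further installer commands are run.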

RudolfCardinal commented 2 months ago

Understood, and thanks! Suggestion pushed -- an attempt to automatically restore these, if found. (Also a shell script for exec_crate_command.sh, versus run_crate_command.sh, and a bugfix to the latter.)