This repository will no longer be maintained; it will remain as it is. As a replacement, please use the new version of the e-magyar toolchain: https://github.com/dlt-rilmta/emtsv. The new version does not support GATE directly, but it offers efficient inter-module communication via simple TSV, which gives much more convenient command-line support, and it also provides a REST API.
Sources for the Lang_Hungarian GATE plugin containing Hungarian processing resources (wrappers around already existing Hungarian NLP tools) developed by the Department of Language Technology at RIL-MTA.
Developers: Péter Kundráth, Márton Miháltz, Bálint Sass, Mátyás Gerőcs
The plugin contains the following GATE Processing Resources.
Firstly, the Lang_Hungarian plugin contains the e-magyar toolchain:
Some older tools are also integrated:
XXX You will also find the following ready made applications in GATE Developer (to access, in the menu click File -> Ready Made Applications -> Hungarian, or right-click Applications in the GATE Resources tree):
XXX Please see this Wiki page for more information on what tools are expected to be integrated and their statuses.
Requirements:
64-bit operating system
16GB RAM (8GB may be enough)
64-bit Java runtime (JRE or JDK) version 1.8 or later
GATE Developer 8.0, 8.1 or 8.2. Note: do not use GATE 8.4, which is buggy (see #24); get version 8.2 instead. Other GATE versions are not tested.
When launching GATE Developer, request 4GB of heap space.
On Linux or OS X, please use the following command:
<your_gate_installation_directory>/bin/gate.sh -Xmx4g -Xms2g
On Windows, please set the _JAVA_OPTIONS environment variable to -Xmx4g -Xms2g, restart the computer, and then launch GATE Developer.
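The Java requirement and the launch command above can be checked with a small script. The following is a minimal sketch and not part of the plugin: java_major and gate_launch_cmd are hypothetical helper names, and the parsing assumes version strings of the usual "1.8.0_292" or "11.0.2" form.

```shell
#!/bin/sh
# Sketch only: java_major and gate_launch_cmd are hypothetical helpers.

# Print the major Java version for a version string such as
# "1.8.0_292" (legacy numbering) or "11.0.2" (current numbering).
java_major() {
    v=$1
    case $v in
        1.*) v=${v#1.} ;;          # legacy "1.8" means Java 8
    esac
    printf '%s\n' "${v%%[!0-9]*}"  # keep only the leading digits
}

# Print the launch command recommended above for a given GATE directory.
gate_launch_cmd() {
    printf '%s/bin/gate.sh -Xmx4g -Xms2g\n' "$1"
}

# Example use (commented out so that the sketch has no side effects):
# ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
# [ "$(java_major "$ver")" -ge 8 ] && $(gate_launch_cmd /your/gate/installation/directory)
```

The check only covers the Java version; whether the runtime is 64-bit still has to be verified separately.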
This is the default recommended install method for users. Only GATE Developer and internet access are required.
Follow these steps to install the plugin directly into GATE Developer using the ready-made online GATE plugin repository hosted at corpus.nytud.hu (note: the whole plugin, complete with model files, requires 1GB of space and may take a couple of minutes to download):
GATE User Plugin Directory (e.g. /home/username/My_GATE_plugins/).
Plugin repository name: RIL-MTA
Plugin repository URL: http://corpus.nytud.hu/GATE/gate-update-site.xml
Run the sh xperm.sh command in the Lang_Hungarian directory under your GATE User Plugin Directory to add the necessary execute permissions.
Run Lang_Hungarian/resources/huntag3/setup_linux.sh (with superuser privileges). On Windows, see Lang_Hungarian/resources/huntag3/setup_windows.cmd.
This method gives more control over the installation process; it uses a clone of this GitHub repository.
Run complete.sh (on Linux) or obtain these resources one by one.
Run Lang_Hungarian/resources/huntag3/setup_linux.sh (on Ubuntu or Debian Linux) to install the required dependencies for HunTag3 (superuser privileges required).
Install Lang_Hungarian into your GATE user plugin directory (see Plugin command-line installation).
Lang_Hungarian: directory tree for the Lang_Hungarian GATE plugin
src: Java sources of the included Processing Resources. See Javadocs for details.
resources: non-Java binaries, sources and resource files for the included tools
hungarian.jar: plugin Java binaries in a jar file
build.xml: use this to build the jar from sources using Apache Ant
creole.xml: this tells GATE how to use hungarian.jar as a CREOLE plugin
.classpath, .project: use these to import the project into the Eclipse Java IDE
Makefile: use this to rebuild, install etc. the plugin from the command line
To build the GATE plugin from the Java sources (and add the necessary metadata), run make build.
A working GATE installation is necessary.
The GATE installation directory should be given to make as GATE_HOME:
make build GATE_HOME=/your/gate/installation/directory
This will create hungarian.jar in the directory Lang_Hungarian. (A precompiled hungarian.jar is also accessible directly from the repository.)
If you have rebuilt the plugin, it is also possible to install it to your GATE user plugin directory with the following command:
make local_install GATE_USER_PLUGINS_DIR=/your/gate/user/plugin/directory
This will copy the whole directory tree under Lang_Hungarian/ from this repository to your GATE user plugin directory. Alternatively, you can make a symbolic link using the following command:
make link_devdir GATE_USER_PLUGINS_DIR=/your/gate/user/plugin/directory
To update the GATE plugin repository hosted at http://corpus.nytud.hu/GATE, first make sure that you have a fully functional plugin (see Method 2), and then run make upload, specifying your user name on corpus.nytud.hu:
make upload CORPUSUSER=yourusername
This will upload your local hungarian.jar, creole.xml and resources directory to the update server. This enables users to use Method 1 for installation.
The Lang_Hungarian GATE Processing Resources can be run not only from the GATE GUI (called GATE Developer) but also from the Linux command line using GATE Embedded.
The recommended method is to use the so-called gate-server, an optimized solution for running GATE Processing Resources.
The second option for using the Lang_Hungarian GATE Processing Resources from the Linux command line is the simple method described here.
This functionality, implemented in Pipeline.java, allows any combination of PRs in the Lang_Hungarian plugin to be run with arbitrary parameter settings.
Just type:
make GATE_HOME=/your/gate/installation/directory pipeline
By default, texts/peldak.txt is used as the input file, but this can be changed using the PIPELINE_INPUT parameter, e.g. to the XML version of the default input file:
make GATE_HOME=/your/gate/installation/directory PIPELINE_INPUT=texts/peldak.xml pipeline
The PRs to be run should be specified in a config file. Lines of this config file should contain either only the name of a PR:
hu.nytud.gate.parsers.MagyarlancDependencyParser
... or the name of a PR together with some parameters for this PR, given as parameterName parameterValue pairs in the following format:
hu.nytud.gate.parsers.MagyarlancDependencyParser addPosTags true addMorphFeatures true
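A config file in this format can be sanity-checked before a long pipeline run. The sketch below is an illustration only: check_pipeline_config is a hypothetical helper, and it assumes that tokens are separated by whitespace, that parameter values contain no spaces, and that blank lines may be skipped. None of these assumptions is confirmed by Pipeline.java.

```shell
#!/bin/sh
# Sketch only: check_pipeline_config is a hypothetical helper.
# It reports every line where the PR name is not followed by
# complete "parameterName parameterValue" pairs.
check_pipeline_config() {
    awk '
        BEGIN { bad = 0 }
        NF == 0 { next }                # assumption: blank lines are ignored
        (NF - 1) % 2 != 0 {             # tokens after the PR name must pair up
            printf "line %d: parameter without a value: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }                # exit status 1 if any line was malformed
    ' "$1"
}
```

Called as check_pipeline_config /path/to/config/file, it returns a non-zero exit status if any line has an unpaired parameter, so it can guard a subsequent make call.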
The default config file is Lang_Hungarian/resources/pipeline/pipeline.config, which runs the full Lang_Hungarian plugin. It can be overridden using the CONFIG parameter:
make GATE_HOME=/your/gate/installation/directory CONFIG=/path/to/config/file pipeline
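When several inputs or config files have to be processed, the make invocations above can be generated by a small helper. This is a sketch under assumptions: pipeline_cmd is a hypothetical function, it only prints the command line instead of running it, and it assumes paths without spaces.

```shell
#!/bin/sh
# Sketch only: pipeline_cmd is a hypothetical helper that prints
# (does not run) a make command using the parameters described above.
# Arguments: GATE directory, input file (may be empty), config file (may be empty).
pipeline_cmd() {
    gate_home=$1
    input=$2
    config=$3
    cmd="make GATE_HOME=$gate_home"
    if [ -n "$input" ]; then
        cmd="$cmd PIPELINE_INPUT=$input"   # omitted: the default input is used
    fi
    if [ -n "$config" ]; then
        cmd="$cmd CONFIG=$config"          # omitted: the default config is used
    fi
    printf '%s pipeline\n' "$cmd"
}

# Print one command per input file, using the default config.
for f in texts/peldak.txt texts/peldak.xml; do
    pipeline_cmd /your/gate/installation/directory "$f" ""
done
```

Printing the commands first makes it easy to review them; piping the output to sh from the repository root would then execute them one after another.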
There are some ready-made config files in the Lang_Hungarian/resources/pipeline directory for some usage scenarios.
To convert GATE XML produced by e-magyar to TSV, use emconv.py.