CATENA is a sieve-based system for temporal and causal relation extraction and classification from English texts, exploiting the interaction between the temporal and the causal model. The system requires text pre-annotated with EVENT and TIMEX3 tags according to the TimeML annotation standard, as these annotations are used as features to extract the relations.
CATENA is now available on Maven Central. Please add the following dependency to your pom.xml:
<dependency>
<groupId>com.github.paramitamirza</groupId>
<artifactId>CATENA</artifactId>
<version>1.0.3</version>
</dependency>
To build the fat (executable) JAR, first install the bundled WS4J library into your local Maven repository:
mvn install:install-file -Dfile=./lib/ws4j-1.0.1.jar -DgroupId=edu.cmu.lti -DartifactId=ws4j -Dversion=1.0.1 -Dpackaging=jar
Then run
mvn package
to build the executable JAR file (in target/CATENA-<version>.jar).
resource/: this folder must be placed within the root folder of the project.
models/: includes catena-event-timex.model, catena-event-dct.model, catena-event-event.model and catena-causal-event-event.model.
! The input file(s) must be in the TimeML annotation format or CoNLL column format (one token per line) !
usage: Catena
-i,--input <arg> Input TimeML file/directory path
-f,--col (optional) Input files are in column format (.col)
-tl,--tlinks <arg> (optional) Input file containing list of gold temporal links
-cl,--clinks <arg> (optional) Input file containing list of gold causal links
-gl,--gold (optional) Gold candidate pairs to be classified are given
-y,--clinktype (optional) Output the type of CLINK (ENABLE, PREVENT, etc.) from the rule-based sieve
-x,--textpro <arg> TextPro directory path
-l,--matelemma <arg> Mate tools' lemmatizer model path
-g,--matetagger <arg> Mate tools' PoS tagger model path
-p,--mateparser <arg> Mate tools' parser model path
-t,--ettemporal <arg> CATENA model path for E-T temporal classifier
-d,--edtemporal <arg> CATENA model path for E-D temporal classifier
-e,--eetemporal <arg> CATENA model path for E-E temporal classifier
-c,--eecausal <arg> CATENA model path for E-E causal classifier
-b,--train (optional) Train the models
-m,--tempcorpus <arg> (optional) Directory path (containing .tml or .col files) for training temporal classifiers
-u,--causcorpus <arg> (optional) Directory path (containing .tml or .col files) for training causal classifier
For example:
java -Xmx2G -jar ./target/CATENA-1.0.2.jar -i ./data/example_COL/ --col --tlinks ./data/TempEval3.TLINK.txt --clinks ./data/Causal-TimeBank.CLINK.txt -l ./models/CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer.model -g ./models/CoNLL2009-ST-English-ALL.anna-3.3.postagger.model -p ./models/CoNLL2009-ST-English-ALL.anna-3.3.parser.model -x ./tools/TextPro2.0/ -d ./models/catena-event-dct.model -t ./models/catena-event-timex.model -e ./models/catena-event-event.model -c ./models/catena-causal-event-event.model -b -m ./data/Catena-train_COL/ -u ./data/Causal-TimeBank_COL/
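When CATENA is driven from another Java program, the same command line can be assembled programmatically. The following is only a sketch: the class `CatenaCommandSketch` and its method `buildCommand` are hypothetical helpers, not part of CATENA, and it assumes that a plain classification run needs just the input, tool and model options shown in the usage text. The model file names are copied from the example command.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CatenaCommandSketch {

    // Hypothetical helper (not part of CATENA): assembles the argument list for
    // a run, using only flags that appear in the usage text.
    static List<String> buildCommand(String jar, String input, boolean columnFormat,
                                     String textProDir, String modelsDir) {
        // Mate tools model file names are copied from the example command.
        String mate = modelsDir + "/CoNLL2009-ST-English-ALL.anna-3.3.";
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "java", "-Xmx2G", "-jar", jar,
                "-i", input,
                "-x", textProDir,
                "-l", mate + "lemmatizer.model",
                "-g", mate + "postagger.model",
                "-p", mate + "parser.model",
                "-t", modelsDir + "/catena-event-timex.model",
                "-d", modelsDir + "/catena-event-dct.model",
                "-e", modelsDir + "/catena-event-event.model",
                "-c", modelsDir + "/catena-causal-event-event.model"));
        if (columnFormat) {
            cmd.add("--col"); // input files are in column format (.col)
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("./target/CATENA-1.0.2.jar", "./data/example_COL/",
                true, "./tools/TextPro2.0/", "./models");
        // Print the command instead of running it; to execute it, the list
        // could be passed to new ProcessBuilder(cmd).inheritIO().start().
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the arguments in a list rather than a single string avoids quoting problems when paths contain spaces.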
The input document must be in tab-separated, one-token-per-line format, with the following columns:
| token | token-id | sentence-id | lemma | event-id | event-class | event-tense+aspect+polarity | timex-id | timex-type | timex-value | signal-id | causal-signal-id | pos-tag | chunk | lemma | pos-tag | dependencies | main-verb |
event-id and event-class: TimeML event ID and attributes
timex-id, timex-type and timex-value: TimeML timex ID and attributes
signal-id and causal-signal-id: temporal and causal signal ID
event-tense+aspect+polarity: optional attributes of an event; if given as O, CATENA will infer them automatically from PoS tags and dependency relations
pos-tag: BNC tagset (the default tagset used to build the models) or Penn Treebank tagset
chunk:
dependencies: in the format dep1:deprel1||dep2:deprel2||..., with dependency relations produced by Mate-tools
For an example, see data/example_COL/.
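To make the column layout concrete, here is a minimal sketch of reading one line of the column format and unpacking its dependencies field. The class `CatenaColumns` and its methods are hypothetical helpers, not CATENA's own reader, and the sample line is invented for illustration. The sketch also assumes that O in the dependencies column means "no dependencies", by analogy with how O is used for the optional event attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CatenaColumns {
    // Number of columns in the format description (lemma and pos-tag appear twice).
    static final int COLUMN_COUNT = 18;
    // Zero-based positions of two columns used in this sketch.
    static final int EVENT_ID = 4;
    static final int DEPENDENCIES = 16;

    // Splits one tab-separated token line; the -1 limit keeps trailing empty columns.
    static String[] splitLine(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length != COLUMN_COUNT) {
            throw new IllegalArgumentException(
                    "expected " + COLUMN_COUNT + " columns, got " + cols.length);
        }
        return cols;
    }

    // Parses "dep1:deprel1||dep2:deprel2||..." into dependent -> relation.
    // ASSUMPTION: "O" (or an empty field) means the token has no dependents.
    static Map<String, String> parseDependencies(String field) {
        Map<String, String> deps = new LinkedHashMap<>();
        if (field.isEmpty() || field.equals("O")) {
            return deps;
        }
        for (String item : field.split("\\|\\|")) {
            int sep = item.lastIndexOf(':');
            if (sep < 0) {
                throw new IllegalArgumentException("malformed dependency: " + item);
            }
            deps.put(item.substring(0, sep), item.substring(sep + 1));
        }
        return deps;
    }

    public static void main(String[] args) {
        // Invented sample line; the values are illustrative, not taken from the data.
        String line = "fell\tt3\t0\tfall\te1\tOCCURRENCE\tPAST+NONE+POS\tO\tO\tO\tO\tO"
                + "\tVVD\tB-VP\tfall\tVBD\tt2:SBJ||t5:TMP\ttrue";
        String[] cols = splitLine(line);
        System.out.println("event-id: " + cols[EVENT_ID]);
        System.out.println("dependencies: " + parseDependencies(cols[DEPENDENCIES]));
    }
}
```

Columns are addressed by position rather than by name because two column names occur twice in the layout.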
The output will be a list of temporal and/or causal relations, one relation per line, in the format of:
filename entity_1 entity_2 TLINK_type/CLINK/CLINK-R
TLINK_type: one of the TLINK types defined by TimeML, e.g., BEFORE, AFTER, SIMULTANEOUS
CLINK: entity_1 CAUSE entity_2
CLINK-R: entity_1 IS_CAUSED_BY entity_2
CATENA contains two main modules, a Temporal module and a Causal module.
The two modules interact, based on the assumption that the notion of causality is tightly connected with the temporal dimension: (i) TLINK labels for event-event pairs, produced by the rule-based sieve and the temporal reasoner, are used by the CLINK classifier, and (ii) CLINK labels are used in a post-editing step to correct event pairs wrongly labelled by the Temporal module.
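The post-editing idea in (ii) can be illustrated on relation lines in the output format (filename, entity_1, entity_2, label). This is a sketch under stated assumptions, not CATENA's implementation: the class `RelationPostEdit` is hypothetical, fields are assumed to be whitespace-separated, and the correction rule used here (a cause precedes its effect, so CLINK implies BEFORE and CLINK-R implies AFTER) is an assumption, since the exact rule is not given in this document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RelationPostEdit {

    // One output relation: file name, the two entity IDs, and the label.
    static final class Relation {
        final String file, entity1, entity2, label;

        Relation(String file, String entity1, String entity2, String label) {
            this.file = file;
            this.entity1 = entity1;
            this.entity2 = entity2;
            this.label = label;
        }

        boolean isCausal() {
            return label.equals("CLINK") || label.equals("CLINK-R");
        }

        String pairKey() {
            return file + "\t" + entity1 + "\t" + entity2;
        }
    }

    // ASSUMPTION: the four fields are separated by whitespace (tabs or spaces).
    static Relation parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        return new Relation(f[0], f[1], f[2], f[3]);
    }

    // Returns pair -> TLINK label after post-editing.
    // ASSUMPTION: a cause precedes its effect, so a CLINK on a pair overrides
    // its TLINK with BEFORE, and a CLINK-R overrides it with AFTER.
    static Map<String, String> postEdit(List<Relation> relations) {
        Map<String, String> tlinks = new LinkedHashMap<>();
        for (Relation r : relations) {
            if (!r.isCausal()) {
                tlinks.put(r.pairKey(), r.label);
            }
        }
        for (Relation r : relations) {
            if (r.isCausal() && tlinks.containsKey(r.pairKey())) {
                tlinks.put(r.pairKey(), r.label.equals("CLINK") ? "BEFORE" : "AFTER");
            }
        }
        return tlinks;
    }

    public static void main(String[] args) {
        // Invented relation lines, for illustration only.
        List<Relation> relations = new ArrayList<>();
        relations.add(parse("doc1.tml e1 e2 AFTER"));
        relations.add(parse("doc1.tml e1 e2 CLINK"));
        relations.add(parse("doc1.tml e3 t1 IS_INCLUDED"));
        for (Map.Entry<String, String> e : postEdit(relations).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Only pairs that already carry a TLINK are edited, so a causal link never introduces a new temporal link in this sketch.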
Paramita Mirza and Sara Tonelli. 2016. CATENA: CAusal and TEmporal relation extraction from NAtural language texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, December.
Causal-TempEval3-eval.txt (available in data/) is used in one of the evaluation schemes for causal relation extraction.
! Whenever making reference to this resource, please cite the paper listed above (Mirza and Tonelli, 2016). !
Soon!
For more information please contact Paramita Mirza (paramita135@gmail.com).