This repository contains code and models for replicating results from the following publication:
Part of the codebase is extended from e2e-coref.
```
./scripts/fetch_required_data.sh
./scripts/build_custom_kernels.sh
```

(Please adjust the kernel-build script according to your OS/gcc version.)

To fetch the pretrained models, run:

```
./scripts/fetch_all_models.sh
```
See `data/sample.jsonlines` for the input format (one JSON object per line). Each JSON object can contain multiple sentences. Run:

```
python decoder.py conll2012_final data/sample.jsonlines sample.out
```

to predict SRL structures. For example, given the input sentences:

```
[["John", "told", "Pat", "to", "stop", "the", "robot", "immediately", "."], ["Pat", "refused", "."]]
```
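An input file in this format can be produced with a few lines of Python. The sketch below is hypothetical (the file name `my_input.jsonlines` and the `sentences` field are assumptions here; consult `data/sample.jsonlines` for the exact schema):

```python
import json

# Hypothetical sketch: write one JSON object per line (jsonlines format).
# We assume a "sentences" field holding the tokenized sentences shown above;
# see data/sample.jsonlines for the fields the decoder actually expects.
doc = {"sentences": [["John", "told", "Pat", "to", "stop", "the", "robot",
                      "immediately", "."],
                     ["Pat", "refused", "."]]}

with open("my_input.jsonlines", "w") as f:
    f.write(json.dumps(doc) + "\n")
```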
the following JSON field in the output:

```
"predicted_srl": [[1, 0, 0, "ARG0"], [1, 2, 2, "ARG2"], [1, 3, 7, "ARG1"], [4, 2, 2, "ARG0"], [4, 5, 6, "ARG1"], [4, 7, 7, "ARGM-TMP"], [10, 9, 9, "ARG0"]]
```

contains the SRL predictions for the two sentences, with each tuple formatted as `[predicate_position, argument_span_start, argument_span_end, role_label]`. Token ids are counted from 0 starting at the beginning of the document (instead of the beginning of each sentence).
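To illustrate the document-level indexing, the following sketch (an illustration, not part of this repository) flattens the two example sentences into one token list and resolves each predicted tuple back to surface words:

```python
# Sentences and predictions from the example above.
sentences = [["John", "told", "Pat", "to", "stop", "the", "robot",
              "immediately", "."],
             ["Pat", "refused", "."]]
predicted_srl = [[1, 0, 0, "ARG0"], [1, 2, 2, "ARG2"], [1, 3, 7, "ARG1"],
                 [4, 2, 2, "ARG0"], [4, 5, 6, "ARG1"],
                 [4, 7, 7, "ARGM-TMP"], [10, 9, 9, "ARG0"]]

# Token ids are document-level, so the sentences are simply concatenated.
tokens = [w for sent in sentences for w in sent]

for pred, start, end, label in predicted_srl:
    # Spans are inclusive, hence end + 1 when slicing.
    print("{}: {} -> {}".format(tokens[pred],
                                " ".join(tokens[start:end + 1]), label))
# e.g. the tuple [10, 9, 9, "ARG0"] resolves to "refused: Pat -> ARG0",
# since "refused" is token 10 of the document, not of its own sentence.
```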
For replicating results on CoNLL-2005 and CoNLL-2012 datasets, please follow the steps below.
The data is provided by the CoNLL-2005 Shared Task, but the original words are from the Penn Treebank dataset, which is not publicly available. If you have the PTB corpus, you can run:

```
./scripts/fetch_and_make_conll05_data.sh /path/to/ptb/
```
To get the CoNLL-2012 data, you have to follow the CoNLL-2012 instructions; this results in a directory called `/path/to/conll-formatted-ontonotes-5.0`. Then run:

```
./scripts/make_conll2012_data.sh /path/to/conll-formatted-ontonotes-5.0
```
Experiment configurations are listed in `experiments.conf`, for example `conll2012_best`. To train a model, run:

```
python singleton.py <experiment>
```

To run the evaluator, use:

```
python evaluator.py <experiment>
```

Training logs are written to the `logs` directory and can be viewed via TensorBoard (e.g. `tensorboard --logdir logs`).

For single-model evaluation, run:

```
python test_single.py <experiment>
```

For example: `python test_single.py conll2012_final`
To select a device, set the `GPU` environment variable, which the code treats as shorthand for `CUDA_VISIBLE_DEVICES`.
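A minimal sketch of how such a shorthand can work (the helper below is hypothetical, not the repository's actual code; only the `GPU` variable name comes from this README):

```python
import os

# Hypothetical helper: mirror the GPU shorthand into CUDA_VISIBLE_DEVICES,
# the variable that CUDA-based frameworks actually read for device selection.
def set_gpus_from_env():
    if "GPU" in os.environ:
        os.environ["CUDA_VISIBLE_DEVICES"] = os.environ["GPU"]

os.environ["GPU"] = "0"  # as in: GPU=0 python evaluator.py <experiment>
set_gpus_from_env()
print(os.environ["CUDA_VISIBLE_DEVICES"])  # -> 0
```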