luheng / lsgn

Labeled Span Graph Networks
Apache License 2.0

Labeled Span Graph Network (Under Construction)

This repository contains code and models for replicating results from the following publication:

Part of the codebase is extended from e2e-coref.

Requirements

Getting Started

Setting up for ELMo (in progress)

Making Predictions with Pretrained Models

[["John", "told", "Pat", "to", "stop", "the", "robot", "immediately", "."], ["Pat", "refused", "."]]

The following JSON field in the output

"predicted_srl": [[1, 0, 0, "ARG0"], [1, 2, 2, "ARG2"], [1, 3, 7, "ARG1"], [4, 2, 2, "ARG0"], [4, 5, 6, "ARG1"], [4, 7, 7, "ARGM-TMP"], [10, 9, 9, "ARG0"]]

contains SRL predictions for the two sentences above. Each prediction is formatted as [predicate_position, argument_span_start, argument_span_end, role_label], and argument spans are inclusive (for example, [1, 0, 0, "ARG0"] marks the single token "John" as ARG0 of "told"). Token ids are counted from 0 starting at the beginning of the document, not at the beginning of each sentence.
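Because positions are document-level, mapping a prediction back to its sentence requires the running token offset of each sentence. The Python sketch below assumes the input sentences and predicted_srl are already loaded as plain Python lists; decode_srl is a hypothetical helper for illustration, not part of this repository. It groups the tuples by predicate and recovers the argument text and the sentence-local predicate position.

```python
import bisect
from typing import List, Sequence, Tuple


def decode_srl(sentences: List[List[str]],
               predictions: Sequence[Sequence]) -> List[Tuple[int, int, str, List[Tuple[str, str]]]]:
    """Group [predicate, arg_start, arg_end, role] predictions by predicate (hypothetical helper).

    Returns, in document order, one tuple per predicate:
    (sentence_index, sentence_local_predicate, predicate_word, [(role, argument_text), ...]).
    """
    # Flatten the document so document-level token ids index directly into `tokens`.
    tokens = [token for sentence in sentences for token in sentence]
    # offsets[i] is the document-level id of the first token of sentence i.
    offsets, total = [], 0
    for sentence in sentences:
        offsets.append(total)
        total += len(sentence)

    frames = {}
    for pred, start, end, role in predictions:
        # Argument spans are inclusive, so the slice runs to end + 1.
        frames.setdefault(pred, []).append((role, " ".join(tokens[start:end + 1])))

    decoded = []
    for pred in sorted(frames):
        # The predicate's sentence is the last one starting at or before it.
        sent_idx = bisect.bisect_right(offsets, pred) - 1
        decoded.append((sent_idx, pred - offsets[sent_idx], tokens[pred], frames[pred]))
    return decoded


sentences = [["John", "told", "Pat", "to", "stop", "the", "robot", "immediately", "."],
             ["Pat", "refused", "."]]
predicted_srl = [[1, 0, 0, "ARG0"], [1, 2, 2, "ARG2"], [1, 3, 7, "ARG1"], [4, 2, 2, "ARG0"],
                 [4, 5, 6, "ARG1"], [4, 7, 7, "ARGM-TMP"], [10, 9, 9, "ARG0"]]

for frame in decode_srl(sentences, predicted_srl):
    print(frame)
```

For this example the last frame printed is (1, 1, 'refused', [('ARG0', 'Pat')]): document position 10 is position 1 of the second sentence.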

CoNLL Data

To replicate results on the CoNLL-2005 and CoNLL-2012 datasets, follow the steps below.

CoNLL-2005

The data is provided by the CoNLL-2005 Shared Task, but the original words come from the Penn Treebank (PTB), which is not publicly available. If you have the PTB corpus, you can run:
./scripts/fetch_and_make_conll05_data.sh /path/to/ptb/
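The CoNLL shared-task files mark each predicate's arguments in a column of bracket tags, one tag per token, such as (A0*, *, *) and (V*). The Python sketch below assumes that bracket notation (no nested spans, predicate tagged V) and converts one such column into [predicate_position, argument_span_start, argument_span_end, role_label] tuples like those in predicted_srl. The function bracket_column_to_spans and the tag column are hypothetical illustrations, not code or data from this repository or its scripts.

```python
from typing import List, Optional, Tuple


def bracket_column_to_spans(tags: List[str], offset: int = 0) -> List[Tuple[int, int, int, str]]:
    """Turn one predicate's column of bracket tags into span tuples (hypothetical helper).

    Assumed tag format (one tag per token, no nesting):
      "(LABEL*" opens a span, "*)" closes it, "(LABEL*)" is a one-token span,
      "*" continues a span or lies outside one. The predicate itself is tagged V.
    `offset` is the document-level id of the sentence's first token.
    """
    spans = []
    predicate: Optional[int] = None
    label, start = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("("):
            label, start = tag[1:tag.index("*")], i
        if tag.endswith(")"):
            if label == "V":
                predicate = start
            else:
                spans.append((start, i, label))
            label, start = None, None
    if predicate is None:
        raise ValueError("column has no (V*) predicate tag")
    # Shift sentence-local positions to document-level ids, as in predicted_srl.
    return [(offset + predicate, offset + s, offset + e, lab) for s, e, lab in spans]


# Hand-written tag column for "told" in the example sentence above (illustrative only).
told_tags = ["(A0*)", "(V*)", "(A2*)", "(A1*", "*", "*", "*", "*)", "*"]
print(bracket_column_to_spans(told_tags))
```

This prints [(1, 0, 0, 'A0'), (1, 2, 2, 'A2'), (1, 3, 7, 'A1')]. CoNLL-2005 files typically use labels such as A0 and AM-TMP, while the predictions above use ARG0-style labels; any mapping between the two label styles is left out of this sketch.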

CoNLL-2012

Follow the CoNLL-2012 instructions to obtain the CoNLL-2012 data; this will produce a directory called /path/to/conll-formatted-ontonotes-5.0. Then run:
./scripts/make_conll2012_data.sh /path/to/conll-formatted-ontonotes-5.0
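To sanity-check the downloaded data, the Python sketch below collects tokenized sentences from the lines of a *_gold_conll file, in the same nested-list shape as the two example sentences above. It assumes the standard CoNLL-2012 column layout (whitespace-separated columns, word in the fourth column, #begin document / #end document marker lines, a blank line after each sentence); read_conll2012_sentences and the sample lines are hypothetical and are not what make_conll2012_data.sh produces or reads internally.

```python
from typing import Iterable, List


def read_conll2012_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Collect tokenized sentences from *_gold_conll lines (hypothetical helper).

    Assumes the CoNLL-2012 layout: the word is the fourth whitespace-separated
    column, lines starting with "#" are document markers, and a blank line
    ends each sentence.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            continue  # "#begin document ..." / "#end document" markers
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split()[3])  # column 4 holds the word itself
    if current:  # tolerate a missing blank line at the end of the file
        sentences.append(current)
    return sentences


# Made-up lines in the assumed column layout (not taken from OntoNotes).
sample = """#begin document (doc1); part 000
doc1 0 0 Pat NNP (TOP(S(NP*) - - - - * (ARG0*) -
doc1 0 1 refused VBD (VP*) refuse 01 - - * (V*) -
doc1 0 2 . . *)) - - - - * * -

#end document
"""
print(read_conll2012_sentences(sample.splitlines()))
```

With a real file, pass the open file object directly, e.g. read_conll2012_sentences(open(path)), since iterating a text file yields its lines.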

Training Instructions

Other Quirks