
marseille

mining argument structures with expressive inference (with linear and lstm engines)

What is it?

Marseille learns to predict argumentative proposition types and the support relations between them, cast as inference in an expressive factor graph.

Read more about it in our paper,

Vlad Niculae, Joonsuk Park, Claire Cardie. Argument Mining with Structured SVMs and RNNs. In: Proceedings of ACL, 2017.

If you find this project useful, you may cite us using:

@inproceedings{niculae17marseille,
  author={Vlad Niculae and Joonsuk Park and Claire Cardie},
  title={{Argument Mining with Structured SVMs and RNNs}},
  booktitle={Proceedings of ACL},
  year=2017
}

Requirements

Usage

(replace $ds with cdcp or ukp)

  1. download the data from http://joonsuk.org/ and unzip it in the subdirectory data, i.e. the path ./data/process/erule/train/ is valid.
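
    For example (the archive name and its internal layout below are assumptions; adjust so that the path from this step exists):

    mkdir -p data
    unzip cdcp_data.zip -d data/      # hypothetical archive name; use whatever the site provides
    ls data/process/erule/train/      # this path should now be valid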

  2. extract relevant subset of GloVe embeddings:

    python -m marseille.preprocess embeddings $ds --glove-file=/p/glove.840B.300d.txt
  3. extract features:

    python -m marseille.features $ds
    
    # (for cdcp only:)
    python -m marseille.features cdcp-test
  4. generate vectorized train-test split (for baselines only)

    mkdir data/process/.../
    python -m marseille.vectorize split cdcp
  5. run chosen model, for example:

    python -m experiments.exp_train_test $ds --method rnn-struct --model strict

    (for dynet models, set --dynet-seed=42 for exact reproducibility)
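
    For example, the command above with a fixed seed:

    python -m experiments.exp_train_test $ds --method rnn-struct --model strict --dynet-seed=42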

  6. compare results:

    python -m experiments.plot_test_results $ds

To reproduce cross-validation model selection, you would also need to run:

    python -m marseille.vectorize folds $ds
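
Putting the steps above together, a full train/test run on cdcp could look like this (a sketch assembled from the commands above; adjust the GloVe path to your setup):

    ds=cdcp
    python -m marseille.preprocess embeddings $ds --glove-file=/p/glove.840B.300d.txt
    python -m marseille.features $ds
    python -m marseille.features cdcp-test       # cdcp only
    python -m marseille.vectorize split $ds      # baselines only
    python -m experiments.exp_train_test $ds --method rnn-struct --model strict --dynet-seed=42
    python -m experiments.plot_test_results $ds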

Running a model on your own data:

If you have some documents, e.g. F.txt and G.txt, that you would like to run a pretrained model on, read on.

  1. download the required preprocessing toolkits: Stanford CoreNLP (tested with version 3.6.0) and the WING-NUS PDTB discourse parser (tested with this commit) and configure their paths:
    export MARSEILLE_CORENLP_PATH=/home/vlad/corenlp  #  path to CoreNLP
    export MARSEILLE_WINGNUS_PATH=/home/vlad/wingnus  #  path to WING-NUS parser

Note: If you already generated F.txt.json with CoreNLP and F.txt.pipe with the WING-NUS parser (e.g., on a different computer), you may skip this step and marseille will detect those files automatically.

Otherwise, these files are generated the first time that a UserDoc object is instantiated for a given document. In particular, the step below will do this automatically.
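
For instance, a working directory where no re-preprocessing is needed would contain the companion files named in the note above:

    ls
    # F.txt  F.txt.json  F.txt.pipe  G.txt  G.txt.json  G.txt.pipe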

  2. extract the features:
    python -m marseille.features user F G  # raw input must be in F.txt & G.txt

This is needed for the RNN models too, because the feature files encode some metadata about the document structure.

  3. predict, e.g. using the model saved in step 5 above:
    python -m experiments.predict_pretrained --method=rnn-struct \
    test_results/exact=True_cdcp_rnn-struct_strict F G