
marseille

mining argument structures with expressive inference (with linear and lstm engines)

What is it?

Marseille learns to predict argumentative proposition types and the support relations between them, cast as inference in an expressive factor graph.

Read more about it in our paper,

Vlad Niculae, Joonsuk Park, Claire Cardie. Argument Mining with Structured SVMs and RNNs. In: Proceedings of ACL, 2017.

If you find this project useful, you may cite us using:

@inproceedings{niculae17marseille,
  author={Vlad Niculae and Joonsuk Park and Claire Cardie},
  title={{Argument Mining with Structured SVMs and RNNs}},
  booktitle={Proceedings of ACL},
  year=2017
}

Requirements

Usage

(replace $ds with cdcp or ukp)

  1. download the data from http://joonsuk.org/ and unzip it in the subdirectory data, i.e. the path ./data/process/erule/train/ is valid.
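
    For example (the archive name and its internal layout below are assumptions; adjust so that the path from this step exists):

    mkdir -p data
    unzip cdcp_data.zip -d data/      # hypothetical archive name; use whatever the site provides
    ls data/process/erule/train/      # this path should now be valid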

  2. extract relevant subset of GloVe embeddings:

    python -m marseille.preprocess embeddings $ds --glove-file=/p/glove.840B.300d.txt
  3. extract features:

    python -m marseille.features $ds
    
    # (for cdcp only:)
    python -m marseille.features cdcp-test
  4. generate vectorized train-test split (for baselines only)

    mkdir data/process/.../
    python -m marseille.vectorize split cdcp
  5. run chosen model, for example:

    python -m experiments.exp_train_test $ds --method rnn-struct --model strict

    (for dynet models, set --dynet-seed=42 for exact reproducibility)
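
    For example, the command above with a fixed seed:

    python -m experiments.exp_train_test $ds --method rnn-struct --model strict --dynet-seed=42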

  6. compare results:

    python -m experiments.plot_test_results $ds

To reproduce cross-validation model selection, you would also need to run:

    python -m marseille.vectorize folds $ds
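
Putting the steps above together, a full train/test run on cdcp could look like this (a sketch assembled from the commands above; adjust the GloVe path to your setup):

    ds=cdcp
    python -m marseille.preprocess embeddings $ds --glove-file=/p/glove.840B.300d.txt
    python -m marseille.features $ds
    python -m marseille.features cdcp-test       # cdcp only
    python -m marseille.vectorize split $ds      # baselines only
    python -m experiments.exp_train_test $ds --method rnn-struct --model strict --dynet-seed=42
    python -m experiments.plot_test_results $ds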

Running a model on your own data:

If you have some documents, e.g. F.txt and G.txt, that you would like to run a pretrained model on, read on.

  1. download the required preprocessing toolkits: Stanford CoreNLP (tested with version 3.6.0) and the WING-NUS PDTB discourse parser (tested with this commit) and configure their paths:
    export MARSEILLE_CORENLP_PATH=/home/vlad/corenlp  #  path to CoreNLP
    export MARSEILLE_WINGNUS_PATH=/home/vlad/wingnus  #  path to WING-NUS parser

Note: If you already generated F.txt.json with CoreNLP and F.txt.pipe with the WING-NUS parser (e.g., on a different computer), you may skip this step and marseille will detect those files automatically.

Otherwise, these files are generated the first time that a UserDoc object is instantiated for a given document. In particular, the step below will do this automatically.
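
For instance, a working directory where no re-preprocessing is needed would contain the companion files named in the note above:

    ls
    # F.txt  F.txt.json  F.txt.pipe  G.txt  G.txt.json  G.txt.pipe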

  2. extract the features:
    python -m marseille.features user F G  # raw input must be in F.txt & G.txt

This is needed for the RNN models too, because the feature files encode some metadata about the document structure.

  3. predict, e.g. using the model saved in step 5 above:
    python -m experiments.predict_pretrained --method=rnn-struct \
    test_results/exact=True_cdcp_rnn-struct_strict F G