jiangycTarheel-zz / EPAr

MIT License

Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension

1. Dependencies

2. Data:

2.1 Qangaroo:

Download the data from http://qangaroo.cs.ucl.ac.uk/index.html and put the uncompressed folder in the ~/data directory.

2.2 GloVe Embeddings:

For our main model, download the glove.840B.300d.zip word vectors from https://nlp.stanford.edu/projects/glove/ and place them in ~/data/glove/. For our smaller model (which we use throughout the analysis), download glove.6B.zip from the same link.
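Before preprocessing, it can help to confirm that the downloads are where the scripts will look for them. The following is a minimal sketch and not part of the repository: `missing_paths` is a hypothetical helper, and the file names it checks (the unzipped `glove.840B.300d.txt` and `glove.6B.100d.txt` inside ~/data/glove/) are assumptions about how the archives are unpacked.

```python
from pathlib import Path


def missing_paths(data_dir, relative_paths):
    """Return the entries of relative_paths that do not exist under data_dir."""
    root = Path(data_dir).expanduser()
    return [rel for rel in relative_paths if not (root / rel).exists()]


# Assumed file names (not confirmed by the repository): the text files that
# the two GloVe archives unpack to, placed directly in ~/data/glove/.
FULL_MODEL_FILES = ["glove/glove.840B.300d.txt"]
SMALL_MODEL_FILES = ["glove/glove.6B.100d.txt"]

if __name__ == "__main__":
    # Report anything the full-size model would need that is not in place.
    for rel in missing_paths("~/data", FULL_MODEL_FILES):
        print(f"missing: ~/data/{rel}")
```

The same helper can be pointed at the uncompressed Qangaroo folder by adding its path, relative to ~/data, to the list.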

3. WikiHop:

Here we show how to run our full-size EPAr model with 300-d GloVe embeddings and 100-d LSTM hidden size.

3.1 Data preprocessing:

Run the following script:


To run a small model with 100-d word embeddings and 20-d hidden size, delete these 2 options from the preprocessing scripts: --glove_corpus="840B", --glove_vec_size=300, and delete these 3 options from the training/testing scripts: --emb_dim=300, --hidden_size=100, cudnn_rnn=True. In addition, we train our smaller model in 2 stages: first without the Assembler (refer to oracle-epar-train.sh), and then with all 3 modules jointly using the main script (full-epar-train.sh).
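The option removal described above can be sketched as a small filter over an argument list. This is an illustration only: `drop_options` is a hypothetical helper, the `--run_id=00` value is a placeholder, and the sketch assumes options are written in the `--name=value` form rather than as separate `--name value` tokens.

```python
def drop_options(argv, names):
    """Return argv without any option whose name is in names.

    Names are compared with leading dashes ignored, so both
    "--emb_dim=300" and "cudnn_rnn=True" are recognised.
    """
    wanted = {name.lstrip("-") for name in names}
    return [arg for arg in argv if arg.lstrip("-").split("=", 1)[0] not in wanted]


# Hypothetical full-size training flags. Only the three options named in the
# text come from the README; "--run_id=00" is a placeholder.
full_train = ["--run_id=00", "--emb_dim=300", "--hidden_size=100", "--cudnn_rnn=True"]
small_train = drop_options(full_train, ["emb_dim", "hidden_size", "cudnn_rnn"])
print(small_train)  # ['--run_id=00']
```

The preprocessing case works the same way with `["glove_corpus", "glove_vec_size"]` as the names to drop.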

3.2 Train:

To train the full 3-module system:


Train for around 40k iterations. Change the --run_id in our scripts to train different models.

Note: The WikiHop scripts above are designed for a multi-GPU setting. Change num_gpus (and then batch_size) accordingly. In the provided training scripts, we use 2 GPUs and a batch size of 5. For training a small model, we recommend 1 GPU and a batch size of 10.
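One reading of these numbers is that batch_size counts examples per GPU, so both settings process 10 examples per step. That reading is an assumption, not something the README states. Under it, the per-GPU batch size for a different GPU count could be worked out as in this sketch (`per_gpu_batch_size` is a hypothetical helper):

```python
def per_gpu_batch_size(examples_per_step, num_gpus):
    """Split a fixed number of examples per step evenly across GPUs.

    Assumes batch_size is a per-GPU value, so that
    examples_per_step == num_gpus * batch_size.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if examples_per_step % num_gpus != 0:
        raise ValueError("examples_per_step must be divisible by num_gpus")
    return examples_per_step // num_gpus


print(per_gpu_batch_size(10, 2))  # 5, the setting in the provided scripts
print(per_gpu_batch_size(10, 1))  # 10, the setting recommended for the small model
```

GPU memory may still force a smaller value than this arithmetic suggests.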

3.3 Test:



The model checkpoints are saved in out/basic/qangaroo/[RUN_ID]/save/.
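To see what a run has written so far, the checkpoint directory can be built from the run id and listed. This is a minimal sketch: `save_dir` and `newest_files` are hypothetical helpers, `"00"` is a placeholder run id, and nothing is assumed about how the checkpoint files themselves are named.

```python
from pathlib import Path


def save_dir(run_id, out_root="out"):
    """Build the checkpoint directory out/basic/qangaroo/[RUN_ID]/save/."""
    return Path(out_root) / "basic" / "qangaroo" / str(run_id) / "save"


def newest_files(directory, limit=5):
    """Return up to `limit` file names in `directory`, newest first."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in files[:limit]]


# "00" is a placeholder; use the value passed as --run_id.
for name in newest_files(save_dir("00")):
    print(name)
```

An empty listing means the directory does not exist yet or no checkpoint has been saved for that run id.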

4. MedHop (non-masked):

4.1 Data preprocessing:

Run the following script:


4.2 Train:

To train the full model, run this script (it should converge in close to 3k iterations):


4.3 Test:

To evaluate the trained model on the dev set, run the following script:


5. Pretrained Models

We release our pretrained models for WikiHop and MedHop here. Running the testing script on these models should give 67.2% accuracy on the WikiHop dev set and 64.9% accuracy on the MedHop dev set.
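As a sanity check, a measured dev accuracy can be compared with the figures above. The sketch below is illustrative only: the dictionaries mapping question ids to answers are a hypothetical format, not the output format of the testing script, and the 0.1-point tolerance is an arbitrary choice.

```python
REPORTED_DEV_ACCURACY = {"wikihop": 67.2, "medhop": 64.9}


def accuracy(predictions, answers):
    """Percentage of gold answers that the predictions match exactly."""
    if not answers:
        raise ValueError("answers must not be empty")
    correct = sum(1 for qid, gold in answers.items() if predictions.get(qid) == gold)
    return 100.0 * correct / len(answers)


def matches_reported(measured, dataset, tolerance=0.1):
    """True if `measured` is within `tolerance` points of the reported figure."""
    return abs(measured - REPORTED_DEV_ACCURACY[dataset]) <= tolerance


# Toy data in a hypothetical format: question id -> answer string.
answers = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}
predictions = {"q1": "a", "q2": "x", "q3": "c"}
print(accuracy(predictions, answers))  # 50.0
```

A question with no prediction counts as wrong, which is why the toy example scores 2 out of 4.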


  author={Yichen Jiang and Nitish Joshi and Yen-chun Chen and Mohit Bansal},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics}, 
  title={Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension}, 