sunnweiwei / AmbigPrompt

Answering Ambiguous Questions via Iterative Prompting

AmbigPrompt

Code for the paper "Answering Ambiguous Questions via Iterative Prompting".

Prepare Data

Download the Wikipedia text split into 100-word passages from DPR, place it at data/wikipedia/psgs_w100.tsv, and run the following command to build a Redis cache of the Wikipedia passages.

python dataset.py
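
For orientation, here is a minimal sketch of what caching the passage file involves. It assumes the DPR TSV layout (a header row followed by id, text, and title columns) and uses a plain dict as a stand-in for the Redis store. The function names iter_passages and build_cache are hypothetical and are not taken from dataset.py, whose actual behavior may differ.

```python
import csv
import io


def iter_passages(tsv_file):
    """Yield (passage_id, title, text) from a DPR-style TSV with an id/text/title header."""
    reader = csv.reader(tsv_file, delimiter="\t")
    next(reader)  # skip the header row: id, text, title
    for row in reader:
        if len(row) != 3:
            continue  # ignore malformed rows
        pid, text, title = row
        yield pid, title, text


def build_cache(tsv_file, store):
    """Write every passage into a dict-like store keyed by passage id."""
    count = 0
    for pid, title, text in iter_passages(tsv_file):
        store[pid] = {"title": title, "text": text}
        count += 1
    return count


# Two made-up rows in the assumed layout; the real file is data/wikipedia/psgs_w100.tsv.
sample = io.StringIO(
    'id\ttext\ttitle\n'
    '1\t"Aaron is a prophet."\tAaron\n'
    '2\tSecond passage.\tAaron\n'
)
cache = {}  # stand-in for the Redis store (assumption)
n = build_cache(sample, cache)
```

Keying the store by passage id lets later steps look up the text of a retrieved passage without reloading the full TSV.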

Download the NQ and AmbigNQ data from shmsw25/AmbigQA and place them under data/nq and data/ambig, respectively.

Train Retrieval Model

Train a dense passage retrieval model using luyug/dense:

python train_dense.py

Encode the passages and perform passage retrieval using Faiss.

python inference_dense.py
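
Dense retrieval typically scores each passage by the inner product between the question embedding and the passage embedding, then keeps the highest-scoring passages. The sketch below shows that exact top-k search in pure Python as a stand-in for a Faiss index. The vectors, passage ids, and the search function are made up for illustration; inference_dense.py may use a different index type or scoring setup.

```python
import heapq


def inner_product(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def search(query_vec, passage_vecs, k):
    """Return the k (passage_id, score) pairs with the highest inner product."""
    scored = (
        (pid, inner_product(query_vec, vec)) for pid, vec in passage_vecs.items()
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy 2-dimensional embeddings (hypothetical values).
passage_vecs = {
    "p1": [0.9, 0.1],
    "p2": [0.2, 0.8],
    "p3": [0.7, 0.7],
}
top = search([1.0, 0.0], passage_vecs, k=2)
top_ids = [pid for pid, _ in top]
```

In the real pipeline k would be 100, matching the 100 retrieved passages per question.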

This step produces QA data in which each question is paired with its 100 retrieved passages, such as data/ambig/dev.json.

Train QA Model

Download the pre-trained checkpoint from facebookresearch/FiD.

Train the prompting model and QA model on multi-answer QA data:

accelerate launch train.py --data_path data/ambig/train.json --save_path out/ambig/model --do_train true --do_eval false

Evaluate the model:

accelerate launch train.py --data_path data/ambig/dev.json --checkpoint out/ambig/model/9.pt --do_train false --do_eval true
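
Multi-answer QA is commonly scored with an F1 between the set of predicted answers and the set of gold answers, where each gold answer may have several aliases. The following is a rough sketch of such a metric, not the evaluation code in train.py. The normalization rules and the greedy one-to-one matching are assumptions, and the names normalize and multi_answer_f1 are hypothetical.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def multi_answer_f1(predictions, gold_groups):
    """F1 between predicted answers and gold answers; each gold answer is a list of aliases."""
    preds = {normalize(p) for p in predictions}
    remaining = [{normalize(a) for a in group} for group in gold_groups]
    matched = 0
    for pred in preds:
        for i, group in enumerate(remaining):
            if pred in group:
                matched += 1
                del remaining[i]  # each gold answer can be matched only once
                break
    if matched == 0:
        return 0.0
    precision = matched / len(preds)
    recall = matched / len(gold_groups)
    return 2 * precision * recall / (precision + recall)


# Three predictions, two gold answers (the second has two aliases).
score = multi_answer_f1(
    ["The Beatles", "Queen", "Oasis"],
    [["Beatles"], ["Queen", "Queen (band)"]],
)
```

In this toy case two of the three predictions match a gold answer and both gold answers are covered, so precision is 2/3 and recall is 1.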

Pseudo MultiQA Data Construction

Train a span selection baseline using the script in shmsw25/AmbigQA, then use it to predict an answer on each of the 100 retrieved passages. Detailed scripts and the produced datasets are coming soon.
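
Until those scripts are released, the per-passage prediction step can be pictured roughly as follows. This is only a sketch under assumptions: the description above says that answers are predicted on each passage, while merging the distinct predictions into one answer list is a guess at the aggregation. build_pseudo_answers and toy_reader are hypothetical names, and toy_reader is a trivial stand-in for a trained span selection model.

```python
import re


def build_pseudo_answers(question, passages, predict_span):
    """Run a single-passage reader over every passage and keep distinct answers in order."""
    seen = set()
    answers = []
    for passage in passages:
        answer = predict_span(question, passage)
        if not answer:
            continue  # the reader found no answer in this passage
        key = answer.strip().lower()
        if key not in seen:
            seen.add(key)
            answers.append(answer.strip())
    return answers


def toy_reader(question, passage):
    """Hypothetical reader: returns the first four-digit number in the passage, if any."""
    match = re.search(r"\b\d{4}\b", passage)
    return match.group(0) if match else None


passages = [
    "The film premiered in 1994.",
    "A remake followed in 2019.",
    "The 1994 original won awards.",
    "No date here.",
]
pseudo = build_pseudo_answers("When was the film released?", passages, toy_reader)
```

Here the question ends up with two pseudo answers, one for each distinct prediction across the passages.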

Cite

@inproceedings{Sun2023IsCG,
  title={Answering Ambiguous Questions via Iterative Prompting},
  author={Weiwei Sun and Hengyi Cai and Hongshen Chen and Pengjie Ren and Zhumin Chen and Maarten de Rijke and Zhaochun Ren},
  booktitle={ACL},
  year={2023},
}