psunlpgroup / Summ-N

Code for ACL 2022 Paper "SUMM^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents"
MIT License

# SummN

Source code for the ACL 2022 paper "SUMM^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents".

## Update

## Folder Structure

## Training and Evaluation

### Download the Datasets and Models

- Download the GPT-2 BPE files used by fairseq:
  - `wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json'`
  - `wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe'`
  - `wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt'`

- Set up ROUGE-1.5.5 by following https://github.com/chatc/AnyROUGE.
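Before training, it can help to confirm that the three BPE files actually landed where you expect them. The following is a minimal sketch, assuming the `wget` commands were all run inside one directory; `missing_bpe_files` is a hypothetical helper and not part of Summ-N.

```shell
#!/usr/bin/env bash
# Sketch only: missing_bpe_files is a hypothetical helper, not part of Summ-N.
# Prints the name of every fairseq GPT-2 BPE file that is absent or empty.
missing_bpe_files() {
  local dir="${1:-.}"   # directory where the wget commands were run (default: cwd)
  local f
  for f in encoder.json vocab.bpe dict.txt; do
    # -s: true only if the file exists and is non-empty
    [ -s "$dir/$f" ] || echo "$f"
  done
}

# Usage: stop early if anything is missing.
# missing=$(missing_bpe_files ./bpe)
# [ -z "$missing" ] || { echo "missing: $missing" >&2; exit 1; }
```

Printing the missing names, rather than just returning a status, makes it obvious which `wget` command to re-run.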

### Training the Model
- After setting up the datasets, set the paths in `scripts/{dataset name}.sh`.
- Train the model with `bash scripts/{dataset name}.sh`.
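When several datasets are configured, the per-dataset scripts can be driven from one place. This is a sketch only: `train_datasets` is a hypothetical wrapper that assumes it is called from the repository root and that each `scripts/{dataset name}.sh` already has its paths set.

```shell
#!/usr/bin/env bash
# Sketch only: train_datasets is a hypothetical wrapper, not part of Summ-N.
# It runs scripts/<name>.sh for each dataset name given, stopping at the first failure.
train_datasets() {
  local name script
  for name in "$@"; do
    script="scripts/${name}.sh"
    if [ ! -f "$script" ]; then
      echo "no training script for '${name}' (expected ${script})" >&2
      return 1
    fi
    echo "=== training on ${name} ==="
    bash "$script" || return 1   # stop at the first failed run
  done
}

# Usage, from the repository root (ICSI is the dataset in the evaluation example):
# train_datasets ICSI
```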

### Evaluation
- First, download the checkpoints from [Google Drive](https://drive.google.com/drive/folders/1_2ULrbeQcYN3It99QnqUcWAlU-zu-ceP?usp=sharing).
- Then, set the paths in `scripts/{dataset name}.sh`.
- Finally, specify `--mode` and `--checkpoint-dir` in the running script. For instance:
```shell
python run.py --cfg ICSI.cfg \
 --dataset-path /data/yfz5488/fair/ICSI/ICSI_proprec \
 --output-path ./output/${RUN_NAME} \
 --save-intermediate \
 --cuda-devices 3 \
 --model-path $BART_PATH \
 --mode test \
 --checkpoint-dir path/to/checkpoints
```

Then run this script to evaluate on the test set only.
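The evaluation command can also be wrapped so that a missing checkpoint directory is caught before `run.py` starts. A minimal sketch, assuming bash: `eval_test_set`, `DATASET_PATH`, `CHECKPOINT_DIR`, `CUDA_DEVICES` and the `DRY_RUN` switch are hypothetical names introduced here, while the `run.py` flags are exactly those used in the command above.

```shell
#!/usr/bin/env bash
# Sketch only: eval_test_set, DATASET_PATH, CHECKPOINT_DIR, CUDA_DEVICES and
# DRY_RUN are hypothetical names; the run.py flags come from the command above.
# RUN_NAME and BART_PATH are expected to be set, as in the original example.
eval_test_set() {
  local cfg="$1"   # e.g. ICSI.cfg
  # Fail fast if the downloaded checkpoints are not where we expect them.
  if [ ! -d "${CHECKPOINT_DIR:-}" ]; then
    echo "CHECKPOINT_DIR is not a directory: '${CHECKPOINT_DIR:-}'" >&2
    return 1
  fi
  local -a cmd
  cmd=(python run.py --cfg "$cfg"
    --dataset-path "$DATASET_PATH"
    --output-path "./output/${RUN_NAME}"
    --save-intermediate
    --cuda-devices "${CUDA_DEVICES:-0}"
    --model-path "$BART_PATH"
    --mode test
    --checkpoint-dir "$CHECKPOINT_DIR")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # show the command instead of running it
  else
    "${cmd[@]}"                 # evaluate on the test set only
  fi
}

# Usage (check the command first, then drop DRY_RUN=1 to run it):
# CHECKPOINT_DIR=path/to/checkpoints DATASET_PATH=/path/to/ICSI_proprec \
#   RUN_NAME=icsi_test BART_PATH=/path/to/bart DRY_RUN=1 eval_test_set ICSI.cfg
```

Building the command as a bash array keeps paths containing spaces intact, and the dry-run mode is a cheap way to check every path before occupying a GPU.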

## Add a New Task

It is easy to add a new task or dataset to Summ-N.

## Citation

@inproceedings{zhang2021summn,
  title={Summ\^{} N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents},
  author={Zhang, Yusen and Ni, Ansong and Mao, Ziming and Wu, Chen Henry and Zhu, Chenguang and Deb, Budhaditya and Awadallah, Ahmed H and Radev, Dragomir and Zhang, Rui},
  booktitle={ACL 2022},
  year={2022}
}