Source code for the ACL 2022 paper SUMM^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents
Some of the prediction files (`*.hypo`, one sample per line) are released together with the checkpoints on Google Drive.
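Since each prediction file holds one sample per line, it can be read with a few lines of Python. The sketch below is only an illustration: the function name `load_hypotheses` is hypothetical and is not part of this repository, and UTF-8 encoding is an assumption.

```python
def load_hypotheses(path):
    """Read a *.hypo file and return one summary string per line.

    Blank lines are kept so that list indices still line up with the
    sample order of the dataset. UTF-8 encoding is an assumption.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```

Keeping the line order intact makes it straightforward to pair the i-th prediction with the i-th reference summary later on.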
- Install Fairseq according to its official instructions: https://github.com/pytorch/fairseq
- Run `pip install -r requirements.txt` to install the rest of the packages
- We use python==3.7, pytorch==1.8.1 (cuda=11.1), and fairseq==0.10.0
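To compare an environment against the versions listed above, something like the following sketch may help. The helper names `installed_versions` and `version_mismatches` are hypothetical and not part of this repository. The sketch also assumes that both `torch` and `fairseq` expose a `__version__` attribute, and it falls back to `None` when they do not.

```python
import importlib
import sys

# Versions reported in this README.
EXPECTED = {"python": "3.7", "torch": "1.8.1", "fairseq": "0.10.0"}


def installed_versions():
    """Collect the versions found in the current environment (None if missing)."""
    found = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in ("torch", "fairseq"):
        try:
            module = importlib.import_module(name)
            found[name] = getattr(module, "__version__", None)
        except ImportError:
            found[name] = None
    return found


def version_mismatches(found, expected=EXPECTED):
    """Return {package: (found, expected)} for every package that differs.

    A local build tag such as '+cu111' is ignored, so '1.8.1+cu111'
    counts as 1.8.1.
    """
    mismatches = {}
    for name, want in expected.items():
        have = found.get(name)
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (have, want)
    return mismatches
```

Calling `version_mismatches(installed_versions())` returns an empty dict when the environment matches the versions above.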
```shell
# bart cnn
wget https://dl.fbaipublicfiles.com/fairseq/models/bart.large.cnn.tar.gz
tar -xzvf bart.large.cnn.tar.gz

# gpt2 bpe files
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json'
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe'
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt'
```
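Before moving on, it can be useful to confirm that the downloads above finished. The sketch below assumes all commands were run in the same directory. The file names inside `bart.large.cnn/` (`model.pt` and `dict.txt`) are an assumption about the archive contents, and `missing_files` is a hypothetical helper, not part of this repository.

```python
import os

# Assumed layout after running the download commands in one directory.
# The two entries under bart.large.cnn/ are an assumption about the archive.
EXPECTED_FILES = [
    os.path.join("bart.large.cnn", "model.pt"),
    os.path.join("bart.large.cnn", "dict.txt"),
    "encoder.json",
    "vocab.bpe",
    "dict.txt",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected files (relative paths) that are absent under root."""
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(root, rel))]
```

An empty list from `missing_files(".")` suggests the model and BPE files are in place.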
- Set up ROUGE155 following https://github.com/chatc/AnyROUGE
### Training the Model
- After setting up the datasets, set the paths in the script at `scripts/{dataset name}.sh`
- Train the model with the command: `bash scripts/{dataset name}.sh`
### Evaluation
- First download the checkpoint from [Google Drive](https://drive.google.com/drive/folders/1_2ULrbeQcYN3It99QnqUcWAlU-zu-ceP?usp=sharing)
- Then, set the paths in the script at `scripts/{dataset name}.sh`
- Finally, specify `--mode` and `--checkpoint-dir` in the run script. For instance:
```shell
python run.py --cfg ICSI.cfg \
--dataset-path /data/yfz5488/fair/ICSI/ICSI_proprec \
--output-path ./output/${RUN_NAME} \
--save-intermediate \
--cuda-devices 3 \
--model-path $BART_PATH \
--mode test \
--checkpoint-dir path/to/checkpoints
```
Then run this script to evaluate on the test set only.
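The same evaluation call can also be assembled programmatically, for example when looping over several checkpoints. The sketch below only reuses the flags shown in the command above. The helper names `build_eval_command` and `format_command` are hypothetical, and every path passed to them is a placeholder to be replaced with your own.

```python
import shlex


def build_eval_command(cfg, dataset_path, output_path, model_path,
                       checkpoint_dir, cuda_devices="0"):
    """Build the argument list for a test-only run of run.py.

    Only the flags shown in the README example are used; all values
    are supplied by the caller.
    """
    return [
        "python", "run.py",
        "--cfg", cfg,
        "--dataset-path", dataset_path,
        "--output-path", output_path,
        "--save-intermediate",
        "--cuda-devices", str(cuda_devices),
        "--model-path", model_path,
        "--mode", "test",
        "--checkpoint-dir", checkpoint_dir,
    ]


def format_command(args):
    """Render the argument list as a shell-safe string for logging."""
    return " ".join(shlex.quote(arg) for arg in args)
```

The returned list can be passed directly to `subprocess.run(..., check=True)` from the repository root, which avoids shell quoting issues with paths that contain spaces.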
It is easy to add a new task or dataset to Summ-N:
- Add a configuration file to the `configure` directory. You can write the cfg file following the existing ones, e.g. `configure/ICSI.cfg` is a 3-stage config
- Add a dataset loader to the `dataset_loader` directory. `dataset_loader/ICSI.py` can be a good example
- Add a run script to `scripts`, following e.g. `scripts/run_ICSI.sh`
- Run `bash scripts/{Your Dataset}.sh`
@inproceedings{zhang2021summn,
title={Summ\^{} N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents},
author={Zhang, Yusen and Ni, Ansong and Mao, Ziming and Wu, Chen Henry and Zhu, Chenguang and Deb, Budhaditya and Awadallah, Ahmed H and Radev, Dragomir and Zhang, Rui},
booktitle={ACL 2022},
year={2022}
}