Code and data for the EMNLP 2023 Findings paper: Improving Question Generation with Multi-level Content Planning.
Previous studies have suggested that key phrase selection is essential for question generation (QG), yet it is still challenging to connect such disjointed phrases into meaningful questions, particularly for long contexts. To mitigate this issue, we propose MultiFactor, a novel QG framework based on multi-level content planning. Specifically, MultiFactor includes two components: FA-Model, which simultaneously selects key phrases and generates full answers, and Q-Model, which takes the generated full answer as an additional input to generate questions. Here, full answer generation is introduced to connect the short answer with the selected key phrases, thus forming an answer-aware summary that facilitates QG. Both FA-Model and Q-Model are formalized as simple-yet-effective Phrase-Enhanced Transformers, our joint model for phrase selection and text generation.
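For orientation, here is a minimal sketch of the two-stage inference flow (FA-Model first, then Q-Model), using plain T5 seq2seq models from Hugging Face transformers as stand-ins. The checkpoint paths and separator strings are placeholder assumptions; the actual Phrase-Enhanced Transformer with joint phrase selection is implemented in ${ProjHome}/src/MultiFactor/modeling_bridget5.py.

```python
# A minimal sketch of MultiFactor's two-stage inference flow.
# NOTE: "path/to/fa_model" and "path/to/q_model" are hypothetical checkpoint
# directories and the separator strings are assumed placeholders; the actual
# Phrase-Enhanced Transformer (with joint phrase selection) is implemented in
# src/MultiFactor/modeling_bridget5.py.
from transformers import AutoTokenizer, T5ForConditionalGeneration

ANS_SEP, FULL_ANS_SEP, CONTEXT_SEP = "[ANS]", "[FULL_ANS]", "[CONTEXT]"

def generate(model_dir: str, text: str, max_new_tokens: int = 64) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = T5ForConditionalGeneration.from_pretrained(model_dir)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

answer = "Paris"
context = "France is a country in Western Europe. Its capital and largest city is Paris."

# Stage 1: the FA-Model turns the short answer and context into a full answer.
full_answer = generate("path/to/fa_model", f"{ANS_SEP} {answer} {CONTEXT_SEP} {context}")

# Stage 2: the Q-Model takes the full answer as an additional input to generate the question.
question = generate("path/to/q_model", f"{ANS_SEP} {answer} {FULL_ANS_SEP} {full_answer} {CONTEXT_SEP} {context}")
print(question)
```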
Install necessary packages:
```bash
pip install -r requirements.txt
```
Before you run our code, please read these implementation and evaluation details.
We implement our model in ${ProjHome}/src/MultiFactor/modeling_bridget5.py by inheriting from modeling_t5.py in Transformers 4.20.1. A demo is also provided in modeling_bridget5.py. We recommend using a transformers version lower than 4.30 because of a tokenizer issue with the T5 family.
We evaluate the BLEU score using NLTK in ${ProjHome}/src/MultiFactor/multi_factor_trainer.py instead of sacreBLEU for two reasons:
- The METEOR score fluctuates wildly across different packages (please see the related issue in the NLTK repository for details); here, we use the meteor API in pycocoevalcap.
- Different versions of BERTScore also heavily influence the final score; our version is 0.3.10.
We provide an evaluation script demo in ${ProjHome}/evaluate.py; a small sketch of the metric calls is shown below.
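As an illustration of these evaluation notes, here is a minimal sketch of the metric calls with the packages named above (simple whitespace tokenization, placeholder sentences); the actual evaluation logic lives in ${ProjHome}/src/MultiFactor/multi_factor_trainer.py and ${ProjHome}/evaluate.py.

```python
# Minimal sketch of the evaluation setup described above; sentences are placeholders.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from pycocoevalcap.meteor.meteor import Meteor
from bert_score import score as bert_score

references = ["what is the capital of france ?"]        # gold questions
hypotheses = ["what city is the capital of france ?"]   # generated questions

# BLEU via NLTK (whitespace tokenization here, for illustration only).
bleu4 = corpus_bleu(
    [[ref.split()] for ref in references],
    [hyp.split() for hyp in hypotheses],
    smoothing_function=SmoothingFunction().method1,
)

# METEOR via the pycocoevalcap API (expects {id: [sentence]} dicts, requires Java).
gts = {i: [ref] for i, ref in enumerate(references)}
res = {i: [hyp] for i, hyp in enumerate(hypotheses)}
meteor, _ = Meteor().compute_score(gts, res)

# BERTScore (pin bert-score==0.3.10 to reproduce the reported numbers).
P, R, F1 = bert_score(hypotheses, references, lang="en")

print(f"BLEU-4: {bleu4:.4f}, METEOR: {meteor:.4f}, BERTScore-F1: {F1.mean().item():.4f}")
```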
Because constructing the pseudo-gold full answers is a cumbersome step, we directly provide a demo.json based on the CQG dev.json [2], which already includes the pseudo-gold full answers constructed as described in our paper.
Our implementation reads json files instead of jsonl. The schema is:

```json
{
    "context": "the given input context",
    "answer": "the given answer",
    "question": "the corresponding question",
    "p_phrase": "the positive phrases in the given context",
    "n_phrase": "the negative phrases",
    "full answer": "the pseudo-gold full answer (question + answer -> a declarative sentence)"
}
```
We provide Hugging Face dataset links for the four datasets mentioned in our paper:
dataset name | dataset link |
---|---|
HotpotQA-Supporting Facts | multifactor_hotpotqa_suppfacts |
HotpotQA-Full Document | coming soon |
SQuAD 1.1 - Zhou split | multifactor_squad1.1_zhou |
SQuAD 1.1 - Du split | coming soon |
When using your own custom dataset, please make sure that the p_phrase and n_phrase entries actually appear in your context (a quick sanity check is sketched below).
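The sketch below verifies that every positive and negative phrase occurs in its context. It assumes each split json holds a list of records following the schema above and that the phrase fields are lists of strings (or single strings); the file path is hypothetical, so adjust it to your own files.

```python
# Sanity-check a custom dataset file against the schema above.
# ASSUMPTION: the json file contains a list of records, and p_phrase / n_phrase
# are lists of strings (or single strings); adapt the handling if yours differ.
import json

def check_split(path: str) -> None:
    with open(path, "r", encoding="utf-8") as f:
        examples = json.load(f)
    for i, ex in enumerate(examples):
        context = ex["context"]
        for key in ("p_phrase", "n_phrase"):
            phrases = ex.get(key, [])
            if isinstance(phrases, str):
                phrases = [phrases]
            for phrase in phrases:
                if phrase and phrase not in context:
                    print(f"[example {i}] {key} not found in context: {phrase!r}")

check_split("dataset/my_dataset/train.json")  # hypothetical path
```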
We read the raw dataset json file from ${ProjHome}\dataset\${dataset_name}\${split}.json and generate the corresponding .pt file at ${ProjHome}\dataset\${dataset_name}\${data_format}\${split}.pt. If the dataset input contains full answers from an external source (e.g., for inference; these are not the gold full answers), the full-answer json file path is ${ProjHome}\dataset\${dataset_name}\${data_format}\${split}.json.
```bash
python ${ProjHome}/src/MultiFactor/multi_factor_dataset.py \
    -d ${dataset_name} \
    -j ${data_format} \
    -l ${max_length}
```
The training configuration file is ${ProjHome}/src/config.ini. You can also edit ${ProjHome}/src/MultiFactor/arguments.py and ${ProjHome}/src/run.py. Then start training:

```bash
CUDA_VISIBLE_DEVICES=0 python \
    ${ProjHome}/src/run.py \
    -c ${ProjHome}/src/config.ini \
    -j ${dataset_name} \
    -f ${data_format} \
    --seed ${seed} \
    --model_type ${model_type} \
    --cls_loss_weight 1.0 \
    --learning_rate 1e-4 \
    --num_train_epochs 10 \
    --num_beams 1 \
    --hard_flag 2 \
    --save_model_pt True
```
Related source code: ${ProjHome}/src/MultiFactor/multi_factor_config.py and ${ProjHome}/src/MultiFactor/multi_factor_trainer.py.
The supported data formats (source code: ${ProjHome}/src/MultiFactor/multi_factor_dataset.py) are:
Name | Input | Infer fa |
---|---|---|
multi_factor | f"{ANS_SEP} {answer} {FULL_ANS_SEP} {fa} {CONTEXT_SEP} {context}" | Yes |
top1_q_model | f"{ANS_SEP} {answer} {FULL_ANS_SEP} {fa} {CONTEXT_SEP} {context}" | Yes |
multi_factor_mixqg | f"{answer} //n {FULL_ANS_SEP} {fa} {CONTEXT_SEP} {context}" | Yes |
top1_q_model_mixqg | f"{answer} //n {FULL_ANS_SEP} {fa} {CONTEXT_SEP} {context}" | Yes |
mix_full_answer | f"{ANS_SEP} {answer} {FULL_ANS_SEP} {fa_string} {CONTEXT_SEP} {context}" | Yes |
*fa_model | f"{ANS_SEP} {answer} {CONTEXT_SEP} {context}" | No |
q_model_upper | f"{ANS_SEP} {answer} {FULL_ANS_SEP} {fa} {CONTEXT_SEP} {context}" | No |
pet | f"{ANS_SEP} {answer} {CONTEXT_SEP} {context}" | No |
pet_mixqg | f"{answer} //n {CONTEXT_SEP} {context}" | No |
full2question_converter | f"{ANS_SEP} {_answer} {FULL_ANS_SEP} {fa}" | No |
Where fa is the full answer; for data formats with "Infer fa" = Yes, it is read from the externally generated full-answer file ${ProjHome}\dataset\${dataset_name}\${data_format}\${split}.json (see the sketch below).
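To make the table concrete, here is a small sketch that assembles inputs for a few of the formats above; the separator strings are assumed placeholders (the real values are defined in ${ProjHome}/src/MultiFactor/multi_factor_dataset.py).

```python
# Build model inputs for a few of the data formats in the table above.
# NOTE: the separator strings are assumed placeholders; the real values are
# defined in src/MultiFactor/multi_factor_dataset.py.
ANS_SEP, FULL_ANS_SEP, CONTEXT_SEP = "[ANS]", "[FULL_ANS]", "[CONTEXT]"

answer = "Paris"
fa = "Paris is the capital of France."   # inferred (multi_factor) or gold (q_model_upper)
context = "France is a country in Western Europe. Its capital and largest city is Paris."

# multi_factor / top1_q_model / q_model_upper: answer + full answer + context
multi_factor_input = f"{ANS_SEP} {answer} {FULL_ANS_SEP} {fa} {CONTEXT_SEP} {context}"

# pet / fa_model: answer + context only (no full-answer channel)
pet_input = f"{ANS_SEP} {answer} {CONTEXT_SEP} {context}"

# *_mixqg variants prepend the answer in MixQG style instead of using ANS_SEP
multi_factor_mixqg_input = f"{answer} //n {FULL_ANS_SEP} {fa} {CONTEXT_SEP} {context}"

print(multi_factor_input)
```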
Here we show a demo of conducting experiments on HotpotQA-Supporting Facts.
Put the raw dataset (containing the pseudo-gold full answers) in ${ProjHome}\dataset\cqg.
Train the FA_Model:
```bash
python ${ProjHome}/src/MultiFactor/multi_factor_dataset.py \
    -d cqg \
    -j fa_model \
    -l 256
```
Conduct FA_Model inference on the train, dev and test splits, and put the results at ${ProjHome}\dataset\cqg\multi_factor\${split}.json (a sketch of writing these files is shown below).
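A minimal sketch of this step is shown below. It assumes the inference outputs are written back into copies of the raw split files with the "full answer" field replaced by the FA-Model prediction; the exact expected layout may differ, so please verify it against ${ProjHome}/src/MultiFactor/multi_factor_dataset.py.

```python
# Write FA-Model predictions into the full-answer json expected by the
# multi_factor data format. ASSUMPTION: the file mirrors the raw split json
# with the "full answer" field replaced by the model prediction; verify the
# exact layout against src/MultiFactor/multi_factor_dataset.py.
import json
import os

def write_inferred_full_answers(raw_path, predictions, out_path):
    with open(raw_path, "r", encoding="utf-8") as f:
        examples = json.load(f)
    assert len(examples) == len(predictions)
    for ex, fa in zip(examples, predictions):
        ex["full answer"] = fa  # replace pseudo-gold with the inferred full answer
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(examples, f, ensure_ascii=False, indent=2)

# e.g. for the dev split (dev_predictions come from your FA_Model inference run):
# write_inferred_full_answers("dataset/cqg/dev.json", dev_predictions, "dataset/cqg/multi_factor/dev.json")
```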
Prepare the .pt dataset files for MultiFactor and the ablation studies:
```bash
python ${ProjHome}/src/MultiFactor/multi_factor_dataset.py \
    -d cqg \
    -j multi_factor \
    -l 256
python ${ProjHome}/src/MultiFactor/multi_factor_dataset.py \
    -d cqg \
    -j pet \
    -l 256
```
Train MultiFactor.
```bash
export CUDA_VISIBLE_DEVICES=0
dataset_name=cqg
seed=42
model_type=multifactor
data_format=multi_factor
python \
    ${ProjHome}/src/run.py \
    -c ${ProjHome}/src/config.ini \
    -j ${dataset_name} \
    -f ${data_format} \
    --seed ${seed} \
    --model_type ${model_type} \
    --cls_loss_weight 1.0 \
    --learning_rate 1e-4 \
    --num_train_epochs 10 \
    --num_beams 1 \
    --hard_flag 2 \
    --save_model_pt True
```
To run the other experiments, the key arguments are listed as follows:

```bash
# baseline
data_format=pet
model_type=baseline

data_format=pet model_type=node_cls

data_format=pet model_type=multifactor

data_format=pet model_type=multifactor hard_flag=3

# to use the MixQG backbone, additionally pass:
# --train_model_name_or_path Salesforce/mixqg-base
```
## Cite
Please consider citing this paper if you use the code from our work.
Thanks a lot :)
```bibtex
@article{DBLP:journals/corr/abs-2310-13512,
author = {Zehua Xia and
Qi Gou and
Bowen Yu and
Haiyang Yu and
Fei Huang and
Yongbin Li and
Cam{-}Tu Nguyen},
title = {Improving Question Generation with Multi-level Content Planning},
journal = {CoRR},
volume = {abs/2310.13512},
year = {2023},
url = {https://doi.org/10.48550/arXiv.2310.13512},
doi = {10.48550/ARXIV.2310.13512}
}
[1] Yang, Zhilin, et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. EMNLP, 2018.
[2] Fei, Zichu, et al. CQG: A Simple and Effective Controlled Generation Framework for Multi-Hop Question Generation. ACL, 2022.
[3] Su, Dan, et al. QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation. ICASSP, 2022.
[4] Rajpurkar, Pranav, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text. EMNLP, 2016.
[5] Zhou, Qingyu, et al. Neural Question Generation from Text: A Preliminary Study. NLPCC, 2017.