Code for the paper "Lifelong Language Knowledge Distillation"
In The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
by Yung-Sung Chuang, Shang-Yu Su, Yun-Nung Chen
Our code is based on the released code from LAnguage-MOdeling-for-Lifelong-Language-Learning. Most of the settings are identical to theirs.
Task | Dataset (Original Data Link) |
---|---|
Summarization | CNN/DM |
Goal-Oriented Dialogue | WOZ |
Semantic Parsing | WikiSQL |
Natural Language Generation | E2ENLG |
Natural Language Generation | RNNLG |
Text Classification | AGNews, Yelp, Amazon, DBPedia, Yahoo |
We use the data released by LAMOL's authors here, except for the E2ENLG and RNNLG datasets.
We also release our processed data here.
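For instance, once the data archive has been downloaded from the links above, it can be unpacked into the data directory described in the setup below. This is only a sketch; the archive file name is a placeholder.

```bash
# The archive name below is a placeholder; use the file you actually downloaded.
mkdir -p data
tar -xzvf llkd_data.tar.gz -C data/
```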
Install the dependencies listed in `requirements.txt`, then create the following directories:

- data directory: where the dataset will be loaded by the model.
- model directory: the place for the model to dump its outputs.

Copy `.env.example` and save it as `.env`. In `.env`, set the value of `DATA_DIR` to the data directory and set the value of `MODEL_ROOT_DIR` to the model directory.
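A minimal sketch of these setup steps, assuming the data and model directories are simply named `data` and `models` (the directory names and `.env` values are only examples):

```bash
# Example setup; directory names are illustrative.
pip install -r requirements.txt
mkdir -p data models        # data directory and model directory
cp .env.example .env
# In .env, point the two variables at the directories created above, e.g.:
#   DATA_DIR=./data
#   MODEL_ROOT_DIR=./models
```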
`train.sh` and `test.sh` are the entry points for training and testing. Their main options include:
Options | Description |
---|---|
seq_train_type | The mode for handling a sequence of tasks. Modes include lll, finetune, multitask, mas, ewc, and gem. The default is lll, which corresponds to our proposed method; the others are the methods we compare against. |
tasks | The sequence of tasks to train with `seq_train_type`. Separate the tasks with spaces after the `--tasks` tag. Tasks are the keys of the `TASK_DICT` variable in `settings.py`. |
model_name | The language model to use. The default is `gpt2`; options are `gpt2` and `openai-gpt`. |
gen_lm_sample_percentage | Only takes effect with `--seq_train_type lll`. The percentage of the dataset size to generate as pseudo samples for our proposed method. |
lm_lambda | Lambda value for the loss function. |
max_n_epochs | Maximum epoch value for all tasks. |
min_batch_size | Minimum batch size for all tasks. |
min_n_steps | Minimum number of optimization steps for all tasks. |
n_train_epochs | Number of training epochs for all tasks. |
n_gpu | Number of GPUs to use. |
reg_lambda | Lambda value for mas and ewc. |
top_k_lm | Top k sampling for the language model. |
top_k_qa | Top k sampling for the qa model. |
train_batch_size | Batch size for all tasks. The default is 0; when the value is 0, the batch size is decided dynamically based on GPU memory usage. |
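For example, a hedged training and evaluation invocation using the options above; the task names are only illustrative keys, so replace them with keys from `TASK_DICT` in `settings.py`:

```bash
# Illustrative invocation only; adjust tasks and flags to your experiments.
./train.sh --seq_train_type lll --model_name gpt2 \
           --gen_lm_sample_percentage 0.2 \
           --tasks e2enlg rnnlg.rest rnnlg.hotel

# test.sh accepts the same main options and evaluates the trained models.
./test.sh  --seq_train_type lll --model_name gpt2 \
           --gen_lm_sample_percentage 0.2 \
           --tasks e2enlg rnnlg.rest rnnlg.hotel
```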
Options for knowledge distillation:

Options | Description |
---|---|
distil | Use `--distil` to conduct Word-KD (the teacher model under `models/gpt2/lll/[TASK]_0.2/` is needed if [TASK] is in your LLL tasks). |
seq_distil | Use `--seq_distil` to conduct Seq-KD (the distilled data needs to be put in `data/[TASK]_to_squad-distil-v2.0.json`, which can be found in the Supplementary Materials). |
See the examples in `run_seqsoftkd-WCS.sh`, `run_seqsoftkd-NLG.sh`, and `run_seqsoftkd-TC.sh`, which conduct Seq-KD (soft) on all the experiments in our paper. In these examples, both `--seq_distil` and `--distil` are added to the arguments. To conduct Word-KD instead, omit `--seq_distil` from the arguments; to conduct Seq-KD, omit `--distil`.
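A hedged sketch of the three variants on the same illustrative task sequence (task names are placeholders, and other required options are omitted for brevity):

```bash
# Word-KD: only --distil (needs the teacher models under models/gpt2/lll/[TASK]_0.2/).
./train.sh --seq_train_type lll --tasks e2enlg rnnlg.rest --distil

# Seq-KD: only --seq_distil (needs data/[TASK]_to_squad-distil-v2.0.json).
./train.sh --seq_train_type lll --tasks e2enlg rnnlg.rest --seq_distil

# Seq-KD (soft): both flags, as in the run_seqsoftkd-*.sh scripts.
./train.sh --seq_train_type lll --tasks e2enlg rnnlg.rest --seq_distil --distil
```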
We add a `$SEED` suffix to the model directory. If `multitask` is assigned to the `--seq_train_type` tag, the model will be dumped in the `$MODEL_ROOT_DIR/model_name/seq_train_type/TASK1_TASK2_...` directory. Otherwise, it will be dumped in the `$MODEL_ROOT_DIR/model_name/seq_train_type/TASK1_TASK2_.../TASK1`, `$MODEL_ROOT_DIR/model_name/seq_train_type/TASK1_TASK2_.../TASK2`, ... directories.
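For instance, after training a two-task sequence with the default settings, the outputs would roughly end up in directories like the ones below. The task names are placeholders, and the glob accounts for the `$SEED` suffix:

```bash
# Rough illustration of the output layout; task names are placeholders.
ls "$MODEL_ROOT_DIR"/gpt2/lll/e2enlg_rnnlg.rest*/e2enlg       # models after the first task
ls "$MODEL_ROOT_DIR"/gpt2/lll/e2enlg_rnnlg.rest*/rnnlg.rest   # models after the second task
ls "$MODEL_ROOT_DIR"/gpt2/multitask/e2enlg_rnnlg.rest*        # multitask mode: a single directory
```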
The fp16 code (`fp16.py`, `fp16util.py`) is from Megatron-LM, the ongoing research on training transformer language models at scale by NVIDIA.

@article{chuang2020lifelong,
title={Lifelong Language Knowledge Distillation},
author={Chuang, Yung-Sung and Su, Shang-Yu and Chen, Yun-Nung},
journal={arXiv preprint arXiv:2010.02123},
year={2020}
}
@inproceedings{sun2019lamol,
title={LAMOL: LAnguage MOdeling for Lifelong Language Learning},
author={Sun, Fan-Keng and Ho, Cheng-Hao and Lee, Hung-Yi},
booktitle={International Conference on Learning Representations},
year={2019}
}