iesl/multicls

Code for "Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling"

Setup

Only tested on Python 3.6.

python -m pip install virtualenv
virtualenv bert_env
source bert_env/bin/activate
pip install -r requirements.txt

Usage

The code is built on the source code of On Losses for Modern Language Models, with several enhancements and modifications. In addition to the previously proposed pre-training tasks ("mlm", "rg" (QT in the paper), "tf", "tf_idf", "so", etc.), we provide a new training mechanism for transformers that enjoys the benefits of ensembling without sacrificing efficiency. To train our Multi-CLS BERT, simply specify --model-type mf (MCQT in the paper) together with the number of facets K you want via --num-facets K.
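
For example, a minimal pre-training invocation might look like the following (a sketch only; the remaining flags keep their defaults from arguments.py):

python -m pretrain_bert --model-type mf --num-facets 5 --pretrained-bert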

Currently, the mf model type can be combined with any of the following methods:

When pre-training with multiple tasks, the loss function can be calculated using any of the following methods:

All the usable parameters shared by the different pre-training tasks can be found in arguments.py.

Note that our code still supports the comparison tasks listed in our paper; you can simply change the model type to reproduce those results (e.g., use --model-type rg+so+tf_idf for the MTL method).
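
For instance, a sketch of such a run (the flags other than --model-type are copied from the Multi-CLS BERT command below and are illustrative only):

python -m pretrain_bert --model-type rg+so+tf_idf --pretrained-bert --lr 2e-5 --epochs 2 --batch-size 30 --seed 1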

Before training, you should

The following command is the best setting we used in our paper for Multi-CLS BERT:

python -m pretrain_bert --model-type mf,tf_idf,so --pretrained-bert --save-iters 200000 --lr 2e-5 --agg-function max --warmup 0.001 --facet2facet  --epochs 2 --num-facets 5 --diversify_hidden_layer 4,8 --loss_mode log  --use_hard_neg 1 --batch-size 30 --seed 1 --diversify_mode lin_new --add_testing_agg --agg_weight 0.1 --save_suffix _add_testing_agg_max01_n5_no_pooling_no_bias_h48_lin_no_bias_hard_neg_tf_idf_so_bsz_30_e2_norm_facet_warmup0001_s1

Fine-tuning

Before running a fine-tuning task, change output_path in evaluate/generate_random_number.py as well as random_file_path in evaluate/config/test_bert.conf to your local paths. Then run the Python file to generate the random numbers, which ensures that the random seeds used for training-data sampling remain the same across fine-tuning runs.
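
For example, from the repository root (a sketch; the script writes to the output_path you just edited):

python evaluate/generate_random_number.py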

To run a fine-tuning task, first convert the saved state dict of the required model using convert_state_dict.py. Then run:

python3 -m evaluate.main --exp_name [experiment name] --overrides [parameters_to_override]

where the experiment name is the same as the model type above. If you are using a saved checkpoint instead of the best model, add the --checkpoint argument. You can change the data you want to use in paths.py (GLUE or SuperGLUE). The --overrides parameter accepts a command-like string that overrides the default values in the fine-tuning config (evaluate/config/test_bert.conf); you can specify the learning rate, model_suffix, or the few-shot setting there.
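
A minimal sketch of these two steps (the arguments to convert_state_dict.py are omitted here; check the script for what it expects, and the override values are simply the ones used in the full example below):

# 1) convert the saved state dict of the pre-trained model (see convert_state_dict.py for its arguments)
python convert_state_dict.py
# 2) fine-tune; the experiment name matches the pre-training model type
python3 -m evaluate.main --exp_name mf,tf_idf,so --overrides "lr = 1e-5, few_shot = 32"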

In Multi-CLS BERT, we provide different ways to aggregate all the CLS embeddings. To specify the aggregation function, change the value of pool_type in evaluate/config/test_bert.conf.
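
For example (the value shown is the one used in the example command below; other aggregation options are defined in the config):

# in evaluate/config/test_bert.conf
pool_type = proj_avg_train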

The following command is an example of running a fine-tuning task on a GLUE dataset in the few-shot setting (few_shot = 32 in the command below). Use a run name with a suffix to reload the model weights you saved from pre-training.

common_para="warmup_ratio = 0.1, max_grad_norm = 1.0, pool_type=proj_avg_train, "
common_name="warmup01_clip1_proj_avg_train_correct"

python -m evaluate.main \
--exp_name $exp_name \
--overrides "run_name = ${model_name}_1, \
$common_para pretrain_tasks = glue, \
target_tasks = glue, \
lr=1e-5, batch_size=4, few_shot = 32, max_epochs = 20, \
pooler_dropout = 0, random_seed = 1, \
run_name_suffix = adam_${common_name}_e20_bsz4:s1:lr"

Citation

@inproceedings{chang2023multi-cls,
  title={Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling},
  author={Haw-Shiuan Chang* and Ruei-Yao Sun* and Kathryn Ricci* and Andrew McCallum},
  booktitle={Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2023},
}