princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License

What's the model I should prepare and the training process? #42

Closed gaishun closed 1 year ago

gaishun commented 1 year ago

Hi, thanks for the great work @xiamengzhou! Sorry, but I'm not clear about the training process and the models I should prepare beforehand. Here's my understanding:

If I want to prune a BERT on MNLI and then test it, there are three stages: training (pruning), fine-tuning (pruning_type=None), and evaluation.

First, I need to download the original bert-base-uncased and apply the pruning_type=None fine-tuning to that original BERT? But the fine-tuning script also takes a distillation_path:

```bash
bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY [DISTILLATION_PATH] $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION $SPARSITY_EPSILON [PRUNED_MODEL_PATH] $LEARNING_RATE
```

Q1: If I use the pruning_type=None fine-tuning from the README, what should I pass as distillation_path at that initial point, before any teacher model exists?
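My current assumption is that the MNLI teacher passed as DISTILLATION_PATH is just a bert-base-uncased fine-tuned on MNLI with standard Hugging Face tooling, produced outside of this repo. A minimal sketch using the stock transformers run_glue.py example script (the script and its flags are from Hugging Face, not from CoFiPruning, and the hyperparameters below are placeholders, not the values from the paper):

```bash
# Fine-tune bert-base-uncased on MNLI with the standard HF example script
# (examples/pytorch/text-classification/run_glue.py in the transformers repo).
# Hyperparameters are illustrative placeholders only.
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --task_name mnli \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./bert-base-uncased-mnli   # this directory would then serve as DISTILLATION_PATH
```

Is that roughly what you did to obtain the teacher?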

Then, once I have an MNLI-fine-tuned BERT, I will prune it and also use it as the teacher model. Q2: Is the MNLI-fine-tuned BERT itself the model that gets pruned, or does the pruning process load a fresh bert-base as the model to be pruned?

In the training (pruning) and fine-tuning stages, the argument distillation_path is the path to the MNLI-fine-tuned BERT, and pretrained_pruned_model is the path to the pruned MNLI BERT.
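To check that I'm reading the arguments correctly, here is how I imagine the two invocations would look. This is only a sketch: the paths, suffix, pruning type, and hyperparameter values are placeholders I chose, and only the argument order comes from the README command quoted above.

```bash
# Stage 1 -- pruning: DISTILLATION_PATH is the MNLI-fine-tuned teacher BERT.
DISTILLATION_PATH=./bert-base-uncased-mnli                 # teacher model (placeholder path)
PRUNING_TYPE=structured_heads+structured_mlp+hidden+layer  # pruning enabled (my guess at the value)
bash scripts/run_CoFi.sh MNLI sparsity0.95 CoFi $PRUNING_TYPE 0.95 \
    $DISTILLATION_PATH 0.9 0.1 4 0.01

# Stage 2 -- final fine-tuning: pruning_type is set to None and PRUNED_MODEL_PATH
# points at the checkpoint produced by stage 1.
PRUNED_MODEL_PATH=./out/MNLI/CoFi/MNLI_sparsity0.95        # stage-1 output (placeholder path)
bash scripts/run_CoFi.sh MNLI sparsity0.95 CoFi None 0.95 \
    $DISTILLATION_PATH 0.9 0.1 4 0.01 $PRUNED_MODEL_PATH 3e-5
```

Is this the intended way to chain the two stages?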

I don't really understand which original model I should use. How can I get your fine-tuned BERT, or fine-tune the original BERT the same way you did?

gaishun commented 1 year ago

I got the answers through email.

pvcastro commented 1 year ago

Hey @gaishun, did you manage to get a good pruned, fine-tuned transformer model? Would you mind sharing the steps?

gaishun commented 1 year ago

Sorry, I'm still at the stage of fine-tuning the BERT on the datasets before pruning, and some errors are not resolved yet.