princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License

Troubles reproducing the results #3

Closed kongds closed 2 years ago

kongds commented 2 years ago

Hello, thank you for providing the code. I have a question about reproducing the 95%-sparsity results on MNLI with the following commands:

TASK=MNLI
SUFFIX=sparsity0.95
EX_CATE=CoFi
PRUNING_TYPE=structured_heads+structured_mlp+hidden+layer
SPARSITY=0.95
DISTILL_LAYER_LOSS_ALPHA=0.9
DISTILL_CE_LOSS_ALPHA=0.1
LAYER_DISTILL_VERSION=4
DISTILLATION_PATH=dynabert/MNLI
CUDA_VISIBLE_DEVICES=1 bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY $DISTILLATION_PATH $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION

I get the following results, with accuracy 78.20 on MNLI:

wandb: Run history:
wandb:                   eval/loss ▃▁▂▂▂▃██▆▆▅▅▅▅▆▅▅▄▄▅▄▅▅▅▄▄▄▄▄▄▄▄▄▄▄▄▅▄▄▄
wandb:              train/accuracy ▆█▇██▇▁▁▃▄▄▄▄▅▄▅▅▅▅▅▅▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆
wandb:     train/expected_sparsity ▁▃▄▆████████████████████████████████████
wandb:           train/global_step ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:           train/hidden_dims █████▁▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
wandb:              train/lag_loss ▆▆▇▆▆█▁▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆
wandb:         train/learning_rate █████▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁
wandb:                  train/loss ▂▁▆▂▂▇▃█▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▅▆▅▆▆▆
wandb: train/pruned_model_sparsity ▁▃▄▆████████████████████████████████████
wandb:         train/pruned_params ▁▃▄▆████████████████████████████████████
wandb:              train/reg_loss ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:      train/remaining_params █▆▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:       train/target_sparsity ▁▂▄▆████████████████████████████████████
wandb:
wandb: Run summary:
wandb:                   eval/loss 0.66644
wandb:              train/accuracy 0.78197
wandb:     train/expected_sparsity 0.94999
wandb:           train/global_step 0
wandb:           train/hidden_dims 764
wandb:              train/lag_loss 1e-05
wandb:         train/learning_rate 0.0
wandb:                  train/loss 0.40625
wandb: train/pruned_model_sparsity 0.95561
wandb:         train/pruned_params 81243440
wandb:              train/reg_loss 0.0
wandb:      train/remaining_params 3774160
wandb:       train/target_sparsity 0.95
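(As a sanity check on the summary: pruned_params / (pruned_params + remaining_params) = 81,243,440 / 85,017,600 ≈ 0.9556, which matches train/pruned_model_sparsity.)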

By the way, I found some issues while reproducing:

  1. In evaluation.py:77, datasets["validation"] should be datasets["validation_matched"] for MNLI (see the sketch after this list).
  2. dynabert and princeton-nlp/CoFi-MNLI-s95 use a different label map than MNLI loaded with datasets.load_dataset, so directly evaluating with python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95 gives wrong results.
  3. Pruning is not applied to trained models in evaluation.py. For example, the model is not pruned according to zs.pt with python evaluation.py MNLI ./out/MNLI/CoFi/MNLI_sparsity0.95.
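For issue 1, a minimal sketch of the fix I mean (variable names are illustrative; the real code is around evaluation.py:77):

    from datasets import load_dataset

    task_name = "mnli"  # illustrative
    raw_datasets = load_dataset("glue", task_name)
    # MNLI has no single "validation" split; it ships matched/mismatched.
    if task_name == "mnli":
        eval_dataset = raw_datasets["validation_matched"]
    else:
        eval_dataset = raw_datasets["validation"]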
xiamengzhou commented 2 years ago

Hi,

Thanks for using our repo! It looks like you have only performed the first step, pruning. Continuing to fine-tune the pruned model as follows should bump up the performance by 2 points (you can also find the instructions in README.md). Note that we fine-tune from the checkpoint with the best performance during pruning; we found that fine-tuning the pruned model is essential for final performance. For small-sized models, we also suggest doing a hyperparameter search for DISTILL_LAYER_LOSS_ALPHA.

PRUNED_MODEL_PATH=$proj_dir/$TASK/$EX_CATE/${TASK}${SUFFIX}/best
PRUNING_TYPE=None # Setting the pruning type to None for standard fine-tuning.
LEARNING_RATE=3e-5

bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY [DISTILLATION_PATH] $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION [PRUNED_MODEL_PATH] $LEARNING_RATE

For the other issues:

  1. Thanks for spotting this issue, and we just fixed it :)
  2. We evaluated dynabert on MNLI loaded with datasets.load_dataset, so in terms of dataset usage it should be a fair comparison. Could you clarify what you mean by "directly evaluating with python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95 will get wrong results"?
  3. evaluation.py is not a generic script for evaluating all kinds of pruned models; it only supports models pruned with CoFi. If you encounter issues evaluating CoFi models, could you share the error message?

Hope this helps and let me know if you have more questions!

kongds commented 2 years ago

Thanks for your prompt reply, I will try to fine-tune the pruned model on MNLI.

For issue 2, MNLI in huggingface datasets uses the label map label2id = {'entailment': 0, 'neutral': 1, 'contradiction': 2}. However, dynabert and princeton-nlp/CoFi-MNLI-s95 seem to use {'entailment': 1, 'neutral': 2, 'contradiction': 0}. So for training and evaluation, something like the following needs to be added to preprocess_function to match the labels for MNLI:

    if data_args.task_name == 'mnli':
        # dataset labels (entailment=0, neutral=1, contradiction=2)
        # -> checkpoint labels (contradiction=0, entailment=1, neutral=2)
        label_to_id = {0: 1, 1: 2, 2: 0}

Without this remapping, I get wrong results from python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95.
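A self-contained illustration of the mismatch (standard glue/mnli loading; the constant name is mine):

    from datasets import load_dataset

    # MNLI as loaded from the hub: entailment=0, neutral=1, contradiction=2.
    # The dynabert / CoFi-MNLI-s95 checkpoints expect:
    # contradiction=0, entailment=1, neutral=2.
    DATASET_TO_MODEL_LABEL = {0: 1, 1: 2, 2: 0}

    raw = load_dataset("glue", "mnli", split="validation_matched")
    remapped = raw.map(
        lambda ex: {"label": DATASET_TO_MODEL_LABEL[ex["label"]]})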

For issue 3, the pruned model I used is a CoFi model trained with the commands above. However, no module is pruned:

[screenshot: the loaded model's modules keep their original, un-pruned sizes]

It seems the zs loaded at evaluation.py:276 are not used to prune the model, and the pruning information does not appear to be stored in config.json either:

[screenshot: config.json contains no pruning information]
xiamengzhou commented 2 years ago

Hi,

Thanks for providing more details!

For issue 2: thanks for spotting this issue!!! It stemmed from the label mismatch between GLUEDataset and datasets. Our models were pruned with a previous version of the implementation that used GLUEDataset to load data; in the current version we switched to datasets. I modified the code, so it should be working now!

For issue 3: evaluation.py only supports loading pruned models (models of a smaller size), not models with compression vectors (an original-size model plus zs.pt). You would get the pruned models after fine-tuning. I have added logic to the script so that it also supports loading models with compression vectors now. It's not the best practice, but it should suffice as a workaround for now.
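Conceptually, the workaround does something like the following (a rough sketch only, not the exact code; the module paths and the prune_model_with_z helper are approximate readings of the repo layout):

    import torch
    # Assumed imports based on the repo layout; names may differ slightly.
    from models.modeling_bert import CoFiBertForSequenceClassification
    from utils.cofi_utils import prune_model_with_z

    model_path = "out/MNLI/CoFi/MNLI_sparsity0.95"  # illustrative path
    # Load the original-size model, then materialize the masks in zs.pt
    # so heads/MLPs/hidden dims are actually sliced before evaluation.
    model = CoFiBertForSequenceClassification.from_pretrained(model_path)
    zs = torch.load(f"{model_path}/zs.pt", map_location="cpu")
    prune_model_with_z(zs, model)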

Let me know if you still encounter any issues and I am happy to continue to help :)

kongds commented 2 years ago

Thank you