Hi,
Thanks for using our repo! It looks like you have only performed the first step of pruning. Continuing to fine-tune the pruned model as follows should bump up the performance by 2 points (you can also find the instructions in README.md). Also, we continue fine-tuning from the checkpoint with the best performance during pruning; we found that fine-tuning the pruned model is essential for the final performance. For small-sized models, we also suggest doing a hyperparameter search for DISTILL_LAYER_LOSS_ALPHA.
PRUNED_MODEL_PATH=$proj_dir/$TASK/$EX_CATE/${TASK}${SUFFIX}/best
PRUNING_TYPE=None # Set the pruning type to None for standard fine-tuning.
LEARNING_RATE=3e-5
bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY [DISTILLATION_PATH] $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION [PRUNED_MODEL_PATH] $LEARNING_RATE
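As a quick way to confirm that the checkpoint you are fine-tuning (and the one it produces) really is a pruned, smaller model, you can sum the tensor sizes in the saved state dict. This is a rough sketch, not part of the repo's scripts, and the path is only illustrative:

# Quick sanity check (not part of the repo's scripts): count the parameters
# stored in a saved checkpoint. A 95%-sparsity CoFi model should be far smaller
# than the ~110M parameters of bert-base. The path below is only an example.
import torch

state_dict = torch.load("./out/MNLI/CoFi/MNLI_sparsity0.95/best/pytorch_model.bin", map_location="cpu")
num_params = sum(t.numel() for t in state_dict.values())
print(f"{num_params / 1e6:.1f}M parameters in the checkpoint")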
For the other issues: regarding datasets.load_dataset, in terms of dataset usage it should be a fair comparison. Could you clarify what you mean by "evaluating with python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95 will get wrong result"? evaluation.py is not a generic evaluation script for all different kinds of pruned models, but only for models pruned with CoFi. If you have encountered issues evaluating CoFi models, could you share the error message? Hope this helps, and let me know if you have more questions!
Thanks for your prompt reply, I will try to fine-tune the pruned model on MNLI.
For issue 2, MNLI in huggingface datasets uses the following label map: label2id = {'entailment': 0, 'neutral': 1, 'contradiction': 2}. However, dynabert and princeton-nlp/CoFi-MNLI-s95 seem to use {'entailment': 1, 'neutral': 2, 'contradiction': 0}.
So for training and evaluation, something needs to be added in preprocess_function to match the labels for MNLI, like:
if data_args.task_name == 'mnli':
    # map huggingface datasets label ids to the label ids the model was trained with
    label_to_id = {0: 1, 1: 2, 2: 0}
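To make the fix concrete, here is a minimal sketch of the preprocessing I mean, following the usual huggingface examples; the tokenizer choice, max_length, and other settings are placeholders rather than the repo's exact configuration:

# Minimal sketch: tokenize MNLI and remap huggingface datasets label ids
# (entailment=0, neutral=1, contradiction=2) to the order the CoFi checkpoint
# expects (contradiction=0, entailment=1, neutral=2). Settings are placeholders.
from datasets import load_dataset
from transformers import AutoTokenizer

raw_datasets = load_dataset("glue", "mnli")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # CoFi's backbone tokenizer

label_to_id = {0: 1, 1: 2, 2: 0}  # dataset label id -> model label id

def preprocess_function(examples):
    result = tokenizer(examples["premise"], examples["hypothesis"],
                       truncation=True, max_length=128)
    result["label"] = [label_to_id[l] for l in examples["label"]]
    return result

eval_dataset = raw_datasets["validation_matched"].map(preprocess_function, batched=True)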
I get the following results with python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95:
For issue 3, the pruned model I used is the CoFi model trained with the above commands. However, no module is pruned. It seems the zs in evaluation.py:276 are not used to prune the model, and the pruning information does not seem to be stored in config.json.
Hi,
Thanks for providing more details!
For issue 2: thanks for spotting this issue!!! It stemmed from the label mismatch between GLUEDataset and datasets. Our models were pruned with a previous version of the implementation that used GLUEDataset to load the data; in the current version we switched to datasets. I modified the code, so it should be working now!
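For context, the datasets-side label order can be checked directly; the old GLUE processors used by GLUEDataset ordered the labels as contradiction/entailment/neutral, as far as I recall:

# huggingface datasets orders MNLI labels as entailment/neutral/contradiction,
# while the old transformers GLUE processors (used by GLUEDataset) ordered them
# contradiction/entailment/neutral -- hence the label mismatch.
from datasets import load_dataset

names = load_dataset("glue", "mnli")["validation_matched"].features["label"].names
print(names)  # ['entailment', 'neutral', 'contradiction']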
For issue 3: our evaluation.py only supports loading pruned models (models of a smaller size), not models with compression vectors (a model of the original size plus zs.pt). You will be able to get the pruned model after fine-tuning. I have added logic to the script so that it also supports loading models with compression vectors. It's not the best practice, but it should suffice as a workaround for now.
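Roughly, the added logic amounts to something like the sketch below; the class and helper names (CoFiBertForSequenceClassification, prune_model_with_z from utils/cofi_utils.py) are as I recall them from the repo and may differ slightly, and the checkpoint path is only an example:

# Sketch of the workaround (class and helper names as I recall them from the
# repo -- models/modeling_bert.py and utils/cofi_utils.py -- so double-check):
import os
import torch
from models.modeling_bert import CoFiBertForSequenceClassification
from utils.cofi_utils import prune_model_with_z

model_path = "./out/MNLI/CoFi/MNLI_sparsity0.95"
model = CoFiBertForSequenceClassification.from_pretrained(model_path)

zs_path = os.path.join(model_path, "zs.pt")
if os.path.exists(zs_path):
    # full-sized model + compression vectors: apply the masks so the model is
    # actually pruned before evaluation
    zs = torch.load(zs_path, map_location="cpu")
    prune_model_with_z(zs, model)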
Let me know if you still encounter any issues and I am happy to continue to help :)
Thank you
Hello, thank you for providing the code. But I have a question about how to reproduce the results at 95% sparsity on MNLI with the following commands:
And I get the following results, with accuracy 78.20 on MNLI:
By the way, I found some issues while reproducing (a short sketch for issue 1 follows the list):
1. In evaluation.py:77, datasets["validation"] should be datasets["validation_matched"] for MNLI.
2. dynabert and princeton-nlp/CoFi-MNLI-s95 use a different label map than MNLI in datasets.load_dataset, so directly evaluating with python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95 gives a wrong result.
3. evaluation.py does not prune the model according to zs.pt; for example, with python evaluation.py MNLI ./out/MNLI/CoFi/MNLI_sparsity0.95 no module is pruned.
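For issue 1 in isolation, a minimal sketch of the split selection (task_name is a placeholder variable, not necessarily how evaluation.py names it):

# Issue 1: glue/mnli has no plain "validation" split, only validation_matched
# and validation_mismatched, so the split has to be chosen per task.
from datasets import load_dataset

task_name = "mnli"
raw_datasets = load_dataset("glue", task_name)

split = "validation_matched" if task_name == "mnli" else "validation"
eval_dataset = raw_datasets[split]
print(len(eval_dataset))  # number of matched validation examples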