An issue when reproducing the efficiency evaluation

princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408

MIT License

Hi @xiamengzhou. When reproducing the efficiency evaluation of the [CoFi-MNLI-s95] model on a single NVIDIA A100 GPU, I measured 8.8e-05 seconds/example for the pruned model and 4.6e-04 seconds/example for the vanilla fine-tuned BERT, so the speedup is only about 5.23× instead of 12.1×. Could the lower speedup come from the difference in hardware? Are there any other factors that might explain the gap in the efficiency measurements? Many thanks!
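For clarity, the speedup quoted above is simply the ratio of the two per-example latencies. A minimal sketch of that arithmetic follows; the variable and function names are hypothetical (not from the CoFiPruning code), and the numbers are the ones reported in this post:

```python
# Per-example latencies reported in this issue (seconds/example).
bert_latency = 4.6e-04  # vanilla fine-tuned BERT
cofi_latency = 8.8e-05  # CoFi-MNLI-s95


def speedup(baseline: float, pruned: float) -> float:
    """Return how many times faster the pruned model is than the baseline."""
    return baseline / pruned


observed = speedup(bert_latency, cofi_latency)
expected = 12.1  # speedup this post compares against

print(f"observed speedup: {observed:.2f}x")
print(f"fraction of expected speedup: {observed / expected:.0%}")
```

The ratio only compares the two numbers measured on the same machine, so it says nothing by itself about why the gap to 12.1× appears.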

The output for CoFi-MNLI-s95 testing:

The output for fine-tuned BERT testing:

An issue when reproducing the efficiency evaluation #39