Hi @xiamengzhou.
When reproducing the efficiency evaluation of the [CoFi-MNLI-s95] model on a single NVIDIA A100 graphics card, I measured 8.8e-05 seconds/example for the pruned model and 4.6e-04 seconds/example for the vanilla fine-tuned BERT, so the speedup is only about 5.23× instead of the reported 12.1×.
Could the lower speedup come from the difference in hardware? Are there any other factors that could explain the gap in the efficiency measurements? Many thanks!
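For reference, the speedup figure above is simply the ratio of the two per-example latencies. A minimal sketch of that arithmetic follows; the names `speedup`, `bert_latency`, and `cofi_latency` are illustrative only and do not come from the CoFi codebase.

```python
# Per-example latencies quoted in this post (seconds/example, single A100).
bert_latency = 4.6e-04   # vanilla fine-tuned BERT
cofi_latency = 8.8e-05   # CoFi-MNLI-s95
reported_speedup = 12.1  # speedup figure this post compares against


def speedup(baseline_latency: float, model_latency: float) -> float:
    """Return how many times faster the model is than the baseline."""
    if baseline_latency <= 0 or model_latency <= 0:
        raise ValueError("latencies must be positive")
    return baseline_latency / model_latency


measured = speedup(bert_latency, cofi_latency)
print(f"measured speedup: {measured:.2f}x")  # prints "measured speedup: 5.23x"
print(f"fraction of reported speedup: {measured / reported_speedup:.0%}")
```

Since the figure is a ratio, it moves whenever either latency moves, so a faster baseline on the A100 would be enough to lower it even if the pruned model's latency were unchanged.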
Hi, sorry for the late reply! Yes, I think the numbers differ across hardware. We tested on a V100 rather than an A100 at the time, and it could be that the A100 is more optimized for similarly shaped structures.
The output for CoFi-MNLI-s95 testing:
The output for fine-tuned BERT testing: