Closed Huan80805 closed 2 years ago
Hi @Huan80805 , sorry for the late response. In the paper, I pruned BERT after fine-tuning. However, it should be possible to first prune the pre-trained BERT and then fine-tune the pruned model. The main change would be that you'd be pruning based on the pre-training loss rather than the fine-tuning loss.
Thanks for replying, I'm actually doing exactly that (i.e., pruning based on the pre-training loss). Have a nice day.
Hi, I'm currently working on attention head pruning for models. In your reported experiments, you fine-tuned BERT when training on the downstream MNLI task, right? But does it also work to freeze the BERT representation after pruning and train only the downstream MNLI classifier? I'd appreciate your answer.