pmichel31415 / are-16-heads-really-better-than-1

Code for the paper "Are Sixteen Heads Really Better than One?"

Is BERT finetuned after pruning? #10

Closed Huan80805 closed 2 years ago

Huan80805 commented 2 years ago

Hi, I'm currently working on attention head pruning. In your reported experiments, you fine-tuned BERT while training on the downstream MNLI task, right? But does it also work to freeze the BERT representation after pruning and train only the downstream MNLI classifier? I'd appreciate your answer.
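For reference, a minimal sketch of the frozen-encoder variant asked about above, assuming the Hugging Face `transformers` BERT implementation rather than this repo's code; the pruned heads and learning rate are hypothetical placeholders:

```python
# Hedged sketch: prune some heads, freeze the BERT encoder, and train only the
# MNLI classification head. The heads pruned here are an arbitrary example.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # MNLI has 3 classes
)

# Hypothetical pruning choice: {layer index: list of head indices to remove}.
model.prune_heads({0: [0, 1, 2, 3], 5: [2, 4]})

# Freeze the encoder so only the classifier parameters are updated.
for param in model.bert.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
# ...then run a standard MNLI training loop with this optimizer.
```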

pmichel31415 commented 2 years ago

Hi @Huan80805, sorry for the late response. In the paper, I pruned BERT after fine-tuning. However, it should be possible to first prune the pre-trained BERT and then fine-tune the pruned model. The main change would be that you'd be pruning based on the pre-training loss rather than the fine-tuning loss.
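For illustration, a rough sketch of scoring head importance against the pre-training (masked-LM) loss instead of the fine-tuning loss, in the spirit of the paper's gradient-based importance score but using the Hugging Face `transformers` API rather than this repo's scripts; `mlm_loader` is a hypothetical data loader over the MLM objective:

```python
# Hedged sketch: accumulate |d loss / d head_mask| over MLM batches as a head
# importance score, then prune the lowest-scoring heads before fine-tuning.
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

n_layers = model.config.num_hidden_layers
n_heads = model.config.num_attention_heads

# One mask variable per head; gradients w.r.t. these masks give the scores.
head_mask = torch.ones(n_layers, n_heads, requires_grad=True)
importance = torch.zeros(n_layers, n_heads)

for batch in mlm_loader:  # hypothetical DataLoader yielding MLM batches
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["labels"],
        head_mask=head_mask,
    )
    grad = torch.autograd.grad(outputs.loss, head_mask)[0]
    importance += grad.abs().detach()

# Prune the lowest-importance heads (e.g. via model.prune_heads), then
# fine-tune the pruned model on the downstream task.
```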

Huan80805 commented 2 years ago

Thanks for replying. I'm actually doing exactly that (i.e., pruning based on the pre-training loss). Have a nice day.