pmichel31415/are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
MIT License · 163 stars · 14 forks
Issues
#11 Why do we need different normalization for all the layers compared to the last layer in BERT during importance score calculation? (Hritikbansal · closed 2 years ago · 1 comment; formula recap below)
#10 Is BERT finetuned after pruning? (Huan80805 · closed 2 years ago · 2 comments)
#9 Is the code still able to run? (bing0037 · opened 2 years ago · 3 comments)
#8 about the params: --raw-text and --transformer-mask-heads (LiangQiqi677 · opened 3 years ago · 0 comments; head-masking sketch below)
#7 RuntimeError: can't retain_grad on Tensor that has requires_grad=False (YJiangcm · opened 3 years ago · 2 comments; minimal reproduction below)
#6 Systematic Pruning Experiments Problem (ChuanyangZheng · closed 3 years ago · 0 comments)
#5 a question about run_classifier.py (Ixuanzhang · closed 4 years ago · 1 comment)
#4 Not able to obtain pretrained WMT model (marwash25 · closed 4 years ago · 2 comments)
#3 BERT actually_prune option not working (pglock · closed 5 years ago · 1 comment)
#2 Not able to prune the BERT model (ishita1995 · closed 4 years ago · 7 comments)
#1 No code on master? (aninrusimha · closed 5 years ago · 1 comment)
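
For context on #11: the importance score in question is the gradient-based estimate from the paper (Michel et al., 2019). As a recap of the paper's formula, not of this repo's exact code, the importance of head h is the expected sensitivity of the loss to a mask variable ξ_h gating that head:

```latex
% Head importance as defined in the paper: \xi_h is a mask variable
% gating head h, Att_h(x) is the head's output on input x, and the
% second form follows from the chain rule.
I_h = \mathbb{E}_{x \sim X}\left|\frac{\partial \mathcal{L}(x)}{\partial \xi_h}\right|
    = \mathbb{E}_{x \sim X}\left|\mathrm{Att}_h(x)^{\top}\,
      \frac{\partial \mathcal{L}(x)}{\partial \mathrm{Att}_h(x)}\right|
```

The paper also normalizes these scores within each layer before comparing heads; the BERT-specific last-layer difference asked about in #11 is a detail of this repo's implementation and is not settled by the formula above.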
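On #8: --transformer-mask-heads selects heads to disable. Independent of how the repo parses that flag, masking a head amounts to zeroing its output before the output projection. A minimal PyTorch sketch; the function name and tensor shapes are illustrative, not the repo's API:

```python
import torch

def mask_heads(attn_output, heads_to_mask):
    """Zero out selected attention heads.

    attn_output: (batch, n_heads, seq_len, head_dim) per-head outputs.
    heads_to_mask: iterable of head indices to disable.

    Multiplying a head's output by 0 is equivalent to removing it from
    the concatenation that feeds the output projection.
    """
    mask = torch.ones(attn_output.size(1), device=attn_output.device)
    mask[list(heads_to_mask)] = 0.0
    return attn_output * mask.view(1, -1, 1, 1)

# Example: disable heads 0 and 3 of an 8-head layer.
x = torch.randn(2, 8, 10, 64)
y = mask_heads(x, [0, 3])
assert y[:, 0].abs().sum() == 0 and y[:, 3].abs().sum() == 0
```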
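On #7: the RuntimeError is standard PyTorch behavior rather than a bug specific to this code. retain_grad() can only be called on a tensor that already tracks gradients. A minimal reproduction and fix:

```python
import torch

x = torch.randn(4)          # requires_grad defaults to False
try:
    x.retain_grad()         # reproduces the error from issue #7
except RuntimeError as e:
    print(e)                # can't retain_grad on Tensor that has requires_grad=False

x.requires_grad_(True)      # enable gradient tracking first
x.retain_grad()             # now succeeds
(x * 2).sum().backward()
print(x.grad)               # tensor([2., 2., 2., 2.])
```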