pmichel31415/are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
MIT License · 163 stars · 14 forks
Issues
#11 Why do we need different normalization for all the layers compared to the last layer in BERT during importance score calculation? (Hritikbansal · closed 2 years ago · 1 comment; formula recap below)
#10 Is BERT finetuned after pruning? (Huan80805 · closed 2 years ago · 2 comments)
#9 Is the code still able to run? (bing0037 · opened 2 years ago · 3 comments)
#8 about the params: --raw-text and --transformer-mask-heads (LiangQiqi677 · opened 3 years ago · 0 comments; head-masking sketch below)
#7 RuntimeError: can't retain_grad on Tensor that has requires_grad=False (YJiangcm · opened 3 years ago · 2 comments; minimal reproduction below)
#6 Systematic Pruning Experiments Problem (ChuanyangZheng · closed 3 years ago · 0 comments)
#5 a question about run_classifier.py (Ixuanzhang · closed 4 years ago · 1 comment)
#4 Not able to obtain pretrained WMT model (marwash25 · closed 4 years ago · 2 comments)
#3 BERT actually_prune option not working (pglock · closed 5 years ago · 1 comment)
#2 Not able to prune the BERT model (ishita1995 · closed 4 years ago · 7 comments)
#1 No code on master? (aninrusimha · closed 5 years ago · 1 comment)
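
For context on #11: the importance score in question is the gradient-based estimate from the paper (Michel et al., 2019). As a recap of the paper's formula, not of this repo's exact code, the importance of head h is the expected sensitivity of the loss to a mask variable ξ_h gating that head:

```latex
% Head importance as defined in the paper: \xi_h is a mask variable
% gating head h, Att_h(x) is the head's output on input x, and the
% second form follows from the chain rule.
I_h = \mathbb{E}_{x \sim X}\left|\frac{\partial \mathcal{L}(x)}{\partial \xi_h}\right|
    = \mathbb{E}_{x \sim X}\left|\mathrm{Att}_h(x)^{\top}\,
      \frac{\partial \mathcal{L}(x)}{\partial \mathrm{Att}_h(x)}\right|
```

The paper also normalizes these scores within each layer before comparing heads; the BERT-specific last-layer difference asked about in #11 is a detail of this repo's implementation and is not settled by the formula above.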
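On #8: --transformer-mask-heads selects heads to disable. Independent of how the repo parses that flag, masking a head amounts to zeroing its output before the output projection. A minimal PyTorch sketch; the function name and tensor shapes are illustrative, not the repo's API:

```python
import torch

def mask_heads(attn_output, heads_to_mask):
    """Zero out selected attention heads.

    attn_output: (batch, n_heads, seq_len, head_dim) per-head outputs.
    heads_to_mask: iterable of head indices to disable.

    Multiplying a head's output by 0 is equivalent to removing it from
    the concatenation that feeds the output projection.
    """
    mask = torch.ones(attn_output.size(1), device=attn_output.device)
    mask[list(heads_to_mask)] = 0.0
    return attn_output * mask.view(1, -1, 1, 1)

# Example: disable heads 0 and 3 of an 8-head layer.
x = torch.randn(2, 8, 10, 64)
y = mask_heads(x, [0, 3])
assert y[:, 0].abs().sum() == 0 and y[:, 3].abs().sum() == 0
```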
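On #7: the RuntimeError is standard PyTorch behavior rather than a bug specific to this code. retain_grad() can only be called on a tensor that already tracks gradients. A minimal reproduction and fix:

```python
import torch

x = torch.randn(4)          # requires_grad defaults to False
try:
    x.retain_grad()         # reproduces the error from issue #7
except RuntimeError as e:
    print(e)                # can't retain_grad on Tensor that has requires_grad=False

x.requires_grad_(True)      # enable gradient tracking first
x.retain_grad()             # now succeeds
(x * 2).sum().backward()
print(x.grad)               # tensor([2., 2., 2., 2.])
```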