Hi,
I obtained a pretrained BERT model by modifying script/run_FT.sh; it reaches a decent 84.3% accuracy on MNLI.
Using this model as the teacher, I ran run.sh, changing only the path to the teacher model. The resulting accuracy is above 85% at 95% sparsity, which is higher than the teacher's own accuracy. Does this result make sense, or did I make a mistake somewhere?