Closed CaffreyR closed 1 year ago
Hi,
Thanks for reaching out!
For your first question, multiplying the representations by diag(z_int) scales each output dimension of the representations by the corresponding mask entry, so a mask value of 0 zeroes out (prunes) that dimension. diag(z_int) is simply the matrix notation for this elementwise masking.
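To make the equivalence concrete, here is a minimal pure-Python sketch (not the CoFi implementation). The helper names `diag`, `matmul`, and `mask_columns` and the toy values are made up for illustration. It checks that right-multiplying by diag(z_int) gives the same result as scaling each column by its mask entry:

```python
def diag(z):
    """Build a square matrix with the entries of z on its diagonal."""
    n = len(z)
    return [[z[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mask_columns(h, z):
    """Scale column j of h by z[j] (elementwise masking)."""
    return [[h_ij * z_j for h_ij, z_j in zip(row, z)] for row in h]


# Hypothetical toy values: 2 tokens, d_f = 3 intermediate dimensions.
h = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
z_int = [1.0, 0.0, 1.0]  # a 0 entry prunes the second dimension

via_diag = matmul(h, diag(z_int))   # matrix form, diag(z_int) is 3 x 3
via_mask = mask_columns(h, z_int)   # elementwise form

print(via_diag == via_mask)
```

In practice the elementwise form is cheaper, since it never materializes the d_f × d_f matrix.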
For your second question, yes! CoFi pruning prunes a student model with a distillation objective.
Feel free to reach out again if you have more questions :)
Hi, so diag(z_int) is a diagonal matrix with z_int on its diagonal?
Yes!
Thanks! So why does it have to be a diagonal matrix? Can a non-diagonal matrix replace it, as long as it represents the corresponding mask?
Yes, it can! We use diag in our paper for mathematical correctness.
Hi, I am closing this issue now! Feel free to reopen it if you have more questions :)
Hi @xiamengzhou, many thanks for your contribution. I have a few small questions about your paper. In the paper you said that
And in your paper there is an equation, but what is diag, and why do we have to put z_int into a diagonal matrix? Is diag(z_int) of size d_f × d_f? And you also say that
So are we pruning a student model during distillation?
Many thanks!!