Congratulations on your paper!
I was wondering if this kind of approach can be applied more broadly.
Has the output model size been reduced in your implementation? If so, by how much?
Do you think this method could be used to further compress models like Phi or Meta's latest models?
Thank you!