Closed slawek-ib closed 1 year ago
Hi,
That's a good point! We fine-tune the untuned student model from scratch for 1 epoch before we start pruning it. Conceptually, this is similar to initializing the student with the teacher's weights. We explored fine-tuning for 1, 2, and 3 epochs before pruning, and the results were similar.
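To make the two strategies concrete, here is a minimal sketch (not the repo's actual code) contrasting teacher-weight initialization with a brief fine-tune of an untuned student, followed by magnitude pruning. Weights are plain Python lists standing in for real network parameters, and `grads` is a hypothetical precomputed gradient, so this only illustrates the flow of the procedure:

```python
def init_from_teacher(teacher_weights):
    # Strategy A: initialize the student by copying the teacher's weights.
    return list(teacher_weights)

def brief_finetune(weights, grads, lr=0.1, epochs=1):
    # Strategy B: start from an untuned student and fine-tune for a small
    # number of epochs (1, 2, or 3 gave similar results) before pruning.
    for _ in range(epochs):
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

def magnitude_prune(weights, keep_ratio=0.5):
    # After either initialization, zero out the smallest-magnitude weights.
    k = int(len(weights) * keep_ratio)
    keep = set(sorted(range(len(weights)),
                      key=lambda i: -abs(weights[i]))[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

student = brief_finetune([0.0, 0.0, 0.0, 0.0], [-30.0, 1.0, -20.0, -0.5])
pruned = magnitude_prune(student, keep_ratio=0.5)
```

Either way, the student ends up near a well-trained starting point before pruning begins, which is why the two initializations behave similarly in practice.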
Hi, I am closing this issue. Feel free to reopen it if you have more questions :)
Hi, thanks for your great work on this project!
I'm curious why the student model starts from an untuned model rather than from the teacher's weights. It seems that reusing them could make training faster. Is that something you've explored?