princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License
188 stars 32 forks

Pruning for Encoder-Decoder Architecture? #19

Closed Luckick closed 2 years ago

Luckick commented 2 years ago

Hi, Does CoFiPruning work on Encoder-Decoder Architectures for Seq2seq tasks such as translation? Thanks!

xiamengzhou commented 2 years ago

Hi,

Thanks for your interest in our repository. Unfortunately, the current version does not support seq2seq models yet. But technically the approach could be adapted to seq2seq models. I'd be happy to help if you have any issues implementing it!

Luckick commented 2 years ago

Does the pruning technique work only for encoder layers or for both encoder and decoder layers?

xiamengzhou commented 2 years ago

It should work for both, but it would require properly placing the masking variables on the decoder side, including the cross-attention modules.
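To make the idea concrete, here is a minimal sketch of what "placing masking variables" on a decoder layer could look like. This is not code from the repository: the names (`z_self`, `z_cross`, `apply_head_mask`, the tensor shapes) are illustrative assumptions, and the mask values are fixed constants rather than the learned hard-concrete variables CoFi actually uses. The point is only that each attention head's output is scaled by a per-head mask, so a head whose mask reaches zero contributes nothing and can be pruned, and that the decoder needs such masks for both its self-attention and its cross-attention.

```python
import numpy as np

num_heads, head_dim, seq_len = 4, 8, 5

def apply_head_mask(head_outputs, z):
    """Scale each head's output by its mask variable; z -> 0 prunes the head.

    head_outputs: (num_heads, seq_len, head_dim)
    z:            (num_heads,) per-head mask variables
    """
    return head_outputs * z[:, None, None]

rng = np.random.default_rng(0)
# Stand-ins for the per-head outputs of a decoder layer's two attention blocks.
self_attn_out = rng.standard_normal((num_heads, seq_len, head_dim))
cross_attn_out = rng.standard_normal((num_heads, seq_len, head_dim))

# In training these would be relaxed (e.g. hard-concrete) variables learned
# jointly with the model; fixed 0/1 values here are purely for illustration.
z_self = np.array([1.0, 0.0, 1.0, 1.0])   # prune head 1 of decoder self-attention
z_cross = np.array([0.0, 1.0, 1.0, 0.0])  # prune heads 0 and 3 of cross-attention

masked_self = apply_head_mask(self_attn_out, z_self)
masked_cross = apply_head_mask(cross_attn_out, z_cross)

print(np.abs(masked_self[1]).sum())   # masked head contributes nothing
print(np.abs(masked_cross[0]).sum())  # same for the pruned cross-attention head
```

In an actual seq2seq adaptation, the encoder-side masks would stay as in the released code, and the decoder would carry two extra sets of head masks (self-attention and cross-attention) plus the usual intermediate-dimension and layer masks.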

Luckick commented 2 years ago

Thank you for the clarification and help!