machelreid / subformer

The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo
https://arxiv.org/abs/2101.00234
MIT License

Core code for the sandwich weight sharing #1

Closed qianlou closed 3 years ago

qianlou commented 3 years ago

Dear Subformer authors,

Thanks for sharing the code for the interesting Subformer work! I am eager to reproduce your experiments on sandwich weight sharing, but I am a little confused about locating the core code within the fairseq template. Is it mainly in fairseq/modules/subformer_layer.py? Could you kindly point me to the core code for weight sharing? Thanks very much!

Best, Qian

machelreid commented 3 years ago

Hi Qian!

The sandwich-style parameter sharing is implemented in fairseq/models/transformer.py.
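For anyone else reading this thread: the basic idea of sandwich-style sharing is that the central layers reuse a single set of parameters while the first and last layers keep their own. Here is a minimal, hypothetical sketch of that layer-construction pattern in plain Python (the `Layer` class and `build_sandwich_layers` helper are illustrative stand-ins, not the actual fairseq/Subformer API):

```python
class Layer:
    """Stand-in for a Transformer layer (e.g. a decoder layer holding weights)."""
    def __init__(self, name):
        self.name = name

def build_sandwich_layers(num_layers):
    """Sandwich sharing: unique first and last layers, one shared middle block."""
    shared = Layer("shared-middle")           # single parameter set for the middle
    layers = [Layer("first")]                 # unique bottom layer
    layers += [shared] * (num_layers - 2)     # all middle slots point at the same module
    layers.append(Layer("last"))              # unique top layer
    return layers

layers = build_sandwich_layers(6)
# The middle entries are the same object, so their weights are shared:
assert all(layer is layers[1] for layer in layers[1:-1])
assert layers[0] is not layers[1] and layers[-1] is not layers[1]
```

In a framework like PyTorch, appending the same module instance multiple times has exactly this effect: the parameters are stored once and every forward pass through a "middle" layer reuses them.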

Let me know if that helps!

Best, Machel

qianlou commented 3 years ago

Thanks for your feedback. Very helpful!