The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo
Dear Subformer authors,
Thanks for sharing the code for your interesting Subformer work! I am eager to reproduce your experiments on sandwich weight sharing, but I am having trouble locating the core code within the fairseq template. Is it mainly in fairseq/modules/subformer_layer.py? Could you kindly point me to the core weight-sharing code? Thanks very much!
Best, Qian