Open Messiz opened 2 years ago
I just found the information from the doc of pytorch(in the attaced picture). It shows that for a fc = nn.Linear(d_model, n_trg_vocab), actually the shape of fc's weight is (n_trg_vocab, d_model)!
Thank you for your answer, but I figured it out a few days after that by myself. Thanks anyway!😂
The code above want to realize weight share, but I'm confused that the embed layer and the linear layer have different shape of weight. How can this assignment work?