@xrc10
In the paper, the src-tgt attention on sentences comes after the src-tgt attention on tokens, but in the code the order is reversed. At line 1000 in MeetingNet_Transformer.py:
```python
def forward(self, y, token_enc_key, token_enc_value, sent_enc_key, sent_enc_value):
    query, key, value = self.decoder_splitter(y)
    # batch x len x n_state

    # self-attention
    a = self.attn(query, key, value, None, one_dir_visible=True)
    # batch x len x n_state

    n = self.ln_1(y + a) # residual

    if 'NO_HIERARCHY' in self.opt:
        q = y
        r = n
    else:
        # src-tgt attention on sentences
        q = self.sent_attn(n, sent_enc_key, sent_enc_value, None)
        r = self.ln_3(n + q) # residual
        # batch x len x n_state

    # src-tgt attention on tokens
    o = self.token_attn(r, token_enc_key, token_enc_value, None)
    p = self.ln_2(r + o) # residual
    # batch x len x n_state

    m = self.mlp(p)
    h = self.ln_4(p + m)
    return h
```
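For comparison, here is a minimal sketch of what the same block would look like if it followed the order described in the paper (src-tgt attention on tokens first, then on sentences). It reuses the module names from the snippet above and is only meant to illustrate the ordering difference I am asking about, not code from the repository:

```python
# Hypothetical variant following the paper's order: tokens first, then sentences.
# Module names (self.attn, self.token_attn, self.sent_attn, self.ln_*, self.mlp,
# self.decoder_splitter) are assumed from the snippet above.
def forward(self, y, token_enc_key, token_enc_value, sent_enc_key, sent_enc_value):
    query, key, value = self.decoder_splitter(y)

    # masked self-attention over the decoded prefix
    a = self.attn(query, key, value, None, one_dir_visible=True)
    n = self.ln_1(y + a)  # residual

    # src-tgt attention on tokens (first, as described in the paper)
    o = self.token_attn(n, token_enc_key, token_enc_value, None)
    p = self.ln_2(n + o)  # residual

    if 'NO_HIERARCHY' not in self.opt:
        # src-tgt attention on sentences (second, as described in the paper)
        q = self.sent_attn(p, sent_enc_key, sent_enc_value, None)
        p = self.ln_3(p + q)  # residual

    m = self.mlp(p)
    h = self.ln_4(p + m)
    return h
```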
I would like to confirm whether this is the intended behavior or not.