FreyWang opened 5 years ago
It seems that the way the attention weights are computed here differs from the original paper, which uses `softmax(v tanh(W[s, h]))`. Here, softmax is used in place of tanh, and ReLU is applied to the resulting scores afterwards. Can you give some reasons or a reference?
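Written out per encoder step $i = 1, \dots, T$, with $s$ the decoder hidden state, $h_i$ the $i$-th encoder output, and `self.attn` assumed to be a linear layer playing the role of $W$ (its bias omitted), the paper's score and what the code computes differ in two places. Tanh is replaced by a softmax over the $H$ hidden units, and the softmax over the $T$ time steps is replaced by a ReLU, so the resulting weights are non-negative but not guaranteed to sum to 1:

```math
\begin{aligned}
\text{paper:}\quad & e_i = v^\top \tanh\left(W[s; h_i]\right), & \alpha_i &= \frac{\exp(e_i)}{\sum_{j=1}^{T} \exp(e_j)} \\
\text{code:}\quad & e_i = v^\top \operatorname{softmax}_H\left(W[s; h_i]\right), & \alpha_i &= \max(0, e_i)
\end{aligned}
```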
```python
import torch
import torch.nn.functional as F

def forward(self, hidden, encoder_outputs):
    timestep = encoder_outputs.size(0)
    h = hidden.repeat(timestep, 1, 1).transpose(0, 1)
    encoder_outputs = encoder_outputs.transpose(0, 1)  # [B*T*H]
    attn_energies = self.score(h, encoder_outputs)
    return F.relu(attn_energies).unsqueeze(1)

def score(self, hidden, encoder_outputs):
    # [B*T*2H]->[B*T*H]
    energy = F.softmax(self.attn(torch.cat([hidden, encoder_outputs], 2)), dim=2)
    energy = energy.transpose(1, 2)  # [B*H*T]
    v = self.v.repeat(encoder_outputs.size(0), 1).unsqueeze(1)  # [B*1*H]
    energy = torch.bmm(v, energy)  # [B*1*T]
    return energy.squeeze(1)  # [B*T]
```
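For comparison, here is a minimal sketch of the scoring as the paper describes it, reusing the tensor layout of the quoted code. The class name `AdditiveAttention`, the `nn.Linear`/`nn.Parameter` definitions of `self.attn` and `self.v`, and the `[B, H]` shape of `hidden` are assumptions for illustration, not this repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):
    """Hypothetical paper-style scorer: softmax over T of v^T tanh(W[s; h_i]).

    Assumed shapes, mirroring the quoted code:
      hidden:          [B, H]     (decoder state s)
      encoder_outputs: [T, B, H]  (h_1 .. h_T)
    """

    def __init__(self, hidden_size):
        super().__init__()
        self.attn = nn.Linear(hidden_size * 2, hidden_size)  # assumed: plays the role of W
        self.v = nn.Parameter(torch.rand(hidden_size))        # assumed: the vector v

    def forward(self, hidden, encoder_outputs):
        timestep = encoder_outputs.size(0)
        h = hidden.repeat(timestep, 1, 1).transpose(0, 1)   # [B, T, H]
        encoder_outputs = encoder_outputs.transpose(0, 1)   # [B, T, H]
        # tanh (not softmax) over the projected [s; h_i], as in the paper
        energy = torch.tanh(self.attn(torch.cat([h, encoder_outputs], 2)))  # [B, T, H]
        scores = energy @ self.v                             # [B, T]
        # normalize over the T time steps, so each row is a distribution
        return F.softmax(scores, dim=1).unsqueeze(1)         # [B, 1, T]


torch.manual_seed(0)
B, T, H = 2, 5, 8
attention = AdditiveAttention(H)
weights = attention(torch.randn(B, H), torch.randn(T, B, H))
print(weights.shape)       # torch.Size([2, 1, 5])
print(weights.sum(dim=2))  # ones, up to floating-point rounding
```

Replacing `torch.tanh` with `F.softmax(..., dim=2)` and the final `F.softmax(scores, dim=1)` with `F.relu(scores)` gives the same computation as the quoted `forward`/`score` pair (the `bmm` with the repeated `v` is the same product as `energy @ self.v`). With the ReLU, the weights are still non-negative but no longer sum to 1 over the T encoder steps.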
I am also confused about this. If the author comes back, please notify me. Thank you.
I am also confused about this