s_t
Open · RitaRamo opened this issue 4 years ago
Hi,
Thank you for your code!
I have noticed that you calculate s_t with the previous memory cell state (c_{t-1}), whereas the paper uses the current one (the memory cell produced by the current LSTM step). It seems to me that g_t should indeed use the previous hidden state (h_{t-1}), as you did, but s_t needs the current memory cell state (c_t). Hence, I suggest the following modification:
```python
g_t = self.sigmoid(self.affine_embed(self.dropout(embeddings[:batch_size_t, t, :]))
                   + self.affine_decoder(self.dropout(h[:batch_size_t])))  # (batch_size_t, decoder_dim)

# remove s_t from here

h, c = self.decode_step_adaptive(
    torch.cat([embeddings[:batch_size_t, t, :], v_g[:batch_size_t, :]], dim=1),
    (h[:batch_size_t], c[:batch_size_t]))  # (batch_size_t, decoder_dim)

# add it here
s_t = g_t * torch.tanh(c)  # (batch_size_t, decoder_dim)
```
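For reference, this ordering matches the sentinel equations in the adaptive attention paper (Lu et al., 2017, "Knowing When to Look", which writes the memory cell as m_t rather than c_t):

```latex
g_t = \sigma\left(W_x x_t + W_h h_{t-1}\right), \qquad s_t = g_t \odot \tanh(c_t)
```

where x_t is the LSTM input at step t and \odot denotes elementwise multiplication: the gate depends on the previous hidden state, while the sentinel uses the freshly updated cell state.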
Yeah, I think you are right. Did you get a better result?

I changed the code but got a worse result, so I do not know if we were right.
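For anyone who wants to test the proposed ordering in isolation, here is a minimal, runnable sketch. The module `SentinelLSTMStep` and its dimensions are hypothetical and made up for illustration; only the `affine_embed`/`affine_decoder` naming mirrors the snippet above, and this is not the repo's actual decoder.

```python
import torch
import torch.nn as nn

class SentinelLSTMStep(nn.Module):
    """One decoding step with a visual sentinel, following the proposed fix."""

    def __init__(self, input_dim, decoder_dim):
        super().__init__()
        self.lstm_cell = nn.LSTMCell(input_dim, decoder_dim)
        self.affine_embed = nn.Linear(input_dim, decoder_dim)
        self.affine_decoder = nn.Linear(decoder_dim, decoder_dim)

    def forward(self, x_t, h_prev, c_prev):
        # Gate g_t uses the input and the *previous* hidden state h_{t-1}.
        g_t = torch.sigmoid(self.affine_embed(x_t) + self.affine_decoder(h_prev))
        # One LSTM step produces the *current* hidden and cell states.
        h_t, c_t = self.lstm_cell(x_t, (h_prev, c_prev))
        # Sentinel s_t uses the current cell state c_t, not c_{t-1}.
        s_t = g_t * torch.tanh(c_t)
        return h_t, c_t, s_t

# Tiny usage example with random tensors.
batch, input_dim, decoder_dim = 4, 32, 64
step = SentinelLSTMStep(input_dim, decoder_dim)
x = torch.randn(batch, input_dim)
h0 = torch.zeros(batch, decoder_dim)
c0 = torch.zeros(batch, decoder_dim)
h1, c1, s1 = step(x, h0, c0)
print(s1.shape)  # torch.Size([4, 64])
```

The key point is that g_t is computed from h_prev before the LSTM step, while s_t is computed from c_t after it.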