s1879281 / Image-Captioning-with-Adaptive-Attention

PyTorch implementation of image captioning with adaptive attention mechanism.

s_t #1

Open RitaRamo opened 4 years ago

RitaRamo commented 4 years ago

Hi,

Thank you for your code!

I have noticed that you compute s_t with the previous memory cell state (c_{t-1}), whereas the paper uses the current memory cell (the cell state of the current LSTM step). It seems to me that g_t should use the previous hidden state h_{t-1}, as you did, but s_t needs the current memory cell state c_t.
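For reference, the paper ("Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning", Lu et al., 2017) defines the sentinel gate and the visual sentinel as

g_t = sigmoid(W_x x_t + W_h h_{t-1})
s_t = g_t ⊙ tanh(m_t)    (⊙ = elementwise product)

where m_t is the memory cell at the current step t. Hence, I suggest the following modification: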

g_t = self.sigmoid(self.affine_embed(self.dropout(embeddings[:batch_size_t, t, :]))
                   + self.affine_decoder(self.dropout(h[:batch_size_t])))  # (batch_size_t, decoder_dim)

# remove the s_t computation from here

h, c = self.decode_step_adaptive(
    torch.cat([embeddings[:batch_size_t, t, :], v_g[:batch_size_t, :]], dim=1),
    (h[:batch_size_t], c[:batch_size_t]))  # (batch_size_t, decoder_dim)

# add it here, so that s_t is built from the current cell state c_t
s_t = g_t * torch.tanh(c)  # (batch_size_t, decoder_dim)
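If it helps, here is a minimal, self-contained sketch of the corrected ordering. It is only an illustration: the layer names (decode_step_adaptive, affine_embed, affine_decoder) mirror this repo's decoder, the dimensions are made up, and I fold the word embedding and v_g into a single input x_t, as the paper does.

import torch
import torch.nn as nn

class SentinelLSTMStep(nn.Module):
    """One decoding step with a visual sentinel; s_t is computed after the LSTM step."""

    def __init__(self, input_dim, decoder_dim):
        super().__init__()
        self.decode_step_adaptive = nn.LSTMCell(input_dim, decoder_dim)
        self.affine_embed = nn.Linear(input_dim, decoder_dim)
        self.affine_decoder = nn.Linear(decoder_dim, decoder_dim)

    def forward(self, x_t, h, c):
        # Gate from the current input and the *previous* hidden state h_{t-1}.
        g_t = torch.sigmoid(self.affine_embed(x_t) + self.affine_decoder(h))
        # One LSTM step yields the *current* cell state c_t.
        h, c = self.decode_step_adaptive(x_t, (h, c))
        # The sentinel uses c_t, so it must be computed after the LSTM step.
        s_t = g_t * torch.tanh(c)
        return h, c, s_t

# Quick shape check:
step = SentinelLSTMStep(input_dim=512, decoder_dim=512)
x_t = torch.randn(4, 512)
h = torch.zeros(4, 512)
c = torch.zeros(4, 512)
h, c, s_t = step(x_t, h, c)
print(s_t.shape)  # torch.Size([4, 512])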
PanFei748 commented 3 years ago

Yeah, I think you are right. Did you get a better result?

PanFei748 commented 3 years ago

I changed the code as suggested but got a worse result, so I do not know if we were right.