ryankiros / skip-thoughts

Sent2Vec encoder and training code from the paper "Skip-Thought Vectors"

Github code (decoder GRU layer) different from paper #8

Open FredericGodin opened 8 years ago

FredericGodin commented 8 years ago

Hi,

I'm stepping through the code and noticed that the GRU layer of the decoder doesn't take the context provided by the encoder into account. As stated in the paper (eqs. 5, 6 and 7), an extra parameter is added to incorporate the context at every time step during decoding.
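For reference, this is how I read the conditioned decoder GRU (notation reconstructed, so please double-check against the paper); `h_i` is the encoder output and the `C` matrices are the extra parameters I mean:

```latex
\begin{aligned}
r^t &= \sigma(W_r^d\, x^{t-1} + U_r^d\, h^{t-1} + C_r\, h_i) \\
z^t &= \sigma(W_z^d\, x^{t-1} + U_z^d\, h^{t-1} + C_z\, h_i) \\
\bar{h}^t &= \tanh\!\left(W^d\, x^{t-1} + U^d\,(r^t \odot h^{t-1}) + C\, h_i\right) \\
h^t &= (1 - z^t) \odot h^{t-1} + z^t \odot \bar{h}^t
\end{aligned}
```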

I think the code is missing the following function (taken from the neural machine translation example): https://github.com/kyunghyuncho/dl4mt-material/blob/master/session1/nmt.py#L352
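In plain NumPy (not the repo's Theano code), the kind of step I mean would look roughly like this; the parameter names (`Wr`, `Ur`, `Cr`, ...) are just placeholders for the decoder weights and the extra context matrices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cond_step(x_prev, h_prev, ctx, p):
    """One decoder GRU step that also conditions on the encoder output `ctx`
    through the extra C-matrices (the terms a plain GRU layer doesn't have)."""
    r = sigmoid(x_prev @ p['Wr'] + h_prev @ p['Ur'] + ctx @ p['Cr'] + p['br'])
    z = sigmoid(x_prev @ p['Wz'] + h_prev @ p['Uz'] + ctx @ p['Cz'] + p['bz'])
    h_bar = np.tanh(x_prev @ p['W'] + (r * h_prev) @ p['U'] + ctx @ p['C'] + p['b'])
    return (1.0 - z) * h_prev + z * h_bar
```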

I can add it myself, but I'm afraid I would miss something and therefore wouldn't be able to reproduce the same results, especially given that training takes about two weeks...

Best,

Fréderic

ryankiros commented 8 years ago

Hi Fréderic,

The pre-trained model on the main page was trained by conditioning on the encoder output at each time step (namely, it used the "gru_cond_simple_layer"). In the current training code, the encoder output directly initializes the decoder's initial state (as in the "Sequence to Sequence Learning..." paper). I didn't notice any performance difference between the two, and it saves almost 200 lines of code.
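Roughly, the difference is the following (sketched in NumPy rather than the actual Theano code; the `Wi`/`bi` init projection here is only illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_decoder_state(ctx, Wi, bi):
    """Current training code: the encoder output only sets the decoder's
    initial hidden state (illustrative projection), after which a plain GRU
    runs with no per-step context term."""
    return np.tanh(ctx @ Wi + bi)

def gru_step(x_prev, h_prev, p):
    """Plain (unconditioned) GRU step used for all subsequent time steps."""
    r = sigmoid(x_prev @ p['Wr'] + h_prev @ p['Ur'] + p['br'])
    z = sigmoid(x_prev @ p['Wz'] + h_prev @ p['Uz'] + p['bz'])
    h_bar = np.tanh(x_prev @ p['W'] + (r * h_prev) @ p['U'] + p['b'])
    return (1.0 - z) * h_prev + z * h_bar
```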

See https://github.com/ryankiros/skip-thoughts/blob/master/training/model.py#L86, where dec_ctx (the encoder output) is used to initialize the state. Sorry, perhaps I should make this clear somewhere in the readme.

Ryan