Asichurter opened this issue 3 years ago (status: Open)
Hi! Thank you for your issue! I didn't find this in the original code or in the paper when I worked on this repo, but the authors do mention it: "We divide each program into segments consisting of 50 consecutive AST nodes, with the last segment being padded with EOF if it is not full. The LSTM hidden state and memory state are initialized with h0, c0, which are two trainable vectors. The last hidden and memory states from the previous LSTM segment are fed into the next one as initial states if both segments belong to the same program. Otherwise, the hidden and memory states are reset to h0, c0."
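To make the quoted segmentation step concrete, here is a minimal sketch in plain Python. It is not the code from this repo or from the paper's authors; the function name `split_into_segments` and the padding token string `"<EOF>"` are made up for illustration, and only the segment length of 50 comes from the quote.

```python
from typing import List, Sequence

SEGMENT_LEN = 50  # segment length quoted from the paper
EOF = "<EOF>"     # hypothetical name for the padding token


def split_into_segments(
    nodes: Sequence[str],
    segment_len: int = SEGMENT_LEN,
    pad: str = EOF,
) -> List[List[str]]:
    """Split one program's AST nodes into fixed-length segments.

    Only the last segment can be shorter than segment_len; it is
    padded with `pad` until it is full.
    """
    segments = []
    for start in range(0, len(nodes), segment_len):
        chunk = list(nodes[start:start + segment_len])
        # No-op for full segments, pads the final short one.
        chunk += [pad] * (segment_len - len(chunk))
        segments.append(chunk)
    return segments
```

A program with 120 nodes would give three segments this way, the last one holding 20 real nodes followed by 30 padding tokens.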
In my implementation, h0 and c0 are always just set to default values (ones, as far as I remember).
I'm not sure whether this would improve performance, but you can try to fix this issue. You would need to pay attention to both the data preparation and the training code.
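As a rough idea of what the training side would have to do, here is a framework-agnostic sketch of the carry-over rule from the paper's description. Everything in it is hypothetical: `run_segments`, `step_fn`, and the `(program_id, segment)` pairs are names invented for this example, not part of this repo. In a PyTorch model, `step_fn` would presumably wrap the LSTM call and `init_state` would be the trainable `(h0, c0)` pair.

```python
from typing import Any, Callable, Hashable, Iterable, List, Tuple


def run_segments(
    segments: Iterable[Tuple[Hashable, Any]],
    step_fn: Callable[[Any, Any], Tuple[Any, Any]],
    init_state: Any,
) -> List[Any]:
    """Process segments in order, carrying state within one program.

    segments:   (program_id, segment) pairs, with the segments of each
                program kept adjacent and in their original order.
    step_fn:    maps (segment, state) to (output, new_state).
    init_state: the state used at the start of every program.
    """
    outputs = []
    state = init_state
    prev_program = None
    for program_id, segment in segments:
        if program_id != prev_program:
            # A different program starts here: reset to the initial state.
            state = init_state
        output, state = step_fn(segment, state)
        outputs.append(output)
        prev_program = program_id
    return outputs
```

This only works if the data preparation keeps the segments of one program together and in order, which is why both parts of the code would need to change.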
It would be great if you could make a pull request with the fix.
In the original paper, each program is said to be divided into many short segments (len=50) that are fed into the model one at a time, and the segments interact by reusing the hidden and memory states of the previous segment as the initial states of the next. However, two points in this implementation do not fit this description:
This is only my own question about this implementation. Thanks for any explanations or replies.