openai / generating-reviews-discovering-sentiment

Code for "Learning to Generate Reviews and Discovering Sentiment"
https://arxiv.org/abs/1704.01444
MIT License

hyperparameters #28

Closed yairf11 closed 7 years ago

yairf11 commented 7 years ago

Hi, I'm a little confused with the hyper-parameters of the model. In the paper, it is stated that:

The model was trained for a single epoch on mini-batches of 128 subsequences of length 256 for a total of 1 million weight updates.

But if I understand correctly, the hyperparameter 'nsteps' represents the length of each subsequence, and it is set to 64, not 256. Why is that? Am I understanding the meaning of 'nsteps' correctly? Also, I couldn't figure out what the 'nembd' and 'nstates' hyperparameters stand for. If someone could clear this up for me, that would be great.

Thanks!

Newmu commented 7 years ago

Hi yair,

The nsteps setting is only used here for test-time feature extraction: it's how many steps are processed on the GPU for each sess.run call. It can be set to any length and has no impact on the computed result. Since this implementation pads to that length, smaller is more efficient because it reduces overall padding, but if you go too small you lose efficiency due to too many sess.run calls and the overhead of using feed dicts. I just picked 64 as a good middle ground.
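
To illustrate why the chunk length does not change the result, here is a minimal NumPy sketch. It is not the repository's code: the tanh recurrence is a toy stand-in for the real cell, and `step`, `extract`, `W`, and `U` are hypothetical names. It assumes the state is carried from one chunk to the next and that padded steps are masked out so they leave the state untouched.

```python
import numpy as np

def step(h, x, W, U):
    # Toy tanh recurrence standing in for the real recurrent cell (assumption).
    return np.tanh(h @ W + x @ U)

def extract(seq, nsteps, W, U):
    """Run seq of shape (T, nin) through the recurrence in padded chunks of nsteps.

    Returns the final state, the number of chunk calls, and the padded step count.
    """
    h = np.zeros(W.shape[0])
    ncalls = 0
    npad = 0
    for start in range(0, len(seq), nsteps):
        chunk = seq[start:start + nsteps]
        nreal = len(chunk)
        # Pad the chunk up to nsteps, as a fixed-shape graph would require.
        padded = np.zeros((nsteps, seq.shape[1]))
        padded[:nreal] = chunk
        mask = np.arange(nsteps) < nreal
        for x, m in zip(padded, mask):
            h_new = step(h, x, W, U)
            # Padded steps keep the previous state, so padding cannot alter the result.
            h = np.where(m, h_new, h)
        ncalls += 1  # one call per chunk, analogous to one sess.run
        npad += nsteps - nreal
    return h, ncalls, npad
```

Under these assumptions, running a 150-step sequence with nsteps=8 and with nsteps=64 yields the same final state. The smaller setting needs 19 calls with 2 padded steps, while the larger one needs 3 calls with 42 padded steps, which is the trade-off described above.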

nstates lets it know that the LSTM has two separate state vectors (cell and hidden). nembd is the dimensionality of the embedding layer (64) used to represent the input bytes.
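
As a rough picture of what these two settings control, the sketch below builds arrays with the corresponding shapes in NumPy. It is illustrative only: `nhidden` and `nbatch` are small hypothetical sizes, the embedding values are random, and the ordering of cell and hidden within the state array is an assumption rather than something confirmed in this thread.

```python
import numpy as np

nembd, nstates = 64, 2       # values mentioned in this thread
nhidden, nbatch = 32, 4      # hypothetical small sizes, for illustration only

# One embedding row per possible byte value (0..255), each of width nembd.
embedding = np.random.default_rng(0).normal(size=(256, nembd))

# Text is fed to the model as raw bytes; each byte indexes one embedding row.
text = "great movie".encode("utf-8")
byte_ids = np.frombuffer(text, dtype=np.uint8)
embedded = embedding[byte_ids]            # shape: (number of bytes, nembd)

# The recurrent state holds nstates vectors per batch element.
# Which index is cell and which is hidden is an assumption here.
state = np.zeros((nstates, nbatch, nhidden))
cell, hidden = state
```

Here `embedded` has one 64-dimensional row per input byte, and `state` unpacks into two arrays of shape (nbatch, nhidden).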