openai / generating-reviews-discovering-sentiment

Code for "Learning to Generate Reviews and Discovering Sentiment"
https://arxiv.org/abs/1704.01444
MIT License

Initial weights and learning rate decay #31

Closed yairf11 closed 7 years ago

yairf11 commented 7 years ago

Hi,

I am looking for some help with understanding the model details.

First of all, I couldn't find any mention of the model's initial weights (including the initial embedding). Also, it is stated that:

> an initial 5e-4 learning rate that was decayed linearly to zero over the course of training.

But I couldn't find exactly what the decay function was.

Any help would be highly appreciated.

Thanks!

Newmu commented 7 years ago

Recurrent weights are initialized as random orthogonal matrices, and weight normalization is used. Embedding and output weights were initialized from a normal distribution with a standard deviation of 0.02, with no weight normalization. The model isn't particularly sensitive to these parameters, compared to the recurrent ones. For instance, we trained a second model using orthogonal initialization plus weight normalization on the input embedding and noticed no significant changes.
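In case a concrete sketch helps, here is one way to implement those two initializers in NumPy (the function names and signatures are my own for illustration, not from the repo's code):

```python
import numpy as np

def orthogonal_init(shape, rng=None):
    """Random orthogonal matrix, e.g. for recurrent weights.

    Draws a Gaussian matrix and keeps the Q factor of its QR
    decomposition, so columns are orthonormal.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = rng.standard_normal(shape)
    q, r = np.linalg.qr(a)
    # Sign correction so the result is uniformly distributed
    # over orthogonal matrices.
    q *= np.sign(np.diag(r))
    return q

def normal_init(shape, stddev=0.02, rng=None):
    """Normal init with stddev 0.02, e.g. for embedding/output weights."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal(shape) * stddev
```

Weight normalization on top of this would reparameterize each weight matrix as `g * v / ||v||`, which frameworks typically provide built in.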

A linear decay to zero can be implemented as `learning_rate * (1 - cur_update / total_updates)`.
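For completeness, that formula as a small Python helper (the name `linear_decay_lr` is mine, not from the repo):

```python
def linear_decay_lr(base_lr, cur_update, total_updates):
    """Learning rate decayed linearly from base_lr to zero.

    At cur_update == 0 this returns base_lr; at
    cur_update == total_updates it returns 0.0.
    """
    return base_lr * (1.0 - cur_update / total_updates)
```

With the paper's initial rate of 5e-4, you would call this once per update, e.g. `linear_decay_lr(5e-4, step, total_steps)`, and feed the result to the optimizer.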