openai / generating-reviews-discovering-sentiment

Code for "Learning to Generate Reviews and Discovering Sentiment"
https://arxiv.org/abs/1704.01444
MIT License

Tips for training on GPU? #45

Open jonny-d opened 6 years ago

jonny-d commented 6 years ago

Hello,

I am trying to train this model in TensorFlow with the batch size and sequence length given in the paper (batches of 128 and a sequence length of 256), but I am struggling to get those hyper-parameters to fit in memory. I can train the model with the hidden size reported in the paper (4096), but only with smaller batch and sequence-length settings. As I increase these hyper-parameters I run into OOM errors, which are tricky to debug; I am currently looking into tfdbg and tfprof. The model crashes during the session.run() call on my optimizer op.
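For context, the rough shape of what I have in mind to reduce memory is truncated BPTT: unroll each 256-step sequence in shorter chunks and carry the hidden state between session.run() calls, so activations only need to be kept for one chunk at a time. A minimal sketch of that idea (with a plain LSTMCell standing in for the paper's multiplicative LSTM, and dummy data in place of my real input):

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 128
SEQ_LEN = 256
CHUNK_LEN = 64        # unroll length per session.run; 256 = 4 chunks
HIDDEN_SIZE = 4096    # as in the paper; reduce for a quick smoke test
VOCAB_SIZE = 256      # byte-level model

# Placeholders for one chunk of the full 256-step sequence.
inputs = tf.placeholder(tf.int32, [BATCH_SIZE, CHUNK_LEN])
targets = tf.placeholder(tf.int32, [BATCH_SIZE, CHUNK_LEN])

# Hidden state carried across chunks: gradients are truncated at chunk
# boundaries, but the forward state is preserved across the sequence.
init_c = tf.placeholder(tf.float32, [BATCH_SIZE, HIDDEN_SIZE])
init_h = tf.placeholder(tf.float32, [BATCH_SIZE, HIDDEN_SIZE])

embedding = tf.get_variable("embedding", [VOCAB_SIZE, 64])
emb_inputs = tf.nn.embedding_lookup(embedding, inputs)

# Plain LSTMCell standing in for the mLSTM used in the paper.
cell = tf.nn.rnn_cell.LSTMCell(HIDDEN_SIZE)
state = tf.nn.rnn_cell.LSTMStateTuple(init_c, init_h)
outputs, final_state = tf.nn.dynamic_rnn(cell, emb_inputs, initial_state=state)

logits = tf.layers.dense(outputs, VOCAB_SIZE)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=targets, logits=logits))
train_op = tf.train.AdamOptimizer(5e-4).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Zero state at the start of each 256-step sequence.
    c = np.zeros([BATCH_SIZE, HIDDEN_SIZE], np.float32)
    h = np.zeros([BATCH_SIZE, HIDDEN_SIZE], np.float32)
    full_x = np.random.randint(0, VOCAB_SIZE, [BATCH_SIZE, SEQ_LEN])
    full_y = np.random.randint(0, VOCAB_SIZE, [BATCH_SIZE, SEQ_LEN])
    for start in range(0, SEQ_LEN, CHUNK_LEN):
        x = full_x[:, start:start + CHUNK_LEN]
        y = full_y[:, start:start + CHUNK_LEN]
        _, (c, h), l = sess.run(
            [train_op, final_state, loss],
            feed_dict={inputs: x, targets: y, init_c: c, init_h: h})
        print("chunk %d loss %.3f" % (start // CHUNK_LEN, l))
```

I realise this changes the gradient compared with backpropagating through the full 256 steps, so I would be interested to hear whether you did anything along these lines or trained on the full unrolled sequence.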

Could you share any details of how you implemented this model, or give any recommendations for creating an efficient implementation (e.g. efficient input pipelines, device placement in the TF graph, common pitfalls, debugging tips)?
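On the input-pipeline side, this is roughly what I am experimenting with using tf.data: stream the corpus as bytes, cut fixed-length windows for next-byte prediction, and prefetch so input preparation overlaps with the GPU step. The file name and windowing choices are placeholders of mine, not from the paper:

```python
import tensorflow as tf

BATCH_SIZE = 128
SEQ_LEN = 256

def make_dataset(path):
    """Stream a text file as byte-level (inputs, targets) batches."""
    ds = tf.data.TextLineDataset(path)
    # Decode each line into raw bytes (the model is byte-level).
    ds = ds.map(lambda line: tf.decode_raw(line, tf.uint8),
                num_parallel_calls=tf.data.experimental.AUTOTUNE)
    # Flatten lines into one long byte stream, then cut fixed-length windows.
    ds = ds.flat_map(tf.data.Dataset.from_tensor_slices)
    ds = ds.batch(SEQ_LEN + 1, drop_remainder=True)      # +1 for next-byte targets
    ds = ds.map(lambda w: (tf.cast(w[:-1], tf.int32),     # inputs
                           tf.cast(w[1:], tf.int32)))     # targets
    ds = ds.batch(BATCH_SIZE, drop_remainder=True)
    # Overlap host-side input preparation with the previous GPU step.
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
    return ds

iterator = make_dataset("reviews.txt").make_one_shot_iterator()
inputs, targets = iterator.get_next()
```

If you used something different (e.g. pre-sharded binary files or feed_dict), I would be glad to know.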

I am using a Google Cloud Platform Compute Instance for my implementation.

Any tips or tricks to help with implementing this would be greatly appreciated!

Thanks, Jonny

eggie5 commented 6 years ago

Are you aware that it took them 1 month to train the model?

jonny-d commented 6 years ago

Yes