ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License
659 stars 195 forks

Training using GPUs #8

Closed migueljette closed 7 years ago

migueljette commented 7 years ago

Hi there,

Is there a way to train using GPUs? And what is "normal" speed when training the example "ep" model?

On my macbook pro:

python main.py ep 256 0.02
256 0.02 Model_ep_h256_lr0.02.pcl
Building model...
Number of parameters is 17912840
Training...
PPL: 1.7275; Speed: 281.17 sps
PPL: 1.5956; Speed: 301.83 sps
PPL: 1.4879; Speed: 304.04 sps
PPL: 1.4188; Speed: 306.10 sps
PPL: 1.3774; Speed: 306.06 sps
PPL: 1.3452; Speed: 307.42 sps
PPL: 1.3209; Speed: 309.06 sps
PPL: 1.3014; Speed: 310.14 sps
PPL: 1.2860; Speed: 310.48 sps
PPL: 1.2729; Speed: 307.93 sps
PPL: 1.2619; Speed: 306.15 sps
PPL: 1.2526; Speed: 307.00 sps

On an AWS instance with GPU:

THEANO_FLAGS=device=gpu python main.py ep 256 0.02
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py:556: UserWarning: Theano flag device=gpu* (old gpu back-end) only support floatX=float32. You have floatX=float64. Use the new gpu back-end with device=cuda* for that value of floatX.
  warnings.warn(msg)
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN not available)
256 0.02 Model_ep_h256_lr0.02.pcl
Building model...
Number of parameters is 17912840
Training...
PPL: 1.7275; Speed: 250.31 sps

Thanks for your help! Looking forward to training with my own data, but I want to make sure everything is working as expected.

Cheers,
Miguel

ottokart commented 7 years ago

Hi!

Yes, training on GPUs is not only possible but strongly recommended. Training speed on an AWS Tesla K80 GPU should be around 10000 sps.

You will probably see a speedup if you set floatX to float32 in ~/.theanorc. Example ~/.theanorc:

[global]
floatX = float32

Another speedup will probably come from switching to the libgpuarray backend, as hinted in the warning: https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
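Putting the two suggestions together, a combined ~/.theanorc might look like the sketch below (the device name cuda0 assumes the new gpuarray backend is installed; with the old backend the device would be gpu0 instead):

[global]
device = cuda0
floatX = float32

Equivalently, both settings can be passed per run without editing ~/.theanorc:

THEANO_FLAGS=device=cuda0,floatX=float32 python main.py ep 256 0.02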

Best, Ottokar

migueljette commented 7 years ago

Hi!

It worked! I had to fix a few things, since my installation was not perfect, but now it works:

THEANO_FLAGS=device=cuda0 python main.py ep 256 0.02
Using cuDNN version 5110 on context None
Mapped name None to device cuda0: Tesla K80 (0000:00:1E.0)
256 0.02 Model_ep_h256_lr0.02.pcl
Building model...
Number of parameters is 17912840
Training...
PPL: 1.7279; Speed: 2615.79 sps
PPL: 1.5797; Speed: 4379.89 sps
PPL: 1.4646; Speed: 5649.73 sps
PPL: 1.3994; Speed: 6607.25 sps
PPL: 1.3577; Speed: 7354.74 sps
PPL: 1.3279; Speed: 7954.68 sps
PPL: 1.3055; Speed: 8446.64 sps
PPL: 1.2877; Speed: 8857.51 sps
PPL: 1.2734; Speed: 9200.41 sps
PPL: 1.2614; Speed: 9499.25 sps
PPL: 1.2513; Speed: 9758.59 sps
PPL: 1.2426; Speed: 9985.76 sps
PPL: 1.2350; Speed: 10186.37 sps

Thanks for your help! I look forward to testing this on my data!

Cheers,
Miguel