mhjabreel / CharCNN

MIT License

Performance on GPUs #3

Closed yihong-chen closed 8 years ago

yihong-chen commented 8 years ago

Hi, I am trying to run CharCNN on 4 GeForce GTX 1080 GPUs. I am struggling with two problems.

  1. The 4 GPUs seem to have an unbalanced load. When I ran nvidia-smi, the result is shown in the attached screenshot. What can I do to make full use of all GPUs?
  2. I have been running CharCNN for several days, but the accuracy stays around 0.3 without any improvement. I just ran python training.py without any changes to your code; the current status is shown in the attached screenshot. Do I have to change some parameters or the type of optimizer? Have you ever gotten better performance from CharCNN?

Thank you very much!

yihong-chen commented 8 years ago

I found that the problem can be solved by changing the optimizer from AdamOptimizer to MomentumOptimizer.
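For anyone else stuck on this plateau: momentum keeps an exponentially decaying velocity of past gradients, so consistent gradient directions accumulate and the optimizer can push through flat regions where plain gradient steps stall. A minimal plain-Python sketch of the heavy-ball update rule (illustrative only; it is not the repo's TensorFlow code, and the toy loss and hyperparameters are my own choices):

```python
def momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    # Heavy-ball update: v <- momentum * v - lr * grad; w <- w + v
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimize the toy loss f(w) = w^2 (gradient 2w), starting far from the optimum.
w, v = 5.0, 0.0
for _ in range(300):
    w, v = momentum_step(w, 2.0 * w, v)
# w has converged close to the minimum at 0
```

In the repo this corresponds to swapping the optimizer construction in training.py, as discussed below.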

mhjabreel commented 8 years ago

Hi @LaceyChen17 ,

Thank you for your comments. Actually, I have a PC with only one GPU (an NVIDIA GeForce 720M), so I was not able to try the code with multiple GPUs. As for convergence, I only tried running the code for a few epochs, since my GPU is old and its capabilities are limited. But you are right; I also found that using the Momentum optimizer solves that issue.

Thank you very much

zhang-jinyi commented 7 years ago

Hi @LaceyChen17 @mhjabreel

I changed the optimizer from AdamOptimizer to MomentumOptimizer, with just the code below:

```python
optimizer = tf.train.MomentumOptimizer(learning_rate, config.training.momentum)
```

and, of course, momentum = 0.9 in config.py.

Finally, I ran python training.py for about 3000 steps, but the accuracy remains under 0.30 without any improvement.

Could you tell me more details of the way to deal with that?

yihong-chen commented 7 years ago

Maybe you just need to run more steps. I ran it on 2 NVIDIA Tesla M40s and the accuracy remained under 0.4 for almost 4 hours. The training accuracy vs. time curve is shown in the attached image.

zhang-jinyi commented 7 years ago

@LaceyChen17 I'll try it. Thank you for your patience.

yihong-chen commented 7 years ago

@renzhe0009 You are welcome ^_^

jeremied3 commented 7 years ago

Only after 33000 steps (~2 days) did I see the validation accuracy climb from ~30% to ~70%, and eventually to 88%.

ydzhang12345 commented 7 years ago

Simply change the base learning rate to 1e-3 and use Adam; you will see the accuracy climb to 80% in less than 1000 steps.
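The intuition behind this suggestion: Adam rescales each step by a running estimate of the gradient magnitude, so with a sane base rate like 1e-3 it takes roughly constant-sized steps regardless of how small the raw gradients are. A minimal plain-Python sketch of the Adam update (again illustrative only, on a toy quadratic loss of my own choosing, not the repo's TensorFlow code):

```python
def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: bias-corrected running averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)       # bias correction for the mean
    v_hat = v / (1 - beta2 ** t)       # bias correction for the variance
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

# Minimize the toy loss f(w) = w^2 from w = 5.0 with base rate 1e-3.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 10001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
# w has moved close to the minimum at 0 in ~|w0|/lr steps
```

In the repo this would presumably mean setting the base learning rate in config.py to 1e-3 while keeping tf.train.AdamOptimizer.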

ayrtondenner commented 6 years ago

I just tried changing the base rate while keeping the Adam optimizer, but I didn't get better results; my network couldn't pass 30% accuracy on the test data.

2018-06-25T17:01:41.139571: step 56010, loss 1.38633, acc 0.226562
2018-06-25T17:01:43.331433: step 56020, loss 1.38645, acc 0.179688
2018-06-25T17:01:45.562337: step 56030, loss 1.38628, acc 0.28125
2018-06-25T17:01:47.747149: step 56040, loss 1.38629, acc 0.265625
2018-06-25T17:01:49.989139: step 56050, loss 1.3863, acc 0.226562
2018-06-25T17:01:52.222080: step 56060, loss 1.38624, acc 0.289062
2018-06-25T17:01:54.453016: step 56070, loss 1.38631, acc 0.265625
2018-06-25T17:01:56.656879: step 56080, loss 1.3863, acc 0.289062
2018-06-25T17:01:58.912880: step 56090, loss 1.38638, acc 0.171875
2018-06-25T17:02:01.111723: step 56100, loss 1.38629, acc 0.273438

Evaluation:
2018-06-25T17:02:01.192915: step 56100, loss 1.38627, acc 0.226562