Performance Update

jcoreyes commented 8 years ago

Keep replay memory (screens, pre- and post-states) in GPU memory
Use a transpose kernel to switch to CHWN format
Training steps per second on the Breakout ROM are at 455, up from 260
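For readers unfamiliar with the layout change, here is a minimal NumPy sketch of what switching to CHWN format means. It runs on the CPU and is only an illustration: the pull request uses a GPU transpose kernel instead. The starting (N, C, H, W) order, the batch size of 32, the 4 stacked 84x84 frames, and the helper name `to_chwn` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical minibatch in (N, C, H, W) order: 32 samples, 4 frames of 84x84.
rng = np.random.default_rng(0)
batch_nchw = rng.integers(0, 256, size=(32, 4, 84, 84), dtype=np.uint8)


def to_chwn(batch):
    """Reorder (N, C, H, W) to (C, H, W, N) and copy into contiguous memory."""
    # transpose() alone only changes strides; ascontiguousarray() performs the
    # actual data movement, which is the part a transpose kernel does on the GPU.
    return np.ascontiguousarray(batch.transpose(1, 2, 3, 0))


batch_chwn = to_chwn(batch_nchw)
print(batch_chwn.shape)
```

After the reorder, the sample index is the fastest-varying dimension, so element `[c, h, w, n]` of the result equals element `[n, c, h, w]` of the input.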

tambetm commented 8 years ago

Thanks for the nice pull request. Together, those changes result in an almost 2x improvement!

But I would like to keep the code runnable on lesser GPUs as well, so I would like to have two ReplayMemory implementations that you can choose between with a command-line switch. I would also like to keep the main code independent of Neon, so we need to figure out how to share the backend between ReplayMemory and DeepQNetwork without instantiating it in main. Or can we just use two separate backends?
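One possible shape for such a switch is sketched below. It is not the project's actual code: the class names `HostReplayMemory` and `DeviceReplayMemory`, the flags `--replay_memory` and `--replay_size`, and the default size are hypothetical placeholders, and the classes carry no real storage logic.

```python
import argparse


class HostReplayMemory:
    """Placeholder for a replay memory kept in main (CPU) memory."""
    location = "host"

    def __init__(self, size):
        self.size = size


class DeviceReplayMemory:
    """Placeholder for a replay memory kept in GPU memory."""
    location = "gpu"

    def __init__(self, size):
        self.size = size


# Registry mapping the switch value to an implementation (hypothetical names).
REPLAY_MEMORIES = {"host": HostReplayMemory, "gpu": DeviceReplayMemory}


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--replay_memory", choices=sorted(REPLAY_MEMORIES),
                        default="host")
    parser.add_argument("--replay_size", type=int, default=1000000)
    return parser


def make_replay_memory(argv=None):
    """Parse the arguments and build the selected implementation."""
    args = build_parser().parse_args(argv)
    return REPLAY_MEMORIES[args.replay_memory](args.replay_size)


mem = make_replay_memory(["--replay_memory", "gpu"])
print(type(mem).__name__, mem.size)
```

Defaulting to the host implementation would keep the code runnable on smaller GPUs, and because the main code only touches the registry, it would not need to import anything backend-specific itself.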

Also, I understand that the current version achieves 38% GPU utilization on a Titan X. I wonder what could be done to reach 100%? Some ideas:

implement the Q-updates in DeepQNetwork using the Neon backend as well,
run playing and training in parallel in separate threads.
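The second idea could look roughly like the producer-consumer sketch below, using only the Python standard library. Everything in it is a stand-in: `play` does not step an emulator, `train` does not update a network, and the function names and queue size are assumptions made for the example.

```python
import queue
import threading


def play(transitions, num_steps):
    """Actor thread: stand-in for stepping the emulator and storing transitions."""
    for step in range(num_steps):
        transitions.put(("dummy_state", step))
    transitions.put(None)  # sentinel: no more transitions will arrive


def train(transitions, results):
    """Learner thread: stand-in for consuming transitions and updating the network."""
    trained = 0
    while True:
        item = transitions.get()
        if item is None:
            break
        trained += 1
    results.append(trained)


def run(num_steps=1000):
    # A bounded queue keeps the actor from running far ahead of the learner.
    transitions = queue.Queue(maxsize=64)
    results = []
    threads = [
        threading.Thread(target=play, args=(transitions, num_steps)),
        threading.Thread(target=train, args=(transitions, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results[0]


print(run())
```

In CPython, the two threads would only overlap in practice while the heavy work runs in native code that releases the global interpreter lock, so whether this raises GPU utilization would need to be measured.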

mw66 commented 8 years ago

Does this fork really keep replay memory in GPU?

I tried the latest version, but this is my GPU usage:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2963    C   python                                         112MiB |
+-----------------------------------------------------------------------------+

And main memory:

  PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  20   0 44.118g 6.766g 110244 R 100.0 43.2 534:05.11 python

6.766g is about the size of a 1M-entry replay memory held in main memory.
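As a rough check of that estimate, the sketch below computes the size of the stored screens alone. It assumes 84x84 screens at one byte per pixel, which is the usual DQN preprocessing but is not stated in this thread, and the function name is made up for the example.

```python
def replay_screens_gib(num_screens=1_000_000, height=84, width=84,
                       bytes_per_pixel=1):
    """Size of the stored screens in GiB, under the assumed screen format."""
    return num_screens * height * width * bytes_per_pixel / 2**30


print(round(replay_screens_gib(), 2))  # prints 6.57
```

Under those assumptions the screens take about 6.57 GiB, close to the 6.766g resident size reported by top. The remainder would plausibly be actions, rewards, and interpreter overhead.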

tambetm / simple_dqn

Performance Update #8