Status: Closed (bryanyuan1 closed this issue 3 years ago)
hi, unfortunately the library by default does not support training a single agent on multiple GPUs. what i have found helpful is to use multiple GPUs to run the agent on different seeds/games in parallel.
alternatively, if you're still in algorithm development phase (and still exploring ideas), consider doing so in smaller environments like classic control or MinAtar, as argued in: https://arxiv.org/abs/2011.14826
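If you do want to pin parallel runs to specific GPUs, a common approach is to set `CUDA_VISIBLE_DEVICES` per process before launching Dopamine's training entry point. A minimal sketch, not the library's prescribed workflow: the `base_dir` path is illustrative, and you should check the gin file path and the `atari_lib.create_atari_environment.game_name` binding against your installed Dopamine version.

```python
import os

def make_run(gpu_id, game):
    """Build the command and environment for one Dopamine training run
    pinned to a single GPU via CUDA_VISIBLE_DEVICES."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    cmd = [
        "python", "-m", "dopamine.discrete_domains.train",
        f"--base_dir=/tmp/dopamine_runs/{game}_gpu{gpu_id}",  # illustrative path
        "--gin_files=dopamine/agents/dqn/configs/dqn.gin",
        f'--gin_bindings=atari_lib.create_atari_environment.game_name="{game}"',
    ]
    return cmd, env

# Launch one run per GPU in parallel; each process sees only its own GPU:
# import subprocess
# runs = [make_run(i, g) for i, g in enumerate(["Pong", "Breakout"])]
# procs = [subprocess.Popen(cmd, env=env) for cmd, env in runs]
# for p in procs:
#     p.wait()
```

The same pattern works for pinning different seeds instead of different games; the point is only that each process is restricted to one GPU, so runs do not contend for device 0.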
On Sat, Jun 26, 2021 at 9:36 PM, Bryan Yuan wrote:
Hi! I am currently running the plain DQNAgent and hope to reproduce the baseline result. However, each iteration takes 30-40 minutes to finish. I am running the agent on AWS SageMaker Studio. Is it possible to speed up training by using multiple GPUs?
I think the agent uses only gpu:0 by default, because no matter how many GPUs my AWS notebook instance has, the training speed is roughly the same.
Thanks!
Issue link: https://github.com/google/dopamine/issues/179
Thank you so much Pablo!
no problem! if you're interested in running minatar with dopamine, you can use this file as reference: https://github.com/google-research/google-research/blob/master/tandem_dqn/minatar_env.py
with accompanying gin files: https://github.com/google-research/google-research/tree/master/tandem_dqn/configs
Thanks, will try that. Just curious, how long did the DQN baseline take to train for 200 iterations? How many timesteps per second did you get during training?
on a P100, training DQN for 200 iterations took roughly 5 days.
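For a rough sense of the implied throughput: Dopamine's DQN gin configs default to 250,000 training steps per iteration (an assumption here; check `Runner.training_steps` in your own config), so 200 iterations in about 5 days works out to:

```python
# Back-of-the-envelope throughput for 200 iterations in ~5 days,
# assuming the default 250,000 training steps per iteration.
steps_per_iteration = 250_000
iterations = 200
total_steps = steps_per_iteration * iterations   # 50,000,000 steps
seconds = 5 * 24 * 3600                          # ~5 days in seconds
steps_per_second = total_steps / seconds
print(round(steps_per_second))  # prints 116
```

This ignores the evaluation steps run each iteration, so the actual environment throughput is somewhat higher than this estimate.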