pytorch / ELF

ELF: a platform for game research with AlphaGoZero/AlphaZero reimplementation

Multi-GPU for training on the server side? #35

Closed iteachcs closed 6 years ago

iteachcs commented 6 years ago

Thanks for releasing OpenGo! I was wondering whether the server can train a model with multiple GPUs. From start_server.sh it appears that 8 threads are used, but only one GPU is specified in the command-line options.

yuandong-tian commented 6 years ago

You can train it with as many GPUs as you have in a single machine. If you pass --gpu 0 --use_data_parallel, PyTorch will use all GPUs on the machine.
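For readers unfamiliar with how a flag like --use_data_parallel typically works: the usual PyTorch mechanism for "use all GPUs on one machine" is torch.nn.DataParallel. The sketch below is illustrative only; TinyValueHead and build_model are hypothetical stand-ins, not code from the ELF source.

```python
# Hedged sketch of single-machine multi-GPU training via nn.DataParallel.
# TinyValueHead is a hypothetical stand-in for the real OpenGo network.
import torch
import torch.nn as nn

class TinyValueHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 1)

    def forward(self, x):
        return self.fc(x)

def build_model(use_data_parallel: bool) -> nn.Module:
    model = TinyValueHead()
    if use_data_parallel and torch.cuda.device_count() > 1:
        # DataParallel splits each input batch across all visible GPUs,
        # runs the forward pass on each shard, and gathers the outputs.
        model = nn.DataParallel(model)
    if torch.cuda.is_available():
        model = model.cuda()
    return model

model = build_model(use_data_parallel=True)
out = model(torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 1])
```

On a machine with a single GPU (or none), the wrapper is skipped and the model runs as usual, which matches the behavior of passing the flag on any hardware.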

iteachcs commented 6 years ago

Got it. Thanks, Dr. Tian!

drsagitn commented 6 years ago

@yuandong-tian In this announcement (https://facebook.ai/developers/tools/elf) it is said that ELF OpenGo was trained with 2000 GPUs. Were those 2000 GPUs used for training only, or also for self-play and evaluation? And were they in a single machine or spread across multiple machines? If multiple machines, can we set up training on multiple servers using this source code?

Thank you!
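Independent of how ELF itself distributes work, PyTorch's standard mechanism for multi-machine training is torch.distributed with DistributedDataParallel; whether ELF's training code wires this up is exactly what the question above asks, so the sketch below only shows the generic PyTorch pattern, run here as a single-process, CPU-only demo (world_size=1, gloo backend).

```python
# Hedged sketch: generic PyTorch multi-node setup via torch.distributed,
# demonstrated with a single process so it runs on any machine.
# In a real multi-server run, each process would get its own rank and
# init_method would point at a rendezvous reachable by all nodes.
import tempfile
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Rendezvous via a shared file; a real cluster would use tcp:// or env://.
init_file = tempfile.NamedTemporaryFile(delete=False)
dist.init_process_group(
    backend="gloo",              # CPU-friendly backend; nccl is typical on GPUs
    init_method=f"file://{init_file.name}",
    rank=0,
    world_size=1,
)

# DDP keeps one model replica per process and averages gradients across them.
model = DDP(nn.Linear(8, 1))
out = model(torch.randn(4, 8))

dist.destroy_process_group()
```

With world_size > 1, the same script launched once per node (e.g. via torchrun) would synchronize gradients across machines; this is the general PyTorch pattern, not a claim about ELF's own training pipeline.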