You can train it with as many GPUs as you have in a single machine. If you pass --gpu 0 --use_data_parallel, then PyTorch will use all the GPUs in your machine.
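For context, here is a minimal sketch of what a flag like --use_data_parallel presumably does under the hood, assuming it wraps the network in PyTorch's torch.nn.DataParallel (the layer shapes and the 17-plane input are illustrative stand-ins, not taken from the ELF code, and a CUDA-capable machine is assumed):

```python
import torch
import torch.nn as nn

# Stand-in for the actual Go network; the real ELF model differs.
model = nn.Sequential(
    nn.Conv2d(17, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 2, 1),
)

if torch.cuda.device_count() > 1:
    # Replicates the model on every visible GPU; each forward()
    # splits the batch across them and gathers outputs on device 0.
    model = nn.DataParallel(model)
model = model.cuda()

# Dummy batch of 19x19 board features (17 planes is an assumption).
batch = torch.randn(32, 17, 19, 19).cuda()
out = model(batch)
```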
Got it. Thanks, Dr. Tian!
@yuandong-tian In this announcement https://facebook.ai/developers/tools/elf it says ELF OpenGo was trained with 2,000 GPUs. My question is: were the 2,000 GPUs used for training only, or also for self-play and evaluation? And were they in a single machine or spread across multiple machines? If multiple machines, can we set up training across multiple servers using this source code?
Thank you!
Thanks for releasing ELF OpenGo! I was wondering whether the server supports training a model with multiple GPUs. From start_server.sh it appears that 8 threads are supported, but only one GPU is specified in the command-line options.