pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
https://pytorch.org/examples
BSD 3-Clause "New" or "Revised" License

multi gpu training with different subprocesses #13

Open · glample opened this issue 7 years ago

glample commented 7 years ago

Hello, I was wondering whether it would be possible to have a small code example where the same network is cloned onto different GPUs, with all clones sharing the same parameters.

For instance, I would like something where different subprocesses train the model separately (e.g. 8 subprocesses, each responsible for training the model on one GPU). The updates would then be accumulated into a common network, and all GPU clones would periodically synchronize their parameters with those of the common network, or something like this.

catalystfrank commented 7 years ago

I guess you mean "Data Parallelization".

As at line 72 of examples/imagenet/main.py, explicitly use:

model = torch.nn.DataParallel(model).cuda()

Once you wrap the model with DataParallel, you can run your command as

CUDA_VISIBLE_DEVICES=4,5,6,7 python main.py [options]

to use the last 4 of the 8 GPUs.
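For completeness, here is a minimal sketch of the DataParallel pattern from the answer above. The model and batch sizes are arbitrary placeholders; the sketch guards the `.cuda()` calls so it also runs on a CPU-only machine (where `DataParallel` simply falls back to the wrapped module).

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)               # toy model (placeholder)
if torch.cuda.is_available():
    # DataParallel splits each batch across the visible GPUs,
    # runs the replicas in parallel, and gathers the outputs.
    model = nn.DataParallel(model).cuda()

x = torch.randn(32, 10)                # one batch of 32 samples
if torch.cuda.is_available():
    x = x.cuda()

out = model(x)                         # out.shape == torch.Size([32, 2])
```

Which GPUs are visible is controlled from outside the script via `CUDA_VISIBLE_DEVICES`, exactly as in the command above.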