glample opened this issue 7 years ago
I guess you mean "data parallelism".

As in line 72 of examples/imagenet/main.py, explicitly wrap your model:

```python
model = torch.nn.DataParallel(model).cuda()
```

Once the model is wrapped in `DataParallel`, you can run your command as

```shell
CUDA_VISIBLE_DEVICES=4,5,6,7 python main.py [options]
```

to use the last 4 GPUs out of 8 cards.
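Putting the pieces together, here is a minimal self-contained sketch; the toy `Net` module and the tensor shapes are just illustrative, not taken from the example script:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Linear(128, 10)

    def forward(self, x):
        return self.fc(x)

# DataParallel replicates the module onto every visible GPU; each
# forward() scatters the batch across them and gathers the outputs.
model = torch.nn.DataParallel(Net()).cuda()

inputs = torch.randn(32, 128).cuda()
outputs = model(inputs)  # each GPU processed a slice of the 32-sample batch
```

Note that `DataParallel` is single-process: one Python process drives all visible GPUs.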
Hello, I was wondering whether it would be possible to have a small code example where the same network is cloned on different GPUs, with all clones sharing the same parameters.
For instance, I would like something where different subprocesses can train the model separately (e.g. 8 subprocesses, each responsible for training a copy of the model on one GPU). The updates could then be accumulated into a common network, and all the GPU clones could periodically synchronize their parameters with this common network, or something along those lines. A rough sketch of what I have in mind follows.
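To make it concrete, here is a rough, untested sketch of the kind of thing I mean, Hogwild-style, with the common network's parameters living in shared memory; the `Net` module, hyperparameters, and random stand-in data are all hypothetical:

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Linear(128, 10)

    def forward(self, x):
        return self.fc(x)

def train(rank, shared_model):
    device = torch.device('cuda', rank)
    local_model = Net().to(device)  # the per-GPU clone
    # The optimizer updates the *shared* (common) parameters.
    optimizer = torch.optim.SGD(shared_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(1000):
        # Synchronize the clone with the common network (here every
        # step; this could be done only periodically instead).
        local_model.load_state_dict(shared_model.state_dict())
        # Random stand-in data; a real run would read from a dataloader.
        inputs = torch.randn(32, 128, device=device)
        targets = torch.randint(0, 10, (32,), device=device)
        local_model.zero_grad()
        loss = loss_fn(local_model(inputs), targets)
        loss.backward()
        # Accumulate the clone's gradients into the common network, then step.
        for shared_p, local_p in zip(shared_model.parameters(),
                                     local_model.parameters()):
            shared_p.grad = local_p.grad.cpu()
        optimizer.step()

if __name__ == '__main__':
    mp.set_start_method('spawn')  # required when workers touch CUDA
    shared_model = Net()
    shared_model.share_memory()   # put the parameters in shared memory
    workers = [mp.Process(target=train, args=(rank, shared_model))
               for rank in range(torch.cuda.device_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```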