tschaffter opened this issue 7 years ago
Please see http://homes.soic.indiana.edu/steflee/mpi-caffe.html for a full description of the cifar10-mpi example. In overview, this example replicates the model on each GPU and combines the outputs.
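For orientation, the replicate-and-combine pattern reads roughly like the sketch below. MPIBroadcast is the layer type mentioned in this thread; the MPIGather layer name and the blob names are assumptions on my part, so check examples/cifar10-mpi/cifar10_mpi_train_test.prototxt for the actual syntax.

```
# Hypothetical sketch, not copied from the example prototxt.
# Every MPI process builds the same network; the root rank broadcasts
# the input batch so each replica computes on identical data, and the
# replicas' outputs are combined back on the root process.
layer {
  name: "bcast_data"
  type: "MPIBroadcast"   # mpi-caffe layer; source rank hosts the data layer
  bottom: "data"
  top: "data_all"
}
# ... the usual CIFAR10 conv/pool/ip layers, fed from "data_all" ...
layer {
  name: "gather_ip1"
  type: "MPIGather"      # assumed mpi-caffe layer name for combining outputs
  bottom: "ip1"
  top: "ip1_all"         # combined replica outputs on the root process
}
```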
To split a single-path model across multiple GPUs, you would use MPIBroadcast layers with communication groups containing only the source GPU (i.e., the one the preceding layers are assigned to) and the next GPU (i.e., the one receiving the output). The MPIBroadcast output on the source GPU then needs to be fed into a Silence layer. A sketch of this hand-off follows.
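Here is a minimal sketch of that hand-off, assuming ranks 0 and 1 map to the two GPUs. The layer and blob names are illustrative, and the communication-group comment is descriptive only; I'm not asserting the actual proto field mpi-caffe uses to declare the group.

```
# Hypothetical hand-off of activations from rank 0 to rank 1.
layer {
  name: "handoff_conv2"
  type: "MPIBroadcast"   # communication group: {rank 0 (source), rank 1 (receiver)} only
  bottom: "conv2"        # last blob computed on the source GPU
  top: "conv2_remote"    # consumed by the layers assigned to rank 1
}
layer {
  name: "sink_conv2"
  type: "Silence"        # the source GPU discards its own copy of the broadcast output
  bottom: "conv2_remote"
}
```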
Hi @steflee, after MPIBroadcast, do the different processes compute in parallel? Thanks.
The mpi-caffe CIFAR10 example doesn't seem to split the AlexNet model between multiple GPUs (I didn't look in detail at examples/cifar10-mpi/cifar10_mpi_train_test.prototxt). Below is the output of Caffe's training on the CIFAR10 example, followed by the same training run with mpi-caffe. Looking at the memory used by GPU 0, it seems that the entire model (~220 MB) is hosted on GPU 0 when using mpi-caffe. Can you provide a modified version of examples/cifar10-mpi/cifar10_mpi_train_test.prototxt where the model is effectively split between three GPUs?

And here is the output for the mpi-caffe CIFAR10 example: