FredericMao closed this issue 7 years ago
@powerreactor It looks wrong, but look at the iteration index: each printed time covers 80 iterations. There's a problem with the indexing, but the timing is correct if you divide it by 2.
@powerreactor Did you fix the segfault? How are you able to run 16 GPUs?
I still don't know what causes the segfault. Sometimes it works, sometimes it doesn't.
Please try changing the compile folder to somewhere under your /home (Theano's default is actually ~/.theano).
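For anyone trying the same thing, here is a minimal sketch of pointing Theano's compile directory at a folder under the home directory through the `THEANO_FLAGS` environment variable and its `base_compiledir` option. The helper name `theano_flags_with_compiledir` is hypothetical, and the sketch assumes the variable is set before `theano` is imported; it is not how Theano-MPI itself configures this.

```python
import os


def theano_flags_with_compiledir(base_dir, existing=""):
    """Return a THEANO_FLAGS string whose base_compiledir is base_dir.

    Hypothetical helper: keeps any other flags already present and
    replaces an earlier base_compiledir entry if there is one.
    """
    kept = [
        flag
        for flag in existing.split(",")
        if flag and not flag.startswith("base_compiledir=")
    ]
    kept.append("base_compiledir=" + base_dir)
    return ",".join(kept)


# Assumption: ~/.theano (Theano's default) is used here as the target folder.
compile_dir = os.path.join(os.path.expanduser("~"), ".theano")
os.environ["THEANO_FLAGS"] = theano_flags_with_compiledir(
    compile_dir, os.environ.get("THEANO_FLAGS", "")
)

# Theano reads THEANO_FLAGS at import time, so the import must come afterwards:
# import theano
```

The same effect can be had by exporting `THEANO_FLAGS` in the shell before launching the job, which may be simpler when every MPI rank should share the setting.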
@powerreactor Changed. https://github.com/uoguelph-mlrg/Theano-MPI/commit/526639c8e96026e2fd22bf4291b9f5fca7332f48
Hi He,
I am testing 16 GPUs on Mosaic. The timing I got:
29520 5.315070 0.925000 time per 5120 images: 4.89 (train 3.90 comm 0.87 wait 0.12)
29600 5.449820 0.932813 time per 5120 images: 4.88 (train 3.89 comm 0.88 wait 0.11)
29680 5.360006 0.945312 time per 5120 images: 4.91 (train 3.90 comm 0.89 wait 0.12)
The comm time seems correct, but the training time is the same as with 8 GPUs.
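A rough sketch of the "divide by 2" check suggested in the reply in this thread: it parses the three log lines above, confirms that consecutive entries are 80 iterations apart, and halves the reported times. The regular expression and the helper names are assumptions based only on the lines shown here, and applying the factor of 2 to each component (train, comm, wait) as well as to the total is also an assumption.

```python
import re

LOG = """\
29520 5.315070 0.925000 time per 5120 images: 4.89 (train 3.90 comm 0.87 wait 0.12)
29600 5.449820 0.932813 time per 5120 images: 4.88 (train 3.89 comm 0.88 wait 0.11)
29680 5.360006 0.945312 time per 5120 images: 4.91 (train 3.90 comm 0.89 wait 0.12)
"""

# Pattern inferred from the log lines above; other runs may print differently.
PATTERN = re.compile(
    r"^(?P<it>\d+) \S+ \S+ time per \d+ images: (?P<total>[\d.]+) "
    r"\(train (?P<train>[\d.]+) comm (?P<comm>[\d.]+) wait (?P<wait>[\d.]+)\)$"
)
TIME_KEYS = ("total", "train", "comm", "wait")


def parse(log):
    """Turn each matching log line into a dict of iteration index and times."""
    rows = []
    for line in log.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            row = {"it": int(m["it"])}
            row.update({k: float(m[k]) for k in TIME_KEYS})
            rows.append(row)
    return rows


def corrected(rows, factor=2.0):
    """Divide every reported time by factor (2, per the reply in this thread)."""
    return [{**r, **{k: r[k] / factor for k in TIME_KEYS}} for r in rows]


rows = parse(LOG)
# Gap between consecutive printed iteration indices.
steps = [b["it"] - a["it"] for a, b in zip(rows, rows[1:])]
print("iteration steps:", steps)
for r in corrected(rows):
    print(r["it"], {k: round(r[k], 3) for k in TIME_KEYS})
```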