hey guys,
first I wanna say that it's so nice to see people sharing their thoughts and work like this.
I just wanted to ask, regarding the Keras distributed tests: are you scaling the batch size with the number of GPUs? Keras splits the given batch size across the cards, so with a batch size of 256 on 4 cards the real batch size is 64 per card. (I honestly think this should be changed, but c'est la vie.)
That may be why you're seeing lower per-card efficiency.
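To illustrate what I mean, here's a minimal sketch (with a dummy model and random data, assuming a 4-GPU machine and the `keras.utils.multi_gpu_model` API) of scaling the global batch size by the GPU count so each card keeps the per-card batch you actually intended:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

N_GPUS = 4          # assumed GPU count for this sketch
PER_GPU_BATCH = 64  # the per-card batch size you actually want

# Dummy model and data, just for illustration.
model = Sequential([Dense(10, activation='softmax', input_shape=(100,))])
parallel_model = multi_gpu_model(model, gpus=N_GPUS)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')

x = np.random.rand(10000, 100)
y = np.random.rand(10000, 10)

# batch_size here is the *global* batch: Keras slices it into
# PER_GPU_BATCH-sized chunks, one per card, so each GPU stays fully loaded.
parallel_model.fit(x, y, batch_size=PER_GPU_BATCH * N_GPUS, epochs=1)
```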
here's a plot from my tests that shows quasi-linear speedups on EC2 instances.
hope this helps!!