It will be interesting to see whether we observe similar speeds, and the code is probably useful too.
Note that their benchmark measures only raw all-reduce communication, with no learning step. This is relevant when training is communication-bound, and we will likely hit that scenario soon when training linear models.
We found this benchmark here: https://github.com/diux-dev/cluster/tree/master/pytorch_distributed_benchmark
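To make concrete what "raw all-reduce, no learning" means, here is a minimal sketch of such a benchmark using `torch.distributed` (this is not the linked repository's code; the gloo backend, tensor size, and iteration count are assumptions):

```python
# Minimal raw all-reduce timing sketch, assuming launch via `torchrun`,
# which sets RANK / WORLD_SIZE / LOCAL_RANK environment variables.
import time

import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="gloo")  # backend choice is an assumption
    rank = dist.get_rank()

    # ~100 MB of float32 data, roughly gradient-sized (assumed size).
    tensor = torch.ones(25_000_000)

    # Warm-up so timings exclude one-off setup cost.
    for _ in range(5):
        dist.all_reduce(tensor)

    iters = 20
    start = time.time()
    for _ in range(iters):
        dist.all_reduce(tensor)  # raw communication only, no model / learning step
    elapsed = time.time() - start

    if rank == 0:
        gb = tensor.numel() * tensor.element_size() * iters / 1e9
        print(f"all-reduce: {elapsed / iters * 1e3:.1f} ms/iter, ~{gb / elapsed:.2f} GB/s")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Run with e.g. `torchrun --nproc_per_node=2 allreduce_bench.py` to time the collective on a single machine; the loop contains no forward/backward pass, so the numbers reflect communication cost alone.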