mlcommons / training_results_v0.5

This repository contains the results and code for the MLPerf™ Training v0.5 benchmark.
https://mlcommons.org/en/training-normal-05/
Apache License 2.0

Multi-node training on cloud instances #12

Open amnash opened 5 years ago

amnash commented 5 years ago

What would be the best way to run multi-node training on cloud compute instances? Is there an approach similar to the multi-node DGX-1/DGX-2 training that uses Slurm?