microsoft / pai

Resource scheduling and cluster management for AI
https://openpai.readthedocs.io
MIT License

Does OpenPAI need an InfiniBand network card for distributed training? #4814

Open yingunjun opened 4 years ago

yingunjun commented 4 years ago

Does OpenPAI need an InfiniBand network card when it is used for distributed training?

fanyangCS commented 4 years ago

It does not require InfiniBand (IB), but if the cluster has IB devices, OpenPAI supports IB.
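
To illustrate the "if the cluster has IB devices" condition, here is a minimal sketch that checks whether a Linux node exposes any InfiniBand devices. It is not part of OpenPAI: the function name `list_ib_devices` is hypothetical, and the sketch assumes the standard Linux sysfs location `/sys/class/infiniband`, which is only populated when the IB kernel drivers are loaded on that node.

```python
import os


def list_ib_devices(sysfs_root="/sys/class/infiniband"):
    """Return the sorted names of InfiniBand devices found under sysfs_root.

    Assumption: the node runs Linux with the IB drivers loaded, so each
    device appears as an entry under /sys/class/infiniband. If the directory
    does not exist, no devices are reported.
    """
    if not os.path.isdir(sysfs_root):
        return []
    return sorted(os.listdir(sysfs_root))


if __name__ == "__main__":
    devices = list_ib_devices()
    if devices:
        print("IB devices found:", ", ".join(devices))
    else:
        print("No IB devices found on this node.")
```

Running this on each worker node gives a quick view of which nodes could be used for IB jobs. An empty result only means the kernel does not expose an IB device there; it says nothing about how the cluster itself is configured.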

debuggy commented 3 years ago

We will add a manual on how to use IB in distributed training.

abuccts commented 3 years ago

Please refer to https://openpai.readthedocs.io/en/latest/manual/cluster-user/how-to-use-advanced-job-settings.html#infiniband-jobs for an example IB job.