yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

InfiniBand #278

Open shenjingGitHub opened 7 years ago

shenjingGitHub commented 7 years ago

Hi everyone! When I use two InfiniBand cards, my CaffeOnSpark cannot work in RDMA mode. Has anyone run into this problem? Thank you!

junshi15 commented 7 years ago

You may get more specific help if you can elaborate on your issue. If you have more than one InfiniBand card in a single box, only the first one will be used, due to this line. This can be easily modified, though.

shenjingGitHub commented 7 years ago

Thank you! The problem has been solved! But I'm confused by another question. It takes nearly 580 ms to transfer 1 Gb of data between 2 Mellanox ConnectX-3 IB cards when I test with rdma.cpp. However, the official specification says 40 Gb/s can be reached. I wonder if there is a problem with the connection between the cards. I hope to get your answer. Thank you!
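
To make the gap concrete, here is a rough sketch of the arithmetic behind the numbers above. It is only a back-of-the-envelope calculation: the helper name `throughput_gbps` is made up for illustration, and since "1Gb" could mean one gigabit or one gigabyte, both readings are computed.

```python
def throughput_gbps(size_bits: float, seconds: float) -> float:
    """Return throughput in gigabits per second (1 Gb = 1e9 bits)."""
    return size_bits / seconds / 1e9


elapsed = 0.580  # seconds, the transfer time reported above

# Reading 1 (assumption): the payload was one gigabit.
as_gigabit = throughput_gbps(1e9, elapsed)

# Reading 2 (assumption): the payload was one gigabyte, i.e. 8 gigabits.
as_gigabyte = throughput_gbps(8e9, elapsed)

print(f"{as_gigabit:.2f} Gb/s if the payload was 1 gigabit")
print(f"{as_gigabyte:.2f} Gb/s if the payload was 1 gigabyte")
```

Under either reading the measured rate is well below the 40 Gb/s figure, which is why the link or the test setup looks suspicious.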