msr-fiddle / pipedream

MIT License

Bandwidth within the machine #21

Closed: ADAM-CT closed this issue 4 years ago.

ADAM-CT commented 4 years ago

In optimizer_graph_hierarchical.py, the script's parameter --network_bandwidth is the bandwidth within the machine. What bandwidth is used between machines?

deepakn94 commented 4 years ago

--network_bandwidth is actually a list, so you can pass in both the inter- and intra-server bandwidths with it.
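For anyone unsure how a list-valued flag like this is passed on the command line, here is a minimal sketch using Python's argparse with `nargs="+"`. It is a hypothetical parser, not the actual definition in optimizer_graph_hierarchical.py (the real flag's type, default, and units may differ). Values are listed one per level, innermost (intra-server) first.

```python
import argparse


def build_parser():
    # Hypothetical parser: the real script's flag type, default, and units may differ.
    parser = argparse.ArgumentParser(description="Hierarchical partitioner (sketch)")
    parser.add_argument(
        "--network_bandwidth",
        type=float,
        nargs="+",  # one or more space-separated values, collected into a list
        required=True,
        help="Bandwidth per level, innermost first (intra-server, then inter-server)",
    )
    return parser


if __name__ == "__main__":
    # Two levels: intra-server bandwidth first, then inter-server (placeholder values).
    args = build_parser().parse_args(["--network_bandwidth", "25e9", "1.25e9"])
    intra, inter = args.network_bandwidth
    print(f"level 1 (intra-server): {intra:.3g}, level 2 (inter-server): {inter:.3g}")
```

The values are space-separated, so `--network_bandwidth 25e9 1.25e9` yields a two-element list.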

ADAM-CT commented 4 years ago

Thanks for your reply. I noticed that you define the elements of this list as levels: level 1 is the bandwidth within a node and level 2 is the bandwidth between nodes. What do level 3 and higher mean?

deepakn94 commented 4 years ago

You could imagine a topology where the bandwidth between some GPUs in a given server is higher than between others (think of two sets of 4 GPUs on an 8-GPU server).

Two levels are probably sufficient to model most commonly used topologies.

ADAM-CT commented 4 years ago

Following your example, suppose I have two servers, each with 8 GPUs. On the first server, the bandwidth among one group of 4 GPUs is B1 and among the other 4 GPUs is B2; on the second server, the bandwidth among one group of 4 GPUs is B3 and among the other 4 is B4. The bandwidth between the two servers is B5. What should --bandwidth be?

deepakn94 commented 4 years ago

In your example, we would assume that B1 = B2 [the bandwidth for the first level]. Then we assume that each group of 4 is connected by some bandwidth [the bandwidth for the second level]. Finally, the bandwidth between servers is B5 in your example [the bandwidth for the third level].
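To make the three-level model concrete, here is a sketch that maps a pair of GPUs to the level at which they communicate and looks up that level's bandwidth. It assumes GPUs are numbered consecutively across servers and that each inner level is described by a group size (4 GPUs per group, 8 GPUs per server); pairs that share no listed group fall into the outermost (cross-server) level. `pair_level`, `pair_bandwidth`, and `group_sizes` are hypothetical names, and this is an illustration of the explanation above, not PipeDream's code.

```python
from typing import Sequence


def pair_level(gpu_a: int, gpu_b: int, group_sizes: Sequence[int]) -> int:
    """Return the 1-based level at which two GPUs communicate.

    group_sizes[i] is the number of consecutively numbered GPUs sharing a
    level-(i+1) group; sizes are assumed to increase, each dividing the next.
    Pairs sharing no listed group use the outermost level, len(group_sizes) + 1.
    """
    for level, size in enumerate(group_sizes, start=1):
        if gpu_a // size == gpu_b // size:
            return level
    return len(group_sizes) + 1


def pair_bandwidth(gpu_a: int, gpu_b: int, group_sizes: Sequence[int],
                   bandwidths: Sequence[float]) -> float:
    """Look up the bandwidth for a GPU pair; bandwidths are innermost first."""
    if len(bandwidths) != len(group_sizes) + 1:
        raise ValueError("need exactly one bandwidth per level")
    return bandwidths[pair_level(gpu_a, gpu_b, group_sizes) - 1]


if __name__ == "__main__":
    # Two servers x 8 GPUs, each server split into two groups of 4.
    # Placeholder numbers standing in for B1, the group-to-group bandwidth, and B5.
    group_sizes = [4, 8]
    bandwidths = [100.0, 50.0, 10.0]
    print(pair_bandwidth(0, 3, group_sizes, bandwidths))   # same group of 4 -> 100.0
    print(pair_bandwidth(0, 5, group_sizes, bandwidths))   # same server, other group -> 50.0
    print(pair_bandwidth(0, 12, group_sizes, bandwidths))  # different servers -> 10.0
```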

ADAM-CT commented 4 years ago

Thank you very much for your reply. I want to make sure that I understand correctly. eg1: assuming B1 = B2 and B3 = B4: --bandwidth B1 B3 B5; eg2: assuming B1 = B2 = B3 = B4: --bandwidth B1 B5

deepakn94 commented 4 years ago

If you have a topology where GPUs within a group of 4 are connected with higher bandwidth than between groups of 4, then you want --bandwidth B1 B3 B5. If all 8 GPUs within a server are connected with the same bandwidth, you want --bandwidth B1 B5.
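The rule in this reply (keep a level only if it changes the bandwidth) can be sketched as a small helper. It assumes a hypothetical representation: `group_sizes` gives the GPUs per group for every level except the outermost, and `bandwidths` gives one value per level, innermost first. When two adjacent levels have equal bandwidth, the inner grouping adds no information, so it is merged into the level above. Exact equality is used for simplicity; measured bandwidths would need a tolerance.

```python
from typing import List, Sequence, Tuple


def collapse_levels(group_sizes: Sequence[int],
                    bandwidths: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Merge each level into the next one out when both have the same bandwidth."""
    if len(bandwidths) != len(group_sizes) + 1:
        raise ValueError("need exactly one bandwidth per level")
    sizes, bws = list(group_sizes), list(bandwidths)
    i = 0
    while i < len(bws) - 1:
        if bws[i] == bws[i + 1]:
            # This inner level adds nothing: drop its group size and bandwidth.
            del sizes[i]
            del bws[i]
        else:
            i += 1
    return sizes, bws


if __name__ == "__main__":
    # All 8 GPUs in a server share one bandwidth: three levels collapse to two (B1 B5 form).
    print(collapse_levels([4, 8], [100.0, 100.0, 10.0]))  # ([8], [100.0, 10.0])
    # Faster links within groups of 4: all three levels stay (B1 B3 B5 form).
    print(collapse_levels([4, 8], [100.0, 50.0, 10.0]))   # ([4, 8], [100.0, 50.0, 10.0])
```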

ADAM-CT commented 4 years ago

Thank you. My problem has been solved.

deepakn94 commented 4 years ago

Great! Going to close this!