gaohao95 closed this 3 years ago
This PR extends the all-to-all benchmark to run a series of measurements over a range of message sizes instead of a single size.
Sample output:
Size (MB): 1, Elasped time (s): 0.000176531, Bandwidth per GPU (GB/s): 16.9942
Size (MB): 2, Elasped time (s): 0.000267493, Bandwidth per GPU (GB/s): 22.4305
Size (MB): 4, Elasped time (s): 0.000302923, Bandwidth per GPU (GB/s): 39.614
Size (MB): 8, Elasped time (s): 0.000512558, Bandwidth per GPU (GB/s): 46.824
Size (MB): 16, Elasped time (s): 0.000820293, Bandwidth per GPU (GB/s): 58.5157
Size (MB): 32, Elasped time (s): 0.00148969, Bandwidth per GPU (GB/s): 64.4429
Size (MB): 64, Elasped time (s): 0.0027744, Bandwidth per GPU (GB/s): 69.2042
Size (MB): 128, Elasped time (s): 0.00543221, Bandwidth per GPU (GB/s): 70.6895
Size (MB): 256, Elasped time (s): 0.0106805, Bandwidth per GPU (GB/s): 71.9069
Size (MB): 512, Elasped time (s): 0.0212418, Bandwidth per GPU (GB/s): 72.3102
Size (MB): 1024, Elasped time (s): 0.0423483, Bandwidth per GPU (GB/s): 72.5413
Size (MB): 2048, Elasped time (s): 0.0845411, Bandwidth per GPU (GB/s): 72.6748
Size (MB): 4096, Elasped time (s): 0.168941, Bandwidth per GPU (GB/s): 72.7353
This PR is built on top of #49
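For anyone reading the sample output: the reported bandwidth figures appear consistent with 3 x size / elapsed time, taking 1 MB as 10^6 bytes and 1 GB as 10^9 bytes. The factor of 3 is inferred from the numbers only. It would match, for example, a 4-GPU run where each GPU exchanges data with 3 peers, but the PR description does not state this. The Python sketch below is not the benchmark's code. The function name and the `factor` parameter are hypothetical, and it only rechecks a few rows of the sample output under that assumption.

```python
def bandwidth_per_gpu_gbs(size_mb, elapsed_s, factor=3):
    """Bandwidth in GB/s, assuming 1 MB = 1e6 bytes and 1 GB = 1e9 bytes.

    `factor` is an assumption inferred from the sample output
    (it is not stated anywhere in the PR description).
    """
    bytes_moved = factor * size_mb * 1e6
    return bytes_moved / elapsed_s / 1e9


# (size in MB, elapsed seconds, reported GB/s) copied from the sample output
samples = [
    (1, 0.000176531, 16.9942),
    (64, 0.0027744, 69.2042),
    (4096, 0.168941, 72.7353),
]

for size_mb, elapsed_s, reported in samples:
    computed = bandwidth_per_gpu_gbs(size_mb, elapsed_s)
    print(f"{size_mb} MB: computed {computed:.4f} GB/s, reported {reported} GB/s")
```

The computed values agree with the reported ones to within the rounding of the printed elapsed times. The curve also shows bandwidth saturating at around 72 GB/s once the size reaches a few hundred MB.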