intel / mpi-benchmarks


MPI Sendrecv and Exchange performance #37

Closed tetsushinto closed 2 years ago

tetsushinto commented 2 years ago

Hello,

Regarding the performance of MPI Sendrecv and Exchange: the Mbytes/sec throughput drops starting at 1048576 bytes. Can you tell me why this occurs?

The results with MOFED 5.4-1.0.3.0 are shown here; similar results were seen with MOFED 5.2-1.0.4.0.

When testing with osu_mbw_mr, which is similar to IMB, there is no such issue, so the problem seems to be caused by IMB itself.

■ MPI result (excerpt from log file)

Sendrecv:

#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec

262144 160 30.48 33.48 32.24 15660.37
524288 80 46.49 59.52 54.79 17616.61
1048576 40 164.35 258.68 217.49 8107.16 ★←Drops from here
2097152 20 824.89 936.35 893.57 4479.40
4194304 10 2010.12 3076.62 2165.94 2726.56

Exchange:
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec

262144 160 82.66 92.93 87.85 11283.53
524288 80 149.78 169.20 157.50 12394.29
1048576 40 581.32 797.35 711.64 5260.33 ★←Drops from here
2097152 20 1880.77 2131.25 1996.31 3936.01
4194304 10 4011.95 4440.56 4226.54 3778.18
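(For reference: the Mbytes/sec column appears to be bytes moved per process divided by t_max, i.e. 2 × bytes / t_max[usec] for Sendrecv and 4 × bytes / t_max[usec] for Exchange, with 1 MB = 10^6 bytes. E.g. the Sendrecv row where the drop starts: 2 × 1048576 / 258.68 ≈ 8107, matching the reported 8107.16; the first Exchange row: 4 × 262144 / 92.93 ≈ 11284, matching 11283.53. This is inferred from the numbers above, not taken from the IMB sources.)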

■ Setup:
・OFED: MLNX_OFED_LINUX-5.4-1.0.3.0
・HCA: Nvidia ConnectX-6 HDR100 (FW: 20.31.1014)
・IBSW: QM8700 (MLNX-OS: 3.9.2400)

Best regards, Shinto

VinnitskiV commented 2 years ago

Hi @tetsushinto Could you please share more information:

  1. Which version of IMB do you use? (check it with IMB-MPI1 -help)
  2. Your running options
  3. The same log from OSU
  4. MPI version

tetsushinto commented 2 years ago

Hi @VinnitskiV Thanks for your response. I am preparing the information you asked for and will provide it once I have it.

tetsushinto commented 2 years ago

Hi @VinnitskiV

I have attached the running logs for both IMB and OSU, and the output of "IMB-MPI1 -help", below.

The MPI version is openmpi-4.1.2a1.

IMB-MPI1_help.txt osu_mbw_mr.txt 3_4_03_MPI.openmpi-4.1.2a1_1.txt

Thanks, Shinto

VinnitskiV commented 2 years ago

Hi @tetsushinto Could you please update your IMB to the latest version:

git clone https://github.com/intel/mpi-benchmarks.git  
cd ./mpi-benchmarks/  
make

tetsushinto commented 2 years ago

Hi @VinnitskiV Thanks for your advice.

We tried updating IMB to the latest version (2021.3), but still see the same performance issue. (See detailed results in the attached spreadsheet.)

We also see that the performance drop does not occur with a smaller number of processes, such as 4. Could you give us your comments?

sendrecv.xlsx

BTW, could you let me know the following: which version of IMB is in the IMB-MPI1_help.txt that I sent the other day?

tetsushinto commented 2 years ago

Hi @VinnitskiV

Could you please give me an update on this case?

Thanks, Shinto

VinnitskiV commented 2 years ago

Hi @tetsushinto, I'm sorry for the long delay. About versioning: your old version has no version info in the -help output, but you can see the real version in the header of the log when you run the benchmark. About the performance drop: you did a great job! I need a bit more time to investigate this problem.

tetsushinto commented 2 years ago

Hi @VinnitskiV Thanks for your investigation of this problem. Please let me know if you make any progress.

tetsushinto commented 2 years ago

Hi @VinnitskiV Have you made any progress in investigating this problem?

VinnitskiV commented 2 years ago

Hi @tetsushinto IMB reports the time for the collective. In your case you run N processes, which means N/2 collectives running at the same time. So you get competition for resources, which leads to a drop in performance per single collective.
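As an illustration, here is a minimal sketch of that pattern (my illustration, not IMB's actual code; the message size, repetition count, and neighbour scheme are modelled on the log above):

```c
/* Minimal sketch (not IMB itself) of the Sendrecv pattern: every rank
 * exchanges with its neighbours in a periodic chain, so all N ranks
 * drive the fabric at the same time and compete for link bandwidth. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int len = 1048576;                /* size where the drop starts */
    char *sbuf = malloc(len), *rbuf = malloc(len);
    int right = (rank + 1) % size, left = (rank - 1 + size) % size;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < 40; i++)            /* 40 repetitions, as in the log */
        MPI_Sendrecv(sbuf, len, MPI_CHAR, right, 0,
                     rbuf, len, MPI_CHAR, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    double t = (MPI_Wtime() - t0) / 40;

    if (rank == 0)                          /* 2x: data flows both directions */
        printf("%.2f Mbytes/sec per rank\n", 2.0 * len / t / 1e6);
    free(sbuf); free(rbuf);
    MPI_Finalize();
    return 0;
}
```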

tetsushinto commented 2 years ago

Hi @VinnitskiV When testing with osu_mbw_mr, there is no such issue. If there is competition for resources that leads to a drop in per-collective performance, running the OSU test should give the same result as IMB, right? Could you please explain what the difference is between IMB and OSU?
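For comparison, my understanding of the osu_mbw_mr pattern is roughly the following (a simplified sketch, not the actual OSU source; the window size and pairing scheme are assumptions): it pairs the ranks, streams a window of non-blocking sends per pair, and reports the aggregate bandwidth over all pairs rather than a per-rank figure.

```c
/* Simplified sketch (not the actual OSU source) of the osu_mbw_mr idea:
 * the first half of the ranks each stream a window of non-blocking sends
 * to a fixed partner in the second half; the result is aggregate
 * bandwidth over all pairs, not a per-rank number. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW 64  /* assumed window of in-flight messages per pair */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int len = 1048576, pairs = size / 2;  /* assumes an even rank count */
    char *buf = malloc((size_t)len * WINDOW);   /* one slot per in-flight message */
    MPI_Request req[WINDOW];

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    if (rank < pairs) {                         /* senders: ranks 0..pairs-1 */
        for (int w = 0; w < WINDOW; w++)
            MPI_Isend(buf + (size_t)w * len, len, MPI_CHAR,
                      rank + pairs, 0, MPI_COMM_WORLD, &req[w]);
        MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
    } else {                                    /* receivers: ranks pairs..size-1 */
        for (int w = 0; w < WINDOW; w++)
            MPI_Irecv(buf + (size_t)w * len, len, MPI_CHAR,
                      rank - pairs, 0, MPI_COMM_WORLD, &req[w]);
        MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
    }
    double t = MPI_Wtime() - t0;

    if (rank == 0)                              /* summed over all pairs */
        printf("~%.2f MB/s aggregate\n", (double)pairs * WINDOW * len / t / 1e6);
    free(buf);
    MPI_Finalize();
    return 0;
}
```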

Thanks, Shinto

tetsushinto commented 2 years ago

Hi @VinnitskiV We tried reducing the number of processes and got the results below: using 80 CPU cores, reducing to 40 processes made no change to the performance. After we reduced to 4 processes, the performance improved and the drop disappeared (see attached). Please advise whether this is the expected result, thanks.

(attached image: Perf)

tetsushinto commented 2 years ago

Hi @VinnitskiV

Could you provide a response to my last question? Thanks in advance for your help.

tetsushinto commented 2 years ago

Hi @VinnitskiV

We are going to close this case. Please fix this bug at your convenience.