In IMB-RMA (C++ version), I found a bug in the aggregation of the benchmark results: the t_ovrl values for Truly_passive_put are all 0.
I will create a pull request.
Below are the results when running on Open MPI v4.1.5.
The problem occurs only in the C++ version of IMB-RMA.
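To illustrate the class of bug I suspect, here is a minimal standalone sketch (hypothetical names, not the actual IMB source): Truly_passive_put records two timings per message size, and if the aggregation step reduces only the first slot across ranks, the second slot (t_ovrl) keeps its zero initialization, which is exactly the pattern in the C++ output below.

```cpp
// Minimal sketch of the suspected aggregation bug (hypothetical code,
// not the IMB source). Build with mpicxx, run with mpirun -np 2.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Truly_passive_put reports two values per message size.
    double local[2]  = {3.15e-6, 4.61e-6}; // stand-ins: t_pure, t_ovrl
    double global[2] = {0.0, 0.0};

    // Buggy aggregation: count = 1 reduces only t_pure; t_ovrl is
    // never reduced and stays 0 in the printed results.
    MPI_Allreduce(local, global, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);

    // Correct aggregation would reduce both slots:
    // MPI_Allreduce(local, global, 2, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);

    printf("t_pure=%.2f usec  t_ovrl=%.2f usec\n",
           global[0] * 1e6, global[1] * 1e6);

    MPI_Finalize();
    return 0;
}
```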
Steps to reproduce:
$ mpirun --host bnode120:1,bnode119:1 -np 2 ./IMB-RMA Truly_passive_put
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.3, MPI-RMA part
#----------------------------------------------------------------
# Date : Mon Oct 2 21:18:40 2023
# Machine : x86_64
# System : Linux
# Release : 5.4.0-144-generic
# Version : #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023
# MPI Version : 3.1
# MPI Thread Environment:
# Calling sequence was:
# ./IMB-RMA Truly_passive_put
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Truly_passive_put
# The benchmark measures execution time of MPI_Put for 2 cases:
# 1) The target is waiting in MPI_Barrier call (t_pure value)
# 2) The target performs computation and then enters MPI_Barrier routine (t_ovrl value)
#---------------------------------------------------
# Benchmarking Truly_passive_put
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t_pure[usec] t_ovrl[usec]
0 1000 1.80 0.00
1 1000 3.19 0.00
2 1000 3.16 0.00
4 1000 3.15 0.00
8 1000 3.15 0.00
16 1000 3.16 0.00
32 1000 3.17 0.00
64 1000 3.18 0.00
128 1000 3.19 0.00
256 1000 3.31 0.00
512 1000 3.32 0.00
1024 1000 3.37 0.00
2048 1000 3.52 0.00
4096 1000 4.40 0.00
8192 1000 5.05 0.00
16384 1000 4.50 0.00
32768 1000 5.13 0.00
65536 640 6.47 0.00
131072 320 9.51 0.00
262144 160 15.10 0.00
524288 80 25.63 0.00
1048576 40 46.72 0.00
2097152 20 178.16 0.00
4194304 10 174.46 0.00
# All processes entering MPI_Finalize
IMB-RMA (C version) does not have this problem:
$ mpirun --host bnode120:1,bnode119:1 -np 2 ./IMB-RMA Truly_passive_put
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2018, MPI-RMA part
#----------------------------------------------------------------
# Date : Mon Oct 2 21:20:14 2023
# Machine : x86_64
# System : Linux
# Release : 5.4.0-144-generic
# Version : #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023
# MPI Version : 3.1
# MPI Thread Environment:
# Calling sequence was:
# ./IMB-RMA Truly_passive_put
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Truly_passive_put
# Comments on this Benchmark:
# The benchmark measures execution time of MPI_Put for 2 cases:
# 1) The target is waiting in MPI_Barrier call (t_pure value)
# 2) The target performs computation and then enters MPI_Barrier routine (t_ovrl value)
#---------------------------------------------------
# Benchmarking Truly_passive_put
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t_pure[usec] t_ovrl[usec]
0 1000 1.77 2.64
1 1000 3.13 4.62
2 1000 3.12 4.61
4 1000 3.12 4.61
8 1000 3.12 4.60
16 1000 3.12 4.61
32 1000 3.15 4.63
64 1000 3.13 4.62
128 1000 3.18 4.66
256 1000 3.30 4.90
512 1000 3.31 4.90
1024 1000 3.38 4.95
2048 1000 3.50 5.13
4096 1000 4.33 5.94
8192 1000 5.11 6.71
16384 1000 4.49 6.11
32768 1000 5.18 6.84
65536 640 6.51 8.10
131072 320 9.52 11.07
262144 160 15.11 16.68
524288 80 25.70 27.18
1048576 40 47.20 48.67
2097152 20 89.51 90.77
4194304 10 174.41 175.64
# All processes entering MPI_Finalize
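For reference, the measurement pattern described in the log header (target waiting in MPI_Barrier for t_pure vs. target performing computation first for t_ovrl) can be sketched roughly as follows. This is a simplified illustration using my own naming, not the IMB implementation; run with -np 2.

```cpp
// Simplified sketch of the Truly_passive_put measurement pattern
// (illustrative only; the real IMB code differs in structure and naming).
#include <mpi.h>
#include <vector>
#include <cstdio>

// Busy-wait standing in for the target's computation phase.
static void fake_computation(double seconds) {
    double t0 = MPI_Wtime();
    while (MPI_Wtime() - t0 < seconds) { }
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int bytes = 4096;
    std::vector<char> sendbuf(bytes, 1);
    char* winbuf = nullptr;
    MPI_Win win;
    MPI_Win_allocate(bytes, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &winbuf, &win);

    // computing == 0: target waits in the barrier (t_pure case).
    // computing == 1: target computes before the barrier (t_ovrl case).
    for (int computing = 0; computing <= 1; ++computing) {
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0) {
            // Origin: time a passive-target put to rank 1.
            double t = MPI_Wtime();
            MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
            MPI_Put(sendbuf.data(), bytes, MPI_BYTE, 1, 0, bytes, MPI_BYTE, win);
            MPI_Win_unlock(1, win);
            t = MPI_Wtime() - t;
            printf("%s: %.2f usec\n", computing ? "t_ovrl" : "t_pure", t * 1e6);
        } else if (rank == 1 && computing) {
            // Target is busy, so the put must make truly passive progress.
            fake_computation(1e-4);
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```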