intel/mpi-benchmarks

IMB-RMA (C++ version) Truly_passive_put t_ovrl results are all zeros #49

Open range3 opened 9 months ago

range3 commented 9 months ago

In IMB-RMA (C++ version), I found a bug in the aggregation of the benchmark results: the t_ovrl values for Truly_passive_put are all 0. I will create a pull request. Below are the results when running with Open MPI v4.1.5. The problem only occurs in the C++ version of IMB-RMA.
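For context, the benchmark's own description (printed in the output below) says Truly_passive_put times an MPI_Put from one rank to a passive target that is either already waiting in an MPI_Barrier call (t_pure) or first performs some computation and only then enters the barrier (t_ovrl). The following is a minimal, self-contained sketch of that measurement pattern for two ranks; it is not the IMB-RMA source, and the message size, the dummy computation loop, and the helper name time_put are only illustrative.

// Minimal sketch (not the IMB-RMA source) of the Truly_passive_put pattern:
// rank 0 times an MPI_Put to rank 1 while rank 1 either waits in MPI_Barrier
// (t_pure case) or "computes" first and then enters MPI_Barrier (t_ovrl case).
#include <mpi.h>
#include <cstdio>
#include <vector>

static double time_put(MPI_Win win, char* buf, int bytes, bool target_computes)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double usec = 0.0;

    if (rank == 0) {
        double start = MPI_Wtime();
        MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
        MPI_Put(buf, bytes, MPI_BYTE, 1, 0, bytes, MPI_BYTE, win);
        MPI_Win_unlock(1, win);                   // put is complete at the target here
        usec = (MPI_Wtime() - start) * 1.0e6;
    } else if (rank == 1 && target_computes) {
        volatile double x = 0.0;                  // placeholder computation on the target
        for (int i = 0; i < 1000000; ++i) x += i * 1.0e-9;
    }
    MPI_Barrier(MPI_COMM_WORLD);                  // the target ends up waiting here in both cases
    return usec;                                  // nonzero only on rank 0
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    const int bytes = 4096;                       // one message size, for illustration
    std::vector<char> buf(bytes, 0);

    MPI_Win win;
    MPI_Win_create(buf.data(), bytes, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    double t_pure = time_put(win, buf.data(), bytes, /*target_computes=*/false);
    double t_ovrl = time_put(win, buf.data(), bytes, /*target_computes=*/true);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        std::printf("%d bytes: t_pure %.2f usec, t_ovrl %.2f usec\n", bytes, t_pure, t_ovrl);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

In the real benchmark each message size is repeated many times and the per-rank timings are then aggregated into the table rows; as reported above, it is that aggregation step in the C++ harness that turns every t_ovrl value into 0.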

Steps to reproduce

$ mpirun --host bnode120:1,bnode119:1 -np 2 ./IMB-RMA Truly_passive_put
#----------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2021.3, MPI-RMA part
#----------------------------------------------------------------
# Date                  : Mon Oct  2 21:18:40 2023
# Machine               : x86_64
# System                : Linux
# Release               : 5.4.0-144-generic
# Version               : #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:

# ./IMB-RMA Truly_passive_put

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Truly_passive_put
#     The benchmark measures execution time of MPI_Put for 2 cases:
#     1) The target is waiting in MPI_Barrier call (t_pure value)
#     2) The target performs computation and then enters MPI_Barrier routine (t_ovrl value)

#---------------------------------------------------
# Benchmarking Truly_passive_put
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions t_pure[usec] t_ovrl[usec]
            0         1000         1.80         0.00
            1         1000         3.19         0.00
            2         1000         3.16         0.00
            4         1000         3.15         0.00
            8         1000         3.15         0.00
           16         1000         3.16         0.00
           32         1000         3.17         0.00
           64         1000         3.18         0.00
          128         1000         3.19         0.00
          256         1000         3.31         0.00
          512         1000         3.32         0.00
         1024         1000         3.37         0.00
         2048         1000         3.52         0.00
         4096         1000         4.40         0.00
         8192         1000         5.05         0.00
        16384         1000         4.50         0.00
        32768         1000         5.13         0.00
        65536          640         6.47         0.00
       131072          320         9.51         0.00
       262144          160        15.10         0.00
       524288           80        25.63         0.00
      1048576           40        46.72         0.00
      2097152           20       178.16         0.00
      4194304           10       174.46         0.00

# All processes entering MPI_Finalize

IMB-RMA (C version) does not have this problem

$ mpirun --host bnode120:1,bnode119:1 -np 2 ./IMB-RMA Truly_passive_put
#----------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2018, MPI-RMA part
#----------------------------------------------------------------
# Date                  : Mon Oct  2 21:20:14 2023
# Machine               : x86_64
# System                : Linux
# Release               : 5.4.0-144-generic
# Version               : #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:

# ./IMB-RMA Truly_passive_put

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Truly_passive_put
#     Comments on this Benchmark:
#     The benchmark measures execution time of MPI_Put for 2 cases:
#     1) The target is waiting in MPI_Barrier call (t_pure value)
#     2) The target performs computation and then enters MPI_Barrier routine (t_ovrl value)

#---------------------------------------------------
# Benchmarking Truly_passive_put
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions t_pure[usec] t_ovrl[usec]
            0         1000         1.77         2.64
            1         1000         3.13         4.62
            2         1000         3.12         4.61
            4         1000         3.12         4.61
            8         1000         3.12         4.60
           16         1000         3.12         4.61
           32         1000         3.15         4.63
           64         1000         3.13         4.62
          128         1000         3.18         4.66
          256         1000         3.30         4.90
          512         1000         3.31         4.90
         1024         1000         3.38         4.95
         2048         1000         3.50         5.13
         4096         1000         4.33         5.94
         8192         1000         5.11         6.71
        16384         1000         4.49         6.11
        32768         1000         5.18         6.84
        65536          640         6.51         8.10
       131072          320         9.52        11.07
       262144          160        15.11        16.68
       524288           80        25.70        27.18
      1048576           40        47.20        48.67
      2097152           20        89.51        90.77
      4194304           10       174.41       175.64

# All processes entering MPI_Finalize
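To make the phrase "aggregation part" above concrete: each table row is produced by collecting the per-rank timings onto one rank, which then prints the line. The sketch below only illustrates that general step and is neither the IMB-RMA code nor the fix referenced in this issue; the timing values are invented. If such a step reduced or printed a variable that was never filled in, the printed column would read 0.00 for every message size, which is the symptom seen in the C++ output above.

// Self-contained illustration (not IMB-RMA code): combine per-rank timings into
// one printed table row. The "measured" values here are dummies.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Pretend rank 0 timed the put (usec); the passive target contributes 0.
    double local_pure = (rank == 0) ? 3.1 : 0.0;
    double local_ovrl = (rank == 0) ? 4.6 : 0.0;

    // Aggregate onto rank 0. Reducing or printing the wrong variable in a step
    // like this is one way an all-zero column can appear in the output table.
    double pure = 0.0, ovrl = 0.0;
    MPI_Reduce(&local_pure, &pure, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local_ovrl, &ovrl, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("%13d %12d %12.2f %12.2f\n", 4, 1000, pure, ovrl);

    MPI_Finalize();
    return 0;
}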
JuliaRS commented 6 months ago

@range3, thank you for your contribution. I'm going to add your fix in the next release of IMB.