Nonblocking I/O operation and CPU exploit

A-Tarraf commented 2 years ago

It seems there is a bug in IMB-IO regarding the CPU exploit. On every rank except rank 0, target_reps is zero. I have added:

int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("rank %d -> Nrep: %d, target rep: %d\n",rank, Nrep,target_reps);

to the end of IMB_cpu_exploit, and executed:

LD_PRELOAD=./some_lib.so mpirun -np 2 ./IMB-IO P_IWrite_Indv -iter 5 -npmin 2 -msglog 20:20 -iter_policy off -time 500

Here is the result:

#----------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2021.3, MPI-IO part
#----------------------------------------------------------------
# Date                  : Tue Sep  6 15:03:02 2022
# Machine               : x86_64
# System                : Linux
# Release               : 5.15.0-47-generic
# Version               : #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022
# MPI Version           : 3.1
# MPI Thread Environment: 

# Calling sequence was: 

# ./IMB-IO P_IWrite_Indv -iter 5 -npmin 2 -msglog 20:20 -iter_policy off -time 500

# Minimum io portion in bytes:   0
# Maximum io portion in bytes:   1048576
#
#
#

# List of Benchmarks to run:

# P_IWrite_Indv
rank 0 -> Nrep: 1432890, target rep: 14328

# For nonblocking benchmarks:

# Function CPU_Exploit obtains an undisturbed
# performance of  286.58 MFlops
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328

#-----------------------------------------------------------------------------
# Benchmarking P_IWrite_Indv 
# #processes = 2 
#-----------------------------------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]
            0            5    424323.39        74.06    845648.61       100.00
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
      1048576            5    429989.80     13614.71    845648.61       100.00

# All processes entering MPI_Finalize

This bug can be fixed by adding the following to original_benchmark.h after line 197 (#ifdef MPIIO):

if (c_info.w_rank != 0 && do_nonblocking_)
    IMB_cpu_exploit_reworked(TARGET_CPU_SECS, 1);
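To illustrate why the extra initializing call matters, here is a minimal sketch in plain C (no MPI). It is not the IMB source: cpu_state, cpu_exploit_sketch, reps_on_rank, and the fixed calibration value are hypothetical stand-ins, and the sketch only tracks target_reps (it does not try to model how Nrep is set). The assumption is that target_reps is derived only when the function is called with the initialize flag, so a rank that never makes that call keeps a value of zero. The 0.01 factor is also an assumption, chosen because rank 0's target rep in the log (14328) is about 1% of its Nrep (1432890).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-rank state; stands in for static variables inside the
 * real CPU exploit function. Zero-initialized, like a C static. */
typedef struct {
    int target_reps; /* repetitions to run for the requested CPU time */
} cpu_state;

/* Stand-in for the compute kernel. Returns how many repetitions it ran.
 * Assumption: target_reps is only (re)computed when initialize is nonzero. */
int cpu_exploit_sketch(cpu_state *s, double target_secs, int initialize)
{
    assert(s != NULL && target_secs > 0.0);
    if (initialize) {
        /* Real code would time a flops loop; a fixed value taken from the
         * rank 0 log line is used here instead. */
        int nrep = 1432890;
        s->target_reps = (int)(nrep * target_secs);
    }
    int done = 0;
    for (int i = 0; i < s->target_reps; ++i)
        ++done; /* placeholder for one unit of floating-point work */
    return done;
}

/* Simulates one rank: without the fix only rank 0 makes the initializing
 * call; with init_all_ranks set, every rank does. */
int reps_on_rank(int rank, int init_all_ranks)
{
    cpu_state s = {0};
    if (rank == 0 || init_all_ranks)
        cpu_exploit_sketch(&s, 0.01, 1);
    return cpu_exploit_sketch(&s, 0.01, 0);
}
```

Under these assumptions, reps_on_rank(1, 0) runs no repetitions at all, which matches the "target rep: 0" lines for rank 1, while reps_on_rank(1, 1) runs the same number of repetitions as rank 0.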

Since it is useful to know the progress of the nonblocking operation, I have also added MPI_Testall to IMB_cpu_exploit.c. If you want, I can create a pull request.
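The idea of checking progress during the compute phase can be sketched as follows. This is not the code from the pull request: MPI is left out so the sketch stands alone, and poll_fn, compute_with_polling, and fake_poll are hypothetical names. In an MPI program the poll callback would wrap a call such as MPI_Testall(count, requests, &flag, MPI_STATUSES_IGNORE) and return flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns nonzero once all pending operations
 * have completed. */
typedef int (*poll_fn)(void *ctx);

/* Runs target_reps units of work and polls after every poll_every units.
 * Returns the repetition count at which completion was first observed,
 * or -1 if the operations were still pending when the work ended. */
int compute_with_polling(int target_reps, int poll_every,
                         poll_fn poll, void *ctx)
{
    assert(target_reps >= 0 && poll_every > 0 && poll != NULL);
    volatile double acc = 0.0; /* volatile keeps the loop from being optimized away */
    int completed_at = -1;
    for (int i = 0; i < target_reps; ++i) {
        acc += (double)i * 0.5; /* stand-in for the flops kernel */
        if (completed_at < 0 && (i + 1) % poll_every == 0 && poll(ctx))
            completed_at = i + 1;
    }
    return completed_at;
}

/* Fake poll for demonstration: reports completion once the counter
 * pointed to by ctx has been decremented to zero. */
int fake_poll(void *ctx)
{
    int *calls_left = (int *)ctx;
    return --(*calls_left) <= 0;
}
```

Note that when target_reps is zero, as reported for rank 1 above, the loop body never runs and no poll is made, so the two issues interact: progress can only be observed on ranks that actually do compute work.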

JuliaRS commented 4 months ago

@A-Tarraf Is this still relevant? If yes, please prepare the PR.

A-Tarraf commented 4 months ago

I have created a pull request. Please feel free to discard parts of the code if you want (for example, the testing ability). The library I used to find out about the bugs is now available on GitHub: TMIO

Let me know if I can further support you with this.

intel / mpi-benchmarks

Nonblocking I/O operation and CPU exploit #42