intel / mpi-benchmarks


IMB-NBC and IMB-IO #51

Open shruticd opened 10 months ago

shruticd commented 10 months ago

In IMB-NBC, I get an integrity check failure in Ireduce_scatter on standard Intel Omni-Path. In IMB-IO, P_Write_Shared, P_IWrite_Shared, P_Read_Shared, P_IRead_Shared, C_Read_Shared and C_IRead_Shared fail with either a segmentation fault or an integrity check failure. Can you please tell me why this is happening?

JuliaRS commented 8 months ago

@shruticd hi,

Please give me more information:
1) Which MPI did you use?
2) How did you run the benchmarks?
3) Please attach the full output log.

shruticd commented 8 months ago

@JuliaRS hi, I used MVAPICH2 2.3.7 with psm2 and IMB v2021.7. The command I used was: mpirun -n 2 ./IMB-IO C_Read_Shared

NBC - Ireduce_scatter


Intel(R) MPI Benchmarks 2018, MPI-NBC part

Date                   : Mon Jan 15 12:10:29 2024
Machine                : x86_64
System                 : Linux
Release                : 3.10.0-957.1.3.el7.x86_64
Version                : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version            : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-NBC Ireduce_scatter

Minimum message length in bytes: 0
Maximum message length in bytes: 4194304

MPI_Datatype                 : MPI_BYTE
MPI_Datatype for reductions  : MPI_FLOAT
MPI_Op                       : MPI_SUM

List of Benchmarks to run:

Ireduce_scatter


Benchmarking Ireduce_scatter processes = 2

   bytes repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]      defects
        0         1000         0.63         0.30         0.30         0.00         0.00

1: Error Ireduce_scatter_pure, size = 4, sample #0
Process 1: Got invalid buffer:
   Buffer entry: 0.000000 pos: 0
Process 1: Expected buffer:
   Buffer entry: 0.300000
        4         1000         1.87         1.00         0.85         0.00         0.00
Application error code 1 occurred
application called MPI_Abort(MPI_COMM_WORLD, 16) - process 1
[cli_1]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 16) - process 1
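For reference, the integrity check that fails here compares each element received from the reduce-scatter against a value precomputed from the known initialization of the send buffers. Below is a minimal standalone sketch of that style of check, not the IMB source; the 0.1 * (rank + 1) fill is an assumption, chosen only because two ranks under MPI_SUM then yield the 0.300000 "expected buffer" value seen in the log:

/* Sketch of an Ireduce_scatter integrity check (hypothetical fill pattern). */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* one float is delivered back to each rank */
    int *recvcounts = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) recvcounts[i] = 1;

    /* assumed fill: 0.1 * (rank + 1); with 2 ranks and MPI_SUM the
       expected result is 0.1 + 0.2 = 0.3 */
    float *sendbuf = malloc(size * sizeof(float));
    for (int i = 0; i < size; i++) sendbuf[i] = 0.1f * (rank + 1);

    float result = -1.0f;
    MPI_Request req;
    MPI_Ireduce_scatter(sendbuf, &result, recvcounts, MPI_FLOAT,
                        MPI_SUM, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    /* verify against the precomputed expected value */
    float expected = 0.0f;
    for (int i = 0; i < size; i++) expected += 0.1f * (i + 1);
    if (fabsf(result - expected) > 1e-6f)
        fprintf(stderr, "rank %d: got %f, expected %f\n",
                rank, result, expected);

    free(sendbuf);
    free(recvcounts);
    MPI_Finalize();
    return 0;
}

If this sketch also reports a wrong value over psm2 but passes over another transport, that points at the fabric/provider layer rather than the benchmark itself.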

IO - C_Read_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part
----------------------------------------------------------------

Date                   : Mon Jan 15 12:14:51 2024
Machine                : x86_64
System                 : Linux
Release                : 3.10.0-957.1.3.el7.x86_64
Version                : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version            : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO C_Read_Shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

C_Read_Shared


Benchmarking C_Read_Shared processes = 1 ( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec      defects
        0         1000         5.48         5.48         5.48         0.00         0.00

Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffe51b7bc20, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer
[cli_0]: aborting job:
Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffe51b7bc20, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer
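For reference, the error stack shows rank 0, the gather root, being handed rbuf=(nil), i.e. a receive buffer that was never allocated on this code path. A minimal illustration of that failure class (not IMB code; names are hypothetical), which MPICH-derived libraries such as MVAPICH2 reject with the same kind of "Null buffer pointer" abort:

/* Minimal repro of the "Null buffer pointer" failure class. */
#include <mpi.h>
#include <stddef.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int sendval = rank;
    int *recvbuf = NULL;   /* root-side buffer left unallocated */

    /* The root rank must pass a valid recvbuf; NULL here trips the
       library's argument checking and aborts the job, as in the
       C_Read_Shared log above. Non-root ranks ignore recvbuf. */
    MPI_Gather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT,
               0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}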

IO - P_IRead_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part
----------------------------------------------------------------

Date                   : Mon Jan 15 12:13:58 2024
Machine                : x86_64
System                 : Linux
Release                : 3.10.0-957.1.3.el7.x86_64
Version                : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version            : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO P_IREAD_Shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

P_IRead_Shared

For nonblocking benchmarks:

Function CPU_Exploit obtains an undisturbed performance of 745.98 MFlops


Benchmarking P_IRead_Shared processes = 1 ( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]      defects
        0         1000      3401.78         0.46      1030.45         0.00         0.00

Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7fff82c51f60, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer
[cli_0]: aborting job:
Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7fff82c51f60, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer

IO - P_IWrite_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part
----------------------------------------------------------------

Date                   : Mon Jan 15 12:12:38 2024
Machine                : x86_64
System                 : Linux
Release                : 3.10.0-957.1.3.el7.x86_64
Version                : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version            : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO P_IWrite_shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

P_IWrite_Shared

For nonblocking benchmarks:

Function CPU_Exploit obtains an undisturbed performance of 753.20 MFlops


Benchmarking P_IWrite_Shared processes = 1 ( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]      defects
        0         1000      1081.32         1.53      1004.70         0.00         0.00

[mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 340197 RUNNING AT shrestha1.cdac.in
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

IO - P_Read_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part
----------------------------------------------------------------

Date                   : Mon Jan 15 12:13:33 2024
Machine                : x86_64
System                 : Linux
Release                : 3.10.0-957.1.3.el7.x86_64
Version                : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version            : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO P_READ_Shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

P_Read_Shared


Benchmarking P_Read_Shared processes = 1 ( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec      defects
        0         1000         0.46         0.46         0.46         0.00         0.00

Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffe753f1000, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer
[cli_0]: aborting job:
Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffe753f1000, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer

IO - P_Write_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part
----------------------------------------------------------------

Date                   : Mon Jan 15 12:12:11 2024
Machine                : x86_64
System                 : Linux
Release                : 3.10.0-957.1.3.el7.x86_64
Version                : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version            : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO P_Write_shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

P_Write_Shared


Benchmarking P_Write_Shared processes = 1 ( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec      defects
        0         1000         1.14         1.14         1.14         0.00         0.00

[mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 339463 RUNNING AT shrestha1.cdac.in
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

IO - C_IRead_Shared

Intel(R) MPI Benchmarks 2018, MPI-IO part
----------------------------------------------------------------

Date                   : Mon Jan 15 12:30:04 2024
Machine                : x86_64
System                 : Linux
Release                : 3.10.0-957.1.3.el7.x86_64
Version                : #1 SMP Thu Nov 29 14:49:43 UTC 2018
MPI Version            : 3.1
MPI Thread Environment:

Calling sequence was:

./IMB-IO C_IRead_Shared

Minimum io portion in bytes: 0
Maximum io portion in bytes: 4194304

List of Benchmarks to run:

C_IRead_Shared

For nonblocking benchmarks:

Function CPU_Exploit obtains an undisturbed performance of 740.11 MFlops


Benchmarking C_IRead_Shared processes = 1 ( 1 additional process waiting in MPI_Barrier)

MODE: AGGREGATE 

   bytes repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]      defects
        0         1000      1016.93         5.48       987.29         0.00         0.00

Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffd7ac58ae0, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer
[cli_0]: aborting job:
Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
PMPI_Gather(929): MPI_Gather(sbuf=0x7ffd7ac58ae0, scount=1, MPI_INT, rbuf=(nil), rcount=1, MPI_INT, root=0, comm=0xc4000003) failed
PMPI_Gather(851): Null buffer pointer

JuliaRS commented 2 months ago

@shruticd did you try running with the environment variable FI_PROVIDER=tcp? It might be a provider problem. I checked the same benchmarks with Intel MPI and they work. Also, you can try IMB v2021.8.
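For example, something like: FI_PROVIDER=tcp mpirun -n 2 ./IMB-IO C_Read_Shared. Note that FI_PROVIDER is a libfabric/OFI variable, so it should take effect with an OFI-based MPI such as Intel MPI; an MVAPICH2 build that talks to psm2 directly will likely ignore it.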