intel / pti-gpu

Profiling Tools Interfaces for GPU (PTI for GPU) is a set of getting-started documentation and a tools library for easily starting performance analysis on Intel(R) Processor Graphics.
MIT License

Unitrace mpi -host issue #75

Open · sdeyati opened this issue 3 months ago

sdeyati commented 3 months ago

Unitrace does not work when the application is launched with the MPI '-host' option (mpirun -host).

Sarbojit2019 commented 3 months ago

@sdeyati, please provide more details about what is not working in Unitrace when MPI's '-host' option is used. Please also provide the MPI version and system details to help with debugging.

sdeyati commented 3 months ago

Attached System_Info.txt, which contains the full system information.

sdeyati commented 3 months ago

Example output from a run with mpirun -host (the Kernel column is empty and the L3, GPU memory and XVE counters are all zero):

=== Device #0 Metrics ===

Kernel, GpuTime[ns], GpuCoreClocks[cycles], AvgGpuCoreFrequencyMHz[MHz], GpuSliceClocksCount[events], AvgGpuSliceFrequencyMHz[MHz], L3_BYTE_READ[bytes], L3_BYTE_WRITE[bytes], GPU_MEMORY_BYTE_READ[bytes], GPU_MEMORY_BYTE_WRITE[bytes], XVE_ACTIVE[%], XVE_STALL[%], XVE_BUSY[events], XVE_THREADS_OCCUPANCY_ALL[%], XVE_COMPUTE_THREAD_COUNT[threads], XVE_ATOMIC_ACCESS_COUNT[messages], XVE_BARRIER_MESSAGE_COUNT[messages], XVE_INST_EXECUTED_ALU0_ALL[events], XVE_INST_EXECUTED_ALU1_ALL[events], XVE_INST_EXECUTED_XMX_ALL[events], XVE_INST_EXECUTED_SEND_ALL[events], XVE_INST_EXECUTED_CONTROL_ALL[events], XVE_PIPE_ALU0_AND_ALU1_ACTIVE[%], XVE_PIPE_ALU0_AND_XMX_ACTIVE[%], XVE_INST_EXECUTED_ALU0_ALL_UTILIZATION[%], XVE_INST_EXECUTED_ALU1_ALL_UTILIZATION[%], XVE_INST_EXECUTED_SEND_ALL_UTILIZATION[%], XVE_INST_EXECUTED_CONTROL_ALL_UTILIZATION[%], XVE_INST_EXECUTED_XMX_ALL_UTILIZATION[%], QueryBeginTime[ns], CoreFrequencyMHz[MHz], XveSliceFrequencyMHz[MHz], ReportReason, ContextIdValid, ContextId, SourceId, StreamMarker,
, 40960, 65536, 1600, 65520, 1599, 0, 0, 0, 0, 0.000000, 0.000000, 0, 0.000000, 0, 0, 0, 0, 0, 0, 0, 0, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 144741601280, 1599, 0, 1, 0, 0, 0, 0,
, 40960, 65536, 1600, 65520, 1599, 0, 0, 0, 0, 0.000000, 0.000000, 0, 0.000000, 0, 0, 0, 0, 0, 0, 0, 0, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 144741642240, 1599, 0, 1, 0, 0, 0, 0,
, 40960, 65536, 1600, 65560, 1600, 0, 0, 0, 0, 0.000000, 0.000000, 0, 0.000000, 0, 0, 0, 0, 0, 0, 0, 0, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 144741683200, 1599, 0, 1, 0, 0, 0, 0,
, 40960, 65536, 1600, 65520, 1599, 0, 0, 0, 0, 0.000000, 0.000000, 0, 0.000000, 0, 0, 0, 0, 0, 0, 0, 0, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 144741724160, 1599, 0, 1, 0, 0, 0, 0,
, 40960, 65536, 1600, 65560, 1600, 0, 0, 0, 0, 0.000000, 0.000000, 0, 0.000000, 0, 0, 0, 0, 0, 0, 0, 0, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 144741765120, 1599, 0, 1, 0, 0, 0, 0,
, 40960, 65536, 1600, 65520, 1599, 0, 0, 0, 0, 0.000000, 0.000000, 0, 0.000000, 0, 0, 0, 0, 0, 0, 0, 0, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 144741806080, 1599, 0, 1, 0, 0, 0, 0,
, 40960, 65536, 1600, 65520, 1599, 0, 0, 0, 0, 0.000000, 0.000000, 0, 0.000000, 0, 0, 0, 0, 0, 0, 0, 0, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 144741847040, 1599, 0, 1, 0, 0, 0, 0,
, 40960,

Example output from a run with mpirun without '-host' (kernel names and non-zero counter values are reported):

=== Device #0 Metrics ===

Kernel, GpuTime[ns], GpuCoreClocks[cycles], AvgGpuCoreFrequencyMHz[MHz], GpuSliceClocksCount[events], AvgGpuSliceFrequencyMHz[MHz], L3_BYTE_READ[bytes], L3_BYTE_WRITE[bytes], GPU_MEMORY_BYTE_READ[bytes], GPU_MEMORY_BYTE_WRITE[bytes], XVE_ACTIVE[%], XVE_STALL[%], XVE_BUSY[events], XVE_THREADS_OCCUPANCY_ALL[%], XVE_COMPUTE_THREAD_COUNT[threads], XVE_ATOMIC_ACCESS_COUNT[messages], XVE_BARRIER_MESSAGE_COUNT[messages], XVE_INST_EXECUTED_ALU0_ALL[events], XVE_INST_EXECUTED_ALU1_ALL[events], XVE_INST_EXECUTED_XMX_ALL[events], XVE_INST_EXECUTED_SEND_ALL[events], XVE_INST_EXECUTED_CONTROL_ALL[events], XVE_PIPE_ALU0_AND_ALU1_ACTIVE[%], XVE_PIPE_ALU0_AND_XMX_ACTIVE[%], XVE_INST_EXECUTED_ALU0_ALL_UTILIZATION[%], XVE_INST_EXECUTED_ALU1_ALL_UTILIZATION[%], XVE_INST_EXECUTED_SEND_ALL_UTILIZATION[%], XVE_INST_EXECUTED_CONTROL_ALL_UTILIZATION[%], XVE_INST_EXECUTED_XMX_ALL_UTILIZATION[%], QueryBeginTime[ns], CoreFrequencyMHz[MHz], XveSliceFrequencyMHz[MHz], ReportReason, ContextIdValid, ContextId, SourceId, StreamMarker,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 40960, 51392, 1254, 51515, 1257, 37192192, 1152, 5267840, 640, 39.828487, 43.461124, 1, 41.914211, 1792, 0, 14018, 7168, 1006711, 8440357, 276466, 38039, 0.001820, 0.000000, 0.031059, 4.362075, 1.197926, 0.164823, 36.572033, 105886781440, 1599, 0, 1, 0, 0, 0, 0,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 40960, 32768, 800, 32800, 800, 52968960, 0, 7039360, 0, 99.878021, 0.000027, 1, 49.939026, 0, 0, 23156, 0, 528143, 14676476, 435884, 45885, 0.000000, 0.000000, 0.000000, 3.594179, 2.966327, 0.312262, 99.878021, 105886822400, 1599, 0, 1, 0, 0, 0, 0,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 49240, 20369, 413, 20366, 413, 32738816, 0, 4319488, 0, 100.340034, 0.000000, 1, 49.658131, 0, 0, 14276, 0, 325946, 9061920, 268967, 28297, 0.000000, 0.000000, 0.000000, 3.572415, 2.947917, 0.310139, 99.319946, 105886871640, 1499, 0, 1, 0, 0, 0, 0,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 32680, 14657, 448, 14688, 449, 23468544, 0, 3195520, 0, 99.776604, 0.003830, 1, 50.602531, 0, 0, 10383, 0, 239186, 6660264, 197797, 20780, 0.000000, 0.000000, 0.000000, 3.634922, 3.005931, 0.315795, 101.216377, 105886904320, 1499, 0, 1, 0, 0, 0, 0,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 40960, 21294, 519, 21295, 519, 33897472, 0, 4564096, 0, 99.955978, 0.000587, 1, 49.980042, 0, 0, 14802, 0, 342513, 9535614, 283075, 29876, 0.000000, 0.000000, 0.000000, 3.590223, 2.967193, 0.313160, 99.952347, 105886945280, 1499, 0, 1, 0, 0, 0, 0,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 40960, 18432, 450, 18440, 450, 29799424, 0, 3935360, 0, 100.005424, 0.000000, 1, 50.000679, 0, 0, 12689, 0, 296241, 8261568, 245001, 25797, 0.000000, 0.000000, 0.000000, 3.585967, 2.965712, 0.312270, 100.005424, 105886986240, 1499, 0, 1, 0, 0, 0, 0,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 81920, 22754, 277, 22748, 277, 36967936, 0, 4893952, 0, 100.010994, 0.000000, 1, 50.008793, 0, 0, 16030, 0, 366535, 10193568, 302737, 31840, 0.000000, 0.000000, 0.000000, 3.596617, 2.970601, 0.312429, 100.024178, 105887068160, 1399, 0, 1, 0, 0, 0, 0,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 40960, 21164, 516, 21120, 515, 34225152, 0, 4517504, 0, 100.028404, 0.000000, 1, 50.014202, 0, 0, 14717, 0, 340009, 9464784, 280942, 29569, 0.000000, 0.000000, 0.000000, 3.593507, 2.969236, 0.312511, 100.031960, 105887109120, 1399, 0, 1, 0, 0, 0, 0,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 40960, 18469, 450, 18540, 452, 29526016, 0, 3984128, 0, 99.987862, 0.000000, 1, 49.993931, 0, 0, 13242, 0, 299339, 8303904, 246795, 25947, 0.000000, 0.000000, 0.000000, 3.603923, 2.971314, 0.312392, 99.975731, 105887150080, 1399, 0, 1, 0, 0, 0, 0,
"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]", 40960, 18432, 450, 18440, 450, 29273088, 0, 3955712, 0, 99.989151, 0.000000, 1, 49.994576, 0, 0, 13008, 0, 297142, 8261232, 245353, 25800, 0.000000, 0.000000, 0.000000, 3.596873, 2.969973, 0.312306, 100.001358, 105887191040, 1399, 0, 1, 0, 0, 0, 0,
"hgemm_noco
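One way to make the difference between the two runs explicit is to parse the metrics CSV and count rows that have an empty Kernel field and all-zero activity counters. The sketch below is a hypothetical Python helper (summarize), not part of Unitrace or PTI for GPU. It assumes the metrics block has been saved as plain comma-separated text with the header row shown in the outputs above; treating L3_BYTE_READ[bytes], GPU_MEMORY_BYTE_READ[bytes] and XVE_ACTIVE[%] as the "activity" columns is an illustrative choice. The sample rows are the first 11 columns of two rows from each run.

```python
import csv
import io

# Hypothetical helper, not part of Unitrace/PTI: it only inspects the CSV text
# printed under a "=== Device #0 Metrics ===" banner, as in the outputs above.
# Columns used to decide "no GPU activity" (an illustrative assumption).
ACTIVITY_COLUMNS = ("L3_BYTE_READ[bytes]", "GPU_MEMORY_BYTE_READ[bytes]", "XVE_ACTIVE[%]")


def summarize(csv_text):
    """Return (rows, rows_without_kernel_name, rows_with_zero_activity)."""
    # skipinitialspace drops the blank after each ", " separator; a trailing
    # comma on a line only yields an extra empty column, which is never read.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = no_kernel = zero_activity = 0
    for row in reader:
        rows += 1
        if not row["Kernel"]:
            no_kernel += 1
        if all(float(row[col]) == 0.0 for col in ACTIVITY_COLUMNS):
            zero_activity += 1
    return rows, no_kernel, zero_activity


# First 11 columns of two rows from each run above (trimmed for brevity).
HEADER = ("Kernel, GpuTime[ns], GpuCoreClocks[cycles], AvgGpuCoreFrequencyMHz[MHz], "
          "GpuSliceClocksCount[events], AvgGpuSliceFrequencyMHz[MHz], L3_BYTE_READ[bytes], "
          "L3_BYTE_WRITE[bytes], GPU_MEMORY_BYTE_READ[bytes], GPU_MEMORY_BYTE_WRITE[bytes], "
          "XVE_ACTIVE[%]\n")
KERNEL = '"hgemm_nocopy_nn_64x40_4x8[SIMD16 {56; 1; 1} {64; 8; 1}]"'

HOST_SAMPLE = (
    HEADER
    + ", 40960, 65536, 1600, 65520, 1599, 0, 0, 0, 0, 0.000000\n"
    + ", 40960, 65536, 1600, 65560, 1600, 0, 0, 0, 0, 0.000000\n"
)
NO_HOST_SAMPLE = (
    HEADER
    + KERNEL + ", 40960, 51392, 1254, 51515, 1257, 37192192, 1152, 5267840, 640, 39.828487\n"
    + KERNEL + ", 40960, 32768, 800, 32800, 800, 52968960, 0, 7039360, 0, 99.878021\n"
)

if __name__ == "__main__":
    print("with -host:", summarize(HOST_SAMPLE))        # (2, 2, 2)
    print("without -host:", summarize(NO_HOST_SAMPLE))  # (2, 0, 0)
```

The same function should apply unchanged to the full 37-column output, since the trailing comma on each line only adds an empty, unused column.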