intel / pti-gpu

Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysis on Intel(R) Processor Graphics easily
MIT License

unitrace crashes when using mpiexec #69

Open flezaalv opened 3 months ago

flezaalv commented 3 months ago

I launched unitrace inside an mpiexec command:

mpiexec -n 12 -ppn 12 --pmi=pmix ~/pti-gpu/tools/unitrace/build/unitrace --separate-tiles --chrome-device-logging --ccl-summary-report --output-dir-path /home/test --output /home/test/test.csv python bin/sr.py

This runs on a single node and creates 12 processes, but when they finish I get this error from one process and the entire mpiexec run fails:

hostname: rank 0 died from signal 15

I also got this error in unitrace: https://github.com/intel/pti-gpu/issues/25. Is that error the cause of signal 15?

Sarbojit2019 commented 3 months ago
flezaalv commented 3 months ago
/run_mpi.sh: line 7: 169430 Segmentation fault      (core dumped) python bin/sr.py
[INFO] Log is stored in /home/test10/results.169391.0.csv
[INFO] Timeline is stored in /home/test10/run_mpi.sh.169391.0.json
hostname: rank 0 exited with code 139
hostname: rank 1 died from signal 15
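
A note on reading the two statuses above: shells conventionally report a process killed by signal N as exit code 128 + N, so code 139 corresponds to signal 11 (SIGSEGV), which matches the segmentation fault line, and signal 15 is SIGTERM. MPI launchers typically send SIGTERM to the remaining ranks once one rank fails, so rank 1's signal 15 is probably a consequence of rank 0's crash rather than a separate failure, although nothing in this thread confirms that. The following is a minimal Python sketch of the decoding; `describe_exit_code` is a hypothetical helper written for illustration and is not part of unitrace or mpiexec.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit code (hypothetical helper, for illustration).

    By shell convention, a code above 128 means the process was
    terminated by signal number (code - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(139))   # rank 0 in the log above
    print(signal.Signals(15).name)   # the signal reported for rank 1
```

If this reading is right, the segmentation fault in rank 0 is the failure to investigate first.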

run_mpi.sh contains the entire application command. This is the mpiexec command with unitrace included:

mpiexec -n 2 -ppn 2 ~/pti-gpu/tools/unitrace/build/unitrace --separate-tiles --chrome-device-logging --ccl-summary-report --output-dir-path /home/test10/ --output /home/test10/results.csv ./run_mpi.sh

Sure, I will share more details.

Thanks!