nerscadmin / IPM

Integrated Performance Monitoring for High Performance Computing
http://ipm-hpc.org
GNU Lesser General Public License v2.1

IPM 1: ERROR ipm_finalize() called with ipm_state=4 #31

Open KuXinZYY opened 6 years ago

KuXinZYY commented 6 years ago

[root@localhost em_real]# mpirun --allow-run-as-root --hostfile myhost -x LD_PRELOAD -np 4 ./hello_c
Hello, world, I am 0 of 4, (Open MPI v3.1.0, package: Open MPI root@localhost.localdomain Distribution, ident: 3.1.0, repo rev: v3.1.0, May 07, 2018, 121)
Hello, world, I am 2 of 4, (Open MPI v3.1.0, package: Open MPI root@arm-node01 Distribution, ident: 3.1.0, repo rev: v3.1.0, May 07, 2018, 110)
Hello, world, I am 1 of 4, (Open MPI v3.1.0, package: Open MPI root@localhost.localdomain Distribution, ident: 3.1.0, repo rev: v3.1.0, May 07, 2018, 121)
Hello, world, I am 3 of 4, (Open MPI v3.1.0, package: Open MPI root@arm-node01 Distribution, ident: 3.1.0, repo rev: v3.1.0, May 07, 2018, 110)
^CIPM 0: ERROR ipm_finalize() called with ipm_state=4
IPM 1: ERROR ipm_finalize() called with ipm_state=4
IPM 3: ERROR ipm_finalize() called with ipm_state=4
IPM 2: ERROR ipm_finalize() called with ipm_state=4

Hello. When I run the MPI program on a single machine (mpirun --allow-run-as-root -x LD_PRELOAD -np 4 ./hello_c), IPM measures Open MPI performance without any problems.

But when I run it with IPM across a two-machine cluster (mpirun --allow-run-as-root --hostfile myhost -x LD_PRELOAD -np 4 ./hello_c), the program hangs. When I press Ctrl+C manually, an IPM error is printed.

What is causing this problem? Can IPM measure MPI performance on a cluster? Thank you.

cdaley commented 6 years ago

Hello. Yes, IPM can monitor application performance across compute nodes. I suspect the error you are seeing is just that your application has deadlocked. The subsequent IPM error is just the IPM signal handler catching your Ctrl+C. I recommend you test your application across nodes without IPM. Does this also deadlock? Also, there should be no need to run an MPI application as root.

Thanks, Chris
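
The check suggested above (run the same job across nodes without IPM and see whether it still hangs) could be scripted roughly as follows. This is only a sketch: `check_hang` is a hypothetical helper name, not part of IPM or Open MPI, and it assumes GNU coreutils `timeout` is installed, which exits with status 124 when the time limit is reached.

```shell
# Hypothetical helper (not part of IPM or Open MPI):
# run a command with a time limit and report whether it finished or hung.
# Usage: check_hang SECONDS COMMAND [ARGS...]
check_hang() {
    limit=$1
    shift
    # Assumes GNU coreutils timeout, which exits with 124 on a timeout.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang: no exit within ${limit}s"
    elif [ "$status" -eq 0 ]; then
        echo "completed"
    else
        echo "failed with status $status"
    fi
    return "$status"
}
```

With the hostfile and binary from the report, a run without IPM might look like `( unset LD_PRELOAD; check_hang 60 mpirun --hostfile myhost -np 4 ./hello_c )`. Dropping `-x LD_PRELOAD` keeps the IPM library from being exported to the remote ranks, and the 60-second limit is an arbitrary choice. If this run also reports a hang, the problem is more likely in the MPI setup between the two machines than in IPM.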