radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Ibrun does not return exit code?! #1496

Closed iparask closed 6 years ago

iparask commented 6 years ago

Hello Andre,

I noticed the following. When I run an MPI program with ibrun on Stampede2, I see this:

Success:

TACC:  MPI job exited with code: 1
TACC:  Shutdown complete. Exiting.
c455-041[knl](18)$ RETVAL=$?
c455-041[knl](19)$

Failure:

TACC:  MPI job exited with code: 1
TACC:  Shutdown complete. Exiting.
c455-041[knl](22)$ RETVAL=$?
c455-041[knl](23)$

I am not sure whether RP is then able to get the correct exit code. Any ideas on how to fix this? I will keep looking.

andre-merzky commented 6 years ago

Hmm, interesting, that works ok on stampede_1:

c557-904.stampede(1)$ ibrun -n 1 fail
TACC: Starting up job 8712327
TACC: Setting up parallel environment for MVAPICH2+mpispawn.
ERROR: The -n option to ibrun was set but -o was not.
TACC: MPI job exited with code: 1
c557-904.stampede(2)$ echo $?
1

I think this should be an XSEDE ticket. Would you mind opening one? Thanks!

iparask commented 6 years ago

Wait, I forgot to run echo $RETVAL!
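
For reference, the missing step can be sketched as follows. This is only an illustration, not the actual Stampede2 session: `fake_mpi_job` is a hypothetical stand-in for the `ibrun` call (which exists only on TACC systems), and the `||` form is used so the sketch also behaves in scripts that run under `set -e`.

```shell
# Hypothetical stand-in for `ibrun <app>`: prints a TACC-style line
# and fails with exit status 1.
fake_mpi_job() {
    echo "TACC:  MPI job exited with code: 1"
    return 1
}

RETVAL=0
# Store the exit status if the job fails; the assignment itself prints nothing.
fake_mpi_job || RETVAL=$?

# The stored status only becomes visible once it is echoed.
echo "$RETVAL"
```

Assigning `$?` to a variable is silent, which is why the prompt in the transcripts above came back empty.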

andre-merzky commented 6 years ago

Hahaha... :)

iparask commented 6 years ago

My mistake, it works. Apparently the application I was running had a bug and did not return anything upon completion.
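
The actual bug is not described in this thread. As a hedged illustration of one common way an exit status gets lost, the sketch below shows a wrapper whose last command would mask a failure unless the status is returned explicitly. `run_app` and the `sh -c 'exit 3'` stand-in are hypothetical and are not taken from the application discussed here.

```shell
# Hypothetical application wrapper: run the work, log, then pass on its status.
run_app() {
    # Stand-in for the real computation, failing with status 3.
    sh -c 'exit 3'
    rc=$?
    echo "application finished with status $rc" >&2
    # Without this line the function would return the status of `echo` (0),
    # and the caller would never see the failure.
    return "$rc"
}

APP_STATUS=0
run_app || APP_STATUS=$?
echo "$APP_STATUS"
```

Saving `$?` immediately after the command of interest matters, because every later command overwrites it.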