Closed ebreitmo closed 8 years ago
Could you check the output of the simulation units (1-8) and post the contents of the shell script in any of those folders?
ls -lrt unit.000001
total 1048
lrwxrwxrwx 1 ebreitmo e290 139 Feb 4 09:25 topol.top -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/topol.top
lrwxrwxrwx 1 ebreitmo e290 145 Feb 4 09:25 start.gro -> /work/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000000/temp/start0.gro
lrwxrwxrwx 1 ebreitmo e290 136 Feb 4 09:25 run.py -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/run.py
-rwx------ 1 ebreitmo e290 763 Feb 4 09:25 radical_pilot_cu_launch_script.sh
lrwxrwxrwx 1 ebreitmo e290 140 Feb 4 09:25 grompp.mdp -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/grompp.mdp
-rw------- 1 ebreitmo e290 796 Feb 4 09:25 run.sh
-rw------- 1 ebreitmo e290 0 Feb 4 09:25 out.gro
-rw------- 1 ebreitmo e290 1470464 Feb 4 09:25 core
-rw------- 1 ebreitmo e290 95 Feb 4 09:25 STDOUT
-rw------- 1 ebreitmo e290 11636 Feb 4 09:25 STDERR
more unit.000001/radical_pilot_cu_launch_script.sh
#!/bin/sh
# Change to working directory for unit
cd /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000001
# Pre-exec commands
module load packages-archer
module load gromacs
module load python-compute/2.7.6
# Environment variables
export RP_SESSION_ID=rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000 RP_PILOT_ID=pilot.0000 RP_AGENT_ID=agent_0 RP_SPAWNER_ID=agent_0.AgentExecutingComponent.0.child RP_UNIT_ID=unit.000001
# The command to run
/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin/aprun -n 1 python "run.py" "--mdp" "grompp.mdp" "--gro" "start.gro" "--top" "topol.top" "--out" "out.gro"
RETVAL=$?
# Exit the script with the return code from the command
exit $RETVAL
ls -lrt unit.000002
total 1260
lrwxrwxrwx 1 ebreitmo e290 139 Feb 4 09:25 topol.top -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/topol.top
lrwxrwxrwx 1 ebreitmo e290 145 Feb 4 09:25 start.gro -> /work/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000000/temp/start1.gro
lrwxrwxrwx 1 ebreitmo e290 136 Feb 4 09:25 run.py -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/run.py
-rwx------ 1 ebreitmo e290 763 Feb 4 09:25 radical_pilot_cu_launch_script.sh
lrwxrwxrwx 1 ebreitmo e290 140 Feb 4 09:25 grompp.mdp -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/grompp.mdp
-rw------- 1 ebreitmo e290 796 Feb 4 09:25 run.sh
-rw------- 1 ebreitmo e290 0 Feb 4 09:25 out.gro
-rw------- 1 ebreitmo e290 1470464 Feb 4 09:25 core
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.9#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.8#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.7#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.6#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.5#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.4#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.3#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.2#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.10#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.1#
-rw------- 1 ebreitmo e290 563 Feb 4 09:25 STDOUT
-rw------- 1 ebreitmo e290 66111 Feb 4 09:25 STDERR
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.12#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.11#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 mdout.mdp
more unit.000002/radical_pilot_cu_launch_script.sh
#!/bin/sh
# Change to working directory for unit
cd /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000002
# Pre-exec commands
module load packages-archer
module load gromacs
module load python-compute/2.7.6
# Environment variables
export RP_SESSION_ID=rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000 RP_PILOT_ID=pilot.0000 RP_AGENT_ID=agent_0 RP_SPAWNER_ID=agent_0.AgentExecutingComponent.0.child RP_UNIT_ID=unit.000002
# The command to run
/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin/aprun -n 1 python "run.py" "--mdp" "grompp.mdp" "--gro" "start.gro" "--top" "topol.top" "--out" "out.gro"
RETVAL=$?
# Exit the script with the return code from the command
exit $RETVAL
ls -lrt unit.000003
total 1268
lrwxrwxrwx 1 ebreitmo e290 139 Feb 4 09:25 topol.top -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/topol.top
lrwxrwxrwx 1 ebreitmo e290 145 Feb 4 09:25 start.gro -> /work/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000000/temp/start2.gro
lrwxrwxrwx 1 ebreitmo e290 136 Feb 4 09:25 run.py -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/run.py
lrwxrwxrwx 1 ebreitmo e290 140 Feb 4 09:25 grompp.mdp -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/grompp.mdp
-rwx------ 1 ebreitmo e290 763 Feb 4 09:25 radical_pilot_cu_launch_script.sh
-rw------- 1 ebreitmo e290 796 Feb 4 09:26 run.sh
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 out.gro
-rw------- 1 ebreitmo e290 1470464 Feb 4 09:26 core
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.9#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.8#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.7#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.6#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.5#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.4#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.3#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.2#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.1#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #topol.tpr.4#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #topol.tpr.3#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #topol.tpr.2#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #topol.tpr.1#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 topol.tpr
-rw------- 1 ebreitmo e290 2663 Feb 4 09:26 STDOUT
-rw------- 1 ebreitmo e290 70396 Feb 4 09:26 STDERR
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.12#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.11#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.10#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 mdout.mdp
-rw------- 1 ebreitmo e290 3312 Feb 4 09:26 md.log
more unit.000003/radical_pilot_cu_launch_script.sh
#!/bin/sh
# Change to working directory for unit
cd /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000003
# Pre-exec commands
module load packages-archer
module load gromacs
module load python-compute/2.7.6
# Environment variables
export RP_SESSION_ID=rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000 RP_PILOT_ID=pilot.0000 RP_AGENT_ID=agent_0 RP_SPAWNER_ID=agent_0.AgentExecutingComponent.0.child RP_UNIT_ID=unit.000003
# The command to run
/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin/aprun -n 1 python "run.py" "--mdp" "grompp.mdp" "--gro" "start.gro" "--top" "topol.top" "--out" "out.gro"
RETVAL=$?
# Exit the script with the return code from the command
exit $RETVAL
ls -lrt unit.000004
total 1096
lrwxrwxrwx 1 ebreitmo e290 139 Feb 4 09:25 topol.top -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/topol.top
lrwxrwxrwx 1 ebreitmo e290 145 Feb 4 09:25 start.gro -> /work/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000000/temp/start3.gro
lrwxrwxrwx 1 ebreitmo e290 136 Feb 4 09:25 run.py -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/run.py
-rwx------ 1 ebreitmo e290 763 Feb 4 09:25 radical_pilot_cu_launch_script.sh
lrwxrwxrwx 1 ebreitmo e290 140 Feb 4 09:25 grompp.mdp -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/grompp.mdp
-rw------- 1 ebreitmo e290 796 Feb 4 09:25 run.sh
-rw------- 1 ebreitmo e290 0 Feb 4 09:25 out.gro
-rw------- 1 ebreitmo e290 1470464 Feb 4 09:25 core
-rw------- 1 ebreitmo e290 95 Feb 4 09:25 STDOUT
-rw------- 1 ebreitmo e290 61366 Feb 4 09:25 STDERR
-rw------- 1 ebreitmo e290 0 Feb 4 09:25 #mdout.mdp.1#
-rw------- 1 ebreitmo e290 0 Feb 4 09:25 mdout.mdp
more unit.000004/radical_pilot_cu_launch_script.sh
#!/bin/sh
# Change to working directory for unit
cd /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000004
# Pre-exec commands
module load packages-archer
module load gromacs
module load python-compute/2.7.6
# Environment variables
export RP_SESSION_ID=rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000 RP_PILOT_ID=pilot.0000 RP_AGENT_ID=agent_0 RP_SPAWNER_ID=agent_0.AgentExecutingComponent.0.child RP_UNIT_ID=unit.000004
# The command to run
/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin/aprun -n 1 python "run.py" "--mdp" "grompp.mdp" "--gro" "start.gro" "--top" "topol.top" "--out" "out.gro"
RETVAL=$?
# Exit the script with the return code from the command
exit $RETVAL
ls -lrt unit.000005/
total 1476
lrwxrwxrwx 1 ebreitmo e290 139 Feb 4 09:25 topol.top -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/topol.top
lrwxrwxrwx 1 ebreitmo e290 145 Feb 4 09:25 start.gro -> /work/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000000/temp/start4.gro
lrwxrwxrwx 1 ebreitmo e290 136 Feb 4 09:25 run.py -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/run.py
lrwxrwxrwx 1 ebreitmo e290 140 Feb 4 09:25 grompp.mdp -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/grompp.mdp
-rwx------ 1 ebreitmo e290 763 Feb 4 09:25 radical_pilot_cu_launch_script.sh
-rw------- 1 ebreitmo e290 796 Feb 4 09:26 run.sh
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 out.gro
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.1#
-rw------- 1 ebreitmo e290 1470464 Feb 4 09:26 core
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #traj.trr.5#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #traj.trr.4#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #traj.trr.3#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #traj.trr.2#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #traj.trr.1#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 traj.trr
-rw------- 1 ebreitmo e290 9600 Feb 4 09:26 #topol.tpr.7#
-rw------- 1 ebreitmo e290 9600 Feb 4 09:26 #topol.tpr.6#
-rw------- 1 ebreitmo e290 9600 Feb 4 09:26 #topol.tpr.5#
-rw------- 1 ebreitmo e290 9600 Feb 4 09:26 #topol.tpr.4#
-rw------- 1 ebreitmo e290 9600 Feb 4 09:26 #topol.tpr.3#
-rw------- 1 ebreitmo e290 9600 Feb 4 09:26 #topol.tpr.2#
-rw------- 1 ebreitmo e290 9600 Feb 4 09:26 #topol.tpr.1#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.9#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.8#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.7#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.6#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.5#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.4#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.3#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.2#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.11#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.10#
-rw------- 1 ebreitmo e290 13463 Feb 4 09:26 #md.log.8#
-rw------- 1 ebreitmo e290 13463 Feb 4 09:26 #md.log.7#
-rw------- 1 ebreitmo e290 13463 Feb 4 09:26 #md.log.6#
-rw------- 1 ebreitmo e290 13463 Feb 4 09:26 #md.log.5#
-rw------- 1 ebreitmo e290 13463 Feb 4 09:26 #md.log.4#
-rw------- 1 ebreitmo e290 6509 Feb 4 09:26 #md.log.3#
-rw------- 1 ebreitmo e290 13463 Feb 4 09:26 #md.log.2#
-rw------- 1 ebreitmo e290 6509 Feb 4 09:26 #md.log.1#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #ener.edr.5#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #ener.edr.4#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #ener.edr.3#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #ener.edr.2#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 #ener.edr.1#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 ener.edr
-rw------- 1 ebreitmo e290 9600 Feb 4 09:26 #topol.tpr.8#
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 topol.tpr
-rw------- 1 ebreitmo e290 4307 Feb 4 09:26 STDOUT
-rw------- 1 ebreitmo e290 70059 Feb 4 09:26 STDERR
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 mdout.mdp
-rw------- 1 ebreitmo e290 6509 Feb 4 09:26 #md.log.9#
-rw------- 1 ebreitmo e290 3312 Feb 4 09:26 md.log
more unit.000005/radical_pilot_cu_launch_script.sh
#!/bin/sh
# Change to working directory for unit
cd /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000005
# Pre-exec commands
module load packages-archer
module load gromacs
module load python-compute/2.7.6
# Environment variables
export RP_SESSION_ID=rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000 RP_PILOT_ID=pilot.0000 RP_AGENT_ID=agent_0 RP_SPAWNER_ID=agent_0.AgentExecutingComponent.0.child RP_UNIT_ID=unit.000005
# The command to run
/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin/aprun -n 1 python "run.py" "--mdp" "grompp.mdp" "--gro" "start.gro" "--top" "topol.top" "--out" "out.gro"
RETVAL=$?
# Exit the script with the return code from the command
exit $RETVAL
ls -lrt unit.000006/
total 1048
lrwxrwxrwx 1 ebreitmo e290 139 Feb 4 09:25 topol.top -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/topol.top
lrwxrwxrwx 1 ebreitmo e290 145 Feb 4 09:25 start.gro -> /work/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000000/temp/start5.gro
lrwxrwxrwx 1 ebreitmo e290 136 Feb 4 09:25 run.py -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/run.py
-rwx------ 1 ebreitmo e290 763 Feb 4 09:25 radical_pilot_cu_launch_script.sh
lrwxrwxrwx 1 ebreitmo e290 140 Feb 4 09:25 grompp.mdp -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/grompp.mdp
-rw------- 1 ebreitmo e290 796 Feb 4 09:25 run.sh
-rw------- 1 ebreitmo e290 0 Feb 4 09:25 out.gro
-rw------- 1 ebreitmo e290 1470464 Feb 4 09:25 core
-rw------- 1 ebreitmo e290 10742 Feb 4 09:25 STDERR
-rw------- 1 ebreitmo e290 95 Feb 4 09:25 STDOUT
more unit.000006/radical_pilot_cu_launch_script.sh
#!/bin/sh
# Change to working directory for unit
cd /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000006
# Pre-exec commands
module load packages-archer
module load gromacs
module load python-compute/2.7.6
# Environment variables
export RP_SESSION_ID=rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000 RP_PILOT_ID=pilot.0000 RP_AGENT_ID=agent_0 RP_SPAWNER_ID=agent_0.AgentExecutingComponent.0.child RP_UNIT_ID=unit.000006
# The command to run
/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin/aprun -n 1 python "run.py" "--mdp" "grompp.mdp" "--gro" "start.gro" "--top" "topol.top" "--out" "out.gro"
RETVAL=$?
# Exit the script with the return code from the command
exit $RETVAL
ls -lrt unit.000007/
total 1240
lrwxrwxrwx 1 ebreitmo e290 139 Feb 4 09:25 topol.top -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/topol.top
lrwxrwxrwx 1 ebreitmo e290 145 Feb 4 09:25 start.gro -> /work/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000000/temp/start6.gro
lrwxrwxrwx 1 ebreitmo e290 136 Feb 4 09:25 run.py -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/run.py
-rwx------ 1 ebreitmo e290 763 Feb 4 09:25 radical_pilot_cu_launch_script.sh
lrwxrwxrwx 1 ebreitmo e290 140 Feb 4 09:25 grompp.mdp -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/grompp.mdp
-rw------- 1 ebreitmo e290 796 Feb 4 09:25 run.sh
-rw------- 1 ebreitmo e290 0 Feb 4 09:25 out.gro
-rw------- 1 ebreitmo e290 1470464 Feb 4 09:25 core
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.9#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.8#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.7#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.6#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.5#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.4#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.3#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.2#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.10#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.1#
-rw------- 1 ebreitmo e290 527 Feb 4 09:25 STDOUT
-rw------- 1 ebreitmo e290 60642 Feb 4 09:25 STDERR
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 #mdout.mdp.11#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:25 mdout.mdp
more unit.000007/radical_pilot_cu_launch_script.sh
#!/bin/sh
# Change to working directory for unit
cd /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000007
# Pre-exec commands
module load packages-archer
module load gromacs
module load python-compute/2.7.6
# Environment variables
export RP_SESSION_ID=rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000 RP_PILOT_ID=pilot.0000 RP_AGENT_ID=agent_0 RP_SPAWNER_ID=agent_0.AgentExecutingComponent.0.child RP_UNIT_ID=unit.000007
# The command to run
/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin/aprun -n 1 python "run.py" "--mdp" "grompp.mdp" "--gro" "start.gro" "--top" "topol.top" "--out" "out.gro"
RETVAL=$?
# Exit the script with the return code from the command
exit $RETVAL
ls -lrt unit.000008
total 1244
lrwxrwxrwx 1 ebreitmo e290 139 Feb 4 09:25 topol.top -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/topol.top
lrwxrwxrwx 1 ebreitmo e290 145 Feb 4 09:25 start.gro -> /work/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000000/temp/start7.gro
lrwxrwxrwx 1 ebreitmo e290 136 Feb 4 09:25 run.py -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/run.py
lrwxrwxrwx 1 ebreitmo e290 140 Feb 4 09:25 grompp.mdp -> /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/staging_area/grompp.mdp
-rwx------ 1 ebreitmo e290 763 Feb 4 09:25 radical_pilot_cu_launch_script.sh
-rw------- 1 ebreitmo e290 796 Feb 4 09:26 run.sh
-rw------- 1 ebreitmo e290 0 Feb 4 09:26 out.gro
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.7#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.6#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.5#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.4#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.3#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.2#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.1#
-rw------- 1 ebreitmo e290 1470464 Feb 4 09:26 core
-rw------- 1 ebreitmo e290 63445 Feb 4 09:26 STDERR
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.9#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.8#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.11#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 #mdout.mdp.10#
-rw------- 1 ebreitmo e290 11658 Feb 4 09:26 mdout.mdp
-rw------- 1 ebreitmo e290 527 Feb 4 09:26 STDOUT
more unit.000008/radical_pilot_cu_launch_script.sh
#!/bin/sh
# Change to working directory for unit
cd /fs4/e290/e290/ebreitmo/radical.pilot.sandbox/rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000-pilot.0000/unit.000008
# Pre-exec commands
module load packages-archer
module load gromacs
module load python-compute/2.7.6
# Environment variables
export RP_SESSION_ID=rp.session.mbp-eb.epcc.ed.ac.uk.elenabreitmoser.016835.0000 RP_PILOT_ID=pilot.0000 RP_AGENT_ID=agent_0 RP_SPAWNER_ID=agent_0.AgentExecutingComponent.0.child RP_UNIT_ID=unit.000008
# The command to run
/opt/cray/alps/5.2.3-2.0502.9295.14.14.ari/bin/aprun -n 1 python "run.py" "--mdp" "grompp.mdp" "--gro" "start.gro" "--top" "topol.top" "--out" "out.gro"
RETVAL=$?
# Exit the script with the return code from the command
exit $RETVAL
Are the out.gro files in unit.000004 and unit.000006 empty?
They are both empty!
OK, I believe this is the same as #226.
OK I did some digging into this (since I can also recreate it, sometimes).
The root cause of the failures is not in the pre_lsdmap CU, but in the gromacs CUs. pre_lsdmap only fails if the out*.gro files linked from the gromacs CUs are all empty. In my testing I saw various units failing, e.g.:
e290ib@eslogin008:/work/e290/e290/e290ib/radical.pilot.sandbox/rp.session.mbp-ib.epcc.ed.ac.uk.ibethune.016841.0001-pilot.0000/unit.000009> wc -l out*gro
0 out0.gro
200 out1.gro
0 out2.gro
0 out3.gro
75 out4.gro
0 out5.gro
0 out6.gro
150 out7.gro
425 total
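The same check can be scripted rather than eyeballed from wc output. Below is a minimal sketch, not part of the existing scripts: the helper name find_empty_outputs and the default pattern "out*.gro" are assumptions chosen to match the listing above.

```python
import glob
import os


def find_empty_outputs(directory, pattern="out*.gro"):
    """Return the sorted paths matching `pattern` that hold no data.

    A file counts as empty when it is zero bytes long. A dangling
    symlink (target missing) is reported as well, since a downstream
    unit could not read anything from it either.
    """
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    empty = []
    for path in paths:
        # os.path.exists() follows symlinks, so it is False for a dangling link.
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            empty.append(path)
    return empty
```

Calling find_empty_outputs on a unit directory such as unit.000009 would list the out*.gro links whose gromacs CU produced nothing.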
So the first thing is that there is an error-detection issue here. The CUs that produce no output are failing, and we should be failing those CUs rather than waiting for downstream CUs to fail. This is a bug in the run.py and run.sh scripts, which do not capture the return codes from gromacs. Please fix!
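To make the point about return codes concrete, here is a rough sketch of how a wrapper could propagate them. It is not the actual run.py: the function name run_steps and its expected_output parameter are made up for illustration, and the real script may be structured quite differently.

```python
import os
import subprocess
import sys


def run_steps(commands, expected_output=None):
    """Run each command in order and stop at the first failure.

    Returns 0 if every command exits with 0 (and, when given,
    `expected_output` exists and is non-empty). Otherwise returns the
    failing command's return code, or 1 for a missing/empty output.
    Note that subprocess reports a command killed by a signal
    (e.g. an abort) as a negative return code, which is still non-zero.
    """
    for cmd in commands:
        rc = subprocess.call(cmd)
        if rc != 0:
            sys.stderr.write("command failed (rc=%d): %s\n" % (rc, " ".join(cmd)))
            return rc
    if expected_output is not None:
        if not os.path.isfile(expected_output) or os.path.getsize(expected_output) == 0:
            sys.stderr.write("missing or empty output: %s\n" % expected_output)
            return 1
    return 0
```

A wrapper would then end with something like sys.exit(run_steps(steps, expected_output="out.gro")), where steps is the list of gromacs command lines, so that the launch script's RETVAL reflects the gromacs failure and the CU itself is marked as failed.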
Second thing is what is causing gromacs to fail in the first place?
I have pasted the STDERR from one of the failing CUs here: https://gist.github.com/ibethune/5a1ee869e0e0356ac3ff
This appears to be the same thing that was raised in issue #238 - so we can either re-open that one, or track it here, I don't mind.
Not sure of the root cause of that, but I note that importing MPI with the python-compute module loaded won't work in an interactive session. MPI can only be initialised inside a call to aprun, i.e. once the parallel environment has been created:
$ module load python-compute/2.7.6
$ python
Python 2.7.6 (default, Mar 10 2014, 14:13:45)
[GCC 4.8.1 20130531 (Cray Inc.)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from mpi4py import MPI
[Wed Feb 3 20:35:39 2016] [unknown] Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(506):
MPID_Init(192).......: channel initialization failed
MPID_Init(569).......: PMI2 init failed: 1
run.sh: https://gist.github.com/vivek-bala/450e248aac1bb1672a30
pbs script: https://gist.github.com/vivek-bala/b20e3b93548e7fccb2b9
I can recreate the problem with just the pbs script. Running "run.sh" from the pbs script produces that error at the mdrun command, although "/bin/bash run.sh" succeeds when run from the command line. The non-MPI mdrun executable is being used in all cases.
Running the gromacs commands directly using aprun works: https://gist.github.com/vivek-bala/1d6ac234883f120ce33c.
Error from the first method:
GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Anca Hamuraru Vincent Hindriksen
Dimitrios Karkoulis Peter Kasson Jiri Kraus Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Teemu Virolainen Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, VERSION 5.1
Executable: /work/y07/y07/gmx/5.1-phase2/bin/gmx
Data prefix: /work/y07/y07/gmx/5.1-phase2
Command line:
gmx mdrun -nt 1 -s topol.tpr -o traj.trr -e ener.edr
-------------------------------------------------------
Program: gmx mdrun, VERSION 5.1
Source file: src/gromacs/commandline/cmdlineparser.cpp (line 234)
Function: void gmx::CommandLineParser::parse(int*, char**)
Error in user input:
Invalid command-line options
In command-line option -s
File 'topol.tpr' does not exist or is not accessible.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
cat: confout.gro: No such file or directory
Replacing old mdp entry 'nstxtcout' by 'nstxout-compressed'
Replacing old mdp entry 'xtc_grps' by 'compressed-x-grps'
Setting the LD random seed to 1379599184
Wed Feb 10 16:46:54 2016: [unset]:_pmi_alps_sync:alps response not OKAY
Wed Feb 10 16:46:54 2016: [unset]:_pmiu_daemon:_pmi_alps_sync failed
run.sh: line 15: 20418 Aborted gmx grompp -f grompp.mdp -c $tmpstartgro -p topol.top -o topol.tpr
Wed Feb 10 16:46:54 2016: [PE_0]:_pmi_daemon_barrier:PE pipe read failed from daemon errno = Success
Wed Feb 10 16:46:54 2016: [PE_0]:_pmi_init:_pmi_daemon_barrier returned -1
FYI, I am in touch with the Cray/ARCHER team, and we're looking into this.
Root cause is some stray PMI libraries linked in to the 'serial' gmx binary. I have installed a fixed build, but am still waiting for the central install to be updated. For now, in kernel_defs/gromacs.py you can replace:
module load gromacs
with:
export PATH=$PATH:/work/z01/shared/gromacs-5.1.2/bin
Will update the ticket when a final solution is in place.
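One thing to watch with the PATH workaround: the new directory is appended, and PATH lookup takes the first match, so a gmx in any earlier directory would still win. A small sketch for checking which executable gets picked up, assuming Python 3 on a POSIX system (the helper name resolve_on_path is made up for illustration):

```python
import shutil


def resolve_on_path(name, path_value):
    """Return the first executable called `name` found on `path_value`.

    `path_value` is a PATH-style string (directories joined by
    os.pathsep). Returns None if no directory contains the executable.
    """
    return shutil.which(name, path=path_value)
```

For example, resolve_on_path("gmx", os.environ["PATH"]) run inside the unit's environment should return a path under the fixed build's bin directory if the workaround has taken effect.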
Outdated.
Hi,
I started with a clean virtualenv, did
got the latest grls-on-archer.tar.gz
On ARCHER
more unit.000009/STDERR