radical-cybertools / ExTASY

MDEnsemble

Gromacs/LSDMap on Stampede; extasy version: 0.1.3.1-beta-14-g7c361ec #153

Closed antonst closed 9 years ago

antonst commented 9 years ago

Terminal output is:

Simulation Execution Time :  581.496
Starting Analysis
[Callback]: ComputeUnit '5516196f23769c2d66e3ac33' state changed to PendingInputStaging.
[Callback]: ComputeUnit '5516196f23769c2d66e3ac33' state changed to StagingInput.
[Callback]: ComputeUnit '5516196f23769c2d66e3ac33' state changed to PendingExecution.
[Callback]: ComputeUnit '5516196f23769c2d66e3ac33' state changed to Scheduling.
[Callback]: ComputeUnit '5516196f23769c2d66e3ac33' state changed to Executing.
[Callback]: ComputeUnit '5516196f23769c2d66e3ac33' state changed to Failed.
#######################
##       ERROR       ##
#######################
ComputeUnit 5516196f23769c2d66e3ac33 has FAILED. Can't recover.
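If output like the above is saved to a file, a short helper can report the last state a given unit reached. This is a sketch that assumes the `[Callback]: ComputeUnit '<id>' state changed to <State>.` line format shown. `last_cu_state` and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the final state reported for one ComputeUnit in saved
# ExTASY terminal output. Assumes the "[Callback]: ComputeUnit '<id>'
# state changed to <State>." format; last_cu_state is a hypothetical name.
last_cu_state() {
    local logfile="$1" unit="$2"
    # Keep only the callback lines for this unit (fixed-string match),
    # extract the state word, and print the last one seen.
    grep -F "ComputeUnit '${unit}' state changed to" "$logfile" \
        | sed -E 's/.*state changed to ([A-Za-z]+)\..*/\1/' \
        | tail -n 1
}
```

For the output above, `last_cu_state terminal.log 5516196f23769c2d66e3ac33` should print `Failed`. The summary line `ComputeUnit ... has FAILED` is not matched because it has a different format.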

extasy.log

vivek-bala commented 9 years ago

Could you post the STDERR from unit-5516196f23769c2d66e3ac33?

antonst commented 9 years ago

Sure: STDERR

vivek-bala commented 9 years ago

Thanks. The initial GROMACS command failed. Hmmm. Could you post the shell script in the unit folder and list the folder contents as well?

antonst commented 9 years ago

Contents:

total 116
lrwxrwxrwx 1 antontre G-801782   102 Mar 27 22:01 run_analyzer.sh -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/run_analyzer.sh
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 pre_analyze.py -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/pre_analyze.py
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out0.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out0.gro
lrwxrwxrwx 1 antontre G-801782    94 Mar 27 22:01 lsdm.py -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/lsdm.py
lrwxrwxrwx 1 antontre G-801782    97 Mar 27 22:01 config.ini -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/config.ini
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out6.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out6.gro
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out5.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out5.gro
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out4.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out4.gro
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out3.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out3.gro
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out2.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out2.gro
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out1.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out1.gro
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out9.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out9.gro
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out8.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out8.gro
lrwxrwxrwx 1 antontre G-801782   101 Mar 27 22:01 out7.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out7.gro
lrwxrwxrwx 1 antontre G-801782   102 Mar 27 22:01 out11.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out11.gro
lrwxrwxrwx 1 antontre G-801782   102 Mar 27 22:01 out10.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out10.gro
lrwxrwxrwx 1 antontre G-801782   102 Mar 27 22:01 out15.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out15.gro
lrwxrwxrwx 1 antontre G-801782   102 Mar 27 22:01 out14.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out14.gro
lrwxrwxrwx 1 antontre G-801782   102 Mar 27 22:01 out13.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out13.gro
lrwxrwxrwx 1 antontre G-801782   102 Mar 27 22:01 out12.gro -> /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/staging_area/iter0/out12.gro
-rw------- 1 antontre G-801782     0 Mar 27 22:01 tmp.gro
-rwx------ 1 antontre G-801782   611 Mar 27 22:01 radical_pilot_cu_launch_script-nraAV5.sh
-rw------- 1 antontre G-801782    42 Mar 27 22:01 lsdmap.log
-rw------- 1 antontre G-801782   194 Mar 27 22:01 STDOUT
-rw------- 1 antontre G-801782 21452 Mar 27 22:01 STDERR
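In the listing above, `tmp.gro` is 0 bytes. That would be consistent with `pre_analyze.py` writing nothing before `trjconv` read the file, although the thread does not confirm this. A quick way to flag empty regular files in a unit sandbox might look like the following sketch. `find_empty_outputs` is a hypothetical name, and `-printf` assumes GNU find.

```shell
#!/usr/bin/env bash
# Sketch: list zero-byte regular files directly inside a unit sandbox.
# Staged inputs are symlinks; find does not follow them by default and
# -type f matches only regular files, so they are skipped.
# find_empty_outputs is a hypothetical helper; -printf needs GNU find.
find_empty_outputs() {
    local sandbox="$1"
    find "$sandbox" -maxdepth 1 -type f -empty -printf '%f\n' | sort
}
```

Against the folder listed above, this would be expected to report only `tmp.gro`.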

antonst commented 9 years ago

radical_pilot_cu_launch_script-nraAV5.sh

#!/bin/bash -l
cd /work/02457/antontre/radical.pilot.sandbox/pilot-5516157e23769c2d66e3ac20/unit-5516196f23769c2d66e3ac33
module load gromacs
python pre_analyze.py 16 tmp.gro
echo 2 | trjconv -f tmp.gro -s tmp.gro -o tmpha.gro
module load -intel +intel/14.0.1.106
export PYTHONPATH=/home1/03036/jp43/.local/lib/python2.7/site-packages
module load python
export PATH=/home1/03036/jp43/.local/bin:$PATH

/usr/local/bin/ibrun -n 16 -o 0 /opt/apps/intel14/mvapich2_2_0/python/2.7.6/lib/python2.7/site-packages/mpi4py/bin/python-mpi "lsdm.py" "-f" "config.ini" "-c" "tmpha.gro" "-n" "neighbors.nn" "-w" "weight.w"
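Bash continues past errors by default, so the launch script still reaches `ibrun` even if `pre_analyze.py` or `trjconv` fails. A minimal fail-fast guard could look like this sketch. `require_nonempty` is a hypothetical helper and is not part of ExTASY or RADICAL-Pilot.

```shell
#!/usr/bin/env bash
# Sketch: stop the analysis unit early when an intermediate file is
# missing or empty. require_nonempty is a hypothetical helper.
require_nonempty() {
    local f="$1"
    if [ ! -s "$f" ]; then
        echo "error: '$f' is missing or empty" >&2
        return 1
    fi
}

# Possible placement in the launch script (commands as in the original):
#   set -e
#   python pre_analyze.py 16 tmp.gro
#   require_nonempty tmp.gro
#   echo 2 | trjconv -f tmp.gro -s tmp.gro -o tmpha.gro
#   require_nonempty tmpha.gro
```

Under `set -e`, the `echo 2 | trjconv ...` pipeline is judged by the exit status of `trjconv`, because that is the last command in the pipeline.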

vivek-bala commented 9 years ago

Could you post the contents of tmp.gro? It seems like an odd error. Could you also try it again, just to make sure it wasn't something temporary? Thanks.

vivek-bala commented 9 years ago

Closing this as one-time/temporary system behaviour. @AntonsT, please reopen if this is reproducible.