radical-cybertools / ExTASY


RP workflow on ARCHER gromacs/lsdmap #238

Closed ebreitmo closed 8 years ago

ebreitmo commented 8 years ago

Hi,

When I run the following command, the job seems to run and finish OK:

python extasy_gromacs_lsdmap.py --RPconfig archer.rcfg --Kconfig gromacslsdmap.wcfg

================================================================================
 EnsembleMD (0.3.14-20-gf4dd046)                                                
================================================================================

Starting Allocation                                                           ok
        Verifying pattern                                                     ok
        Starting pattern execution                                            ok
--------------------------------------------------------------------------------
Executing simulation-analysis loop with 1 iterations on 2 allocated core(s) on 'epsrc.archer'

Job waiting on queue...
Job is now running !
Waiting for pre_loop step to complete.                                      done
Iteration 1: Waiting for 2 simulation tasks: md.gromacs to complete         done
Iteration 1: Waiting for analysis tasks: md.pre_lsdmap to complete          done
Iteration 1: Waiting for analysis tasks: md.lsdmap to complete              done
Iteration 1: Waiting for analysis tasks: md.post_lsdmap to complete         done
--------------------------------------------------------------------------------
Pattern execution successfully finished                                         

Starting Deallocation
Resource allocation cancelled.                                              done 

But when I check the output on ARCHER, it doesn't look right to me:

more unit.000001/STDERR 
Mon Feb  1 12:38:26 2016: [unset]:_pmi_alps_sync:alps response not OKAY
Mon Feb  1 12:38:26 2016: [unset]:_pmiu_daemon:_pmi_alps_sync failed 
Mon Feb  1 12:38:26 2016: [PE_0]:_pmi_daemon_barrier:PE pipe read failed from daemon errno = Success
Mon Feb  1 12:38:26 2016: [PE_0]:_pmi_init:_pmi_daemon_barrier returned -1
run.sh: line 15: 27942 Aborted                 gmx grompp -f grompp.mdp -c $tmpstartgro -p topol.top -o topol.tpr
Mon Feb  1 12:38:26 2016: [unset]:_pmi_alps_sync:alps response not OKAY
...
GROMACS:      gmx mdrun, VERSION 5.1
Executable:   /work/y07/y07/gmx/5.1-phase2/bin/gmx
Data prefix:  /work/y07/y07/gmx/5.1-phase2
Command line:
  gmx mdrun -nt 1 -s topol.tpr -o traj.trr -e ener.edr

cat: confout.gro
-------------------------------------------------------
Program:     gmx mdrun, VERSION 5.1
Source file: src/gromacs/commandline/cmdlineparser.cpp (line 234)
Function:    void gmx::CommandLineParser::parse(int*, char**)

Error in user input:
Invalid command-line options
  In command-line option -s
    File 'topol.tpr' does not exist or is not accessible.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
: No such file or directory
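The mdrun failure here is a downstream symptom: `grompp` aborted (the `Aborted` line in `run.sh`), so `topol.tpr` was never written, and `mdrun` then failed on the missing file. A minimal fail-fast sketch (a hypothetical wrapper, not part of ExTASY or the actual `run.sh`) that stops a command chain at the first non-zero exit, so the dependent step is never attempted:

```python
import subprocess

def run_chain(commands):
    """Run shell commands in sequence, stopping at the first failure.

    check=True raises CalledProcessError on a non-zero exit status, so a
    later command that depends on an earlier one's output file (here,
    mdrun reading grompp's topol.tpr) is never attempted.
    """
    for cmd in commands:
        subprocess.run(cmd, shell=True, check=True)
```

With the two commands from the log, such a wrapper would have stopped after the aborted `grompp` call instead of letting `mdrun` produce the confusing "File 'topol.tpr' does not exist" error.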
vivek-bala commented 8 years ago

Hi Elena,

Please install ensemblemd from the master branch and try this again. It should work up to the lsdmap stage and then fail: there seems to be a problem with mpi4py under python-compute/2.7.6 on ARCHER.

$ module load python-compute/2.7.6
$ python
Python 2.7.6 (default, Mar 10 2014, 14:13:45) 
[GCC 4.8.1 20130531 (Cray Inc.)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from mpi4py import MPI
[Wed Feb  3 20:35:39 2016] [unknown] Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(506): 
MPID_Init(192).......: channel initialization failed
MPID_Init(569).......:  PMI2 init failed: 1
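A fatal error during MPI_Init, as above, aborts the whole Python process, so probing a suspect module from inside an interactive session is risky. A small diagnostic sketch (a hypothetical helper, not ExTASY code) that attempts the import in a child interpreter, capturing its stderr instead of crashing the current session:

```python
import subprocess
import sys

def probe_import(module):
    """Attempt `import <module>` in a child interpreter.

    An MPI stack that aborts during MPI_Init (as in the PMI2 failure
    above) kills only the child process; its exit status and stderr are
    returned for inspection.
    """
    proc = subprocess.run(
        [sys.executable, "-c", "import " + module],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stderr

# e.g. on ARCHER: ok, err = probe_import("mpi4py.MPI")
```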
ibethune commented 8 years ago

So this appears to be fixed according to my testing (notwithstanding #239). Elena, if you agree, please close.

ebreitmo commented 8 years ago

I think you have to close it.

Cheers, Elena



vivek-bala commented 8 years ago

I still get the MPI error and therefore the failure in the lsdmap stage. Am I missing something?

ibethune commented 8 years ago

No idea, it works for me...