radical-cybertools / ExTASY

ExTASY 0.2 - multiple_simulations_single_analysis.py fails on ARCHER #205

Closed ibethune closed 8 years ago

ibethune commented 8 years ago

I modified the standard multiple_simulations_single_analysis.py as follows:

cluster = SingleClusterEnvironment(
    resource="epsrc.archer",
    cores=1,
    walltime=10,
    username="e290ib",
    project="e290",
    queue="short"
)

The job hung, and on inspection of the sandbox it appears that RP was unable to set up the Python environment. This looks like the same error that is hitting the Jenkins runs too.

Most likely cause is that there is no longer any default Python env on ARCHER: http://www.archer.ac.uk/about-archer/news-events/news/2015-09-22-default_python_update.php
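One way to check this hypothesis is to look up `python` and `pip` against a given PATH, for example the one printed in agent.err. The following is a minimal sketch using only the standard library; the helper name `find_tools` is hypothetical and not part of RP, and `shutil.which` requires Python 3.3 or later.

```python
import shutil


def find_tools(names, path=None):
    """Map each tool name to the executable found on `path`, or None.

    `path` is a PATH-style string (directories separated by os.pathsep).
    When it is None, the PATH of the current environment is searched.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    # Report where python and pip resolve in the current environment.
    for tool, location in find_tools(["python", "pip"]).items():
        print("{}: {}".format(tool, location or "not found on PATH"))
```

Passing the PATH string from the `which: no pip in (...)` message as `path` would only be meaningful when run on the same machine, since the lookup inspects the local file system.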

The relevant failure in the agent.out is:

# -------------------------------------------------------------------
#
# update radical.utils-0.35/ via pip
# cmd: /fs4/e290/shared/shared_pilot_ve_20150429/bin/pip install  --src '/work/e290/e290/e290ib/radical.pilot.sandbox/rp.session.Iains-MBP.home.ibethune.016710.0003-pilot.0000/rp_install/src' --build '/work/e290/e290/e290ib/radical.pilot.sandbox/rp.session.Iains-MBP.home.ibethune.016710.0003-pilot.0000/rp_install/build' --install-option='--prefix=/work/e290/e290/e290ib/radical.pilot.sandbox/rp.session.Iains-MBP.home.ibethune.016710.0003-pilot.0000/rp_install' radical.utils-0.35/
#
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.utils-0.35/! Lets see how far we get ...

And agent.err:

ModuleCmd_Switch.c(172):ERROR:152: Module 'anaconda' is currently not loaded
which: no pip in (/opt/cray/llm/default/bin:/opt/cray/llm/default/etc:/opt/cray/lustre-cray_ari_s/2.4_3.0.80_0.5.1_1.0501.7664.16.1-1.0501.18401.34.1/sbin:/opt/cray/lustre-cray_ari_s/2.4_3.0.80_0.5.1_1.0501.7664.16.1-1.0501.18401.34.1/bin:/opt/cray/MySQL/5.0.64-1.0000.7096.23.2/sbin:/opt/cray/MySQL/5.0.64-1.0000.7096.23.2/bin:/opt/cray/alps/5.1.1-2.0501.8471.1.1.ari/sbin:/opt/cray/alps/5.1.1-2.0501.8471.1.1.ari/bin:/opt/cray/sdb/1.0-1.0501.48084.4.48.ari/bin:/opt/cray/nodestat/2.2-1.0501.47138.1.78.ari/bin:/usr/local/packages/cse/quickstart/1.0:/home/y07/y07/cse/nano/2.2.6/bin:/usr/local/packages/cse/serialJobs:/usr/local/packages/cse/bolt/0.6/bin:/usr/local/packages/cse/checkDisk:/usr/local/packages/cse/checkQueue:/usr/local/packages/cse/checkScript:/usr/local/packages/cse/budgets:/opt/cray/mpt/7.1.1/gni/bin:/opt/pbs/12.2.401.141761/bin:/opt/cray/atp/1.7.5/bin:/opt/cray/rca/1.0.0-2.0501.48090.7.46.ari/bin:/opt/cray/alps/5.1.1-2.0501.8507.1.1.ari/sbin:/opt/cray/alps/5.1.1-2.0501.8507.1.1.ari/bin:/opt/cray/dvs/2.4_0.9.0-1.0501.1672.2.122.ari/bin:/opt/cray/csa/3.0.0-1_2.0501.47112.1.91.ari/sbin:/opt/cray/csa/3.0.0-1_2.0501.47112.1.91.ari/bin:/opt/cray/job/1.5.5-0.1_2.0501.48066.2.43.ari/bin:/opt/cray/xpmem/0.1-2.0501.48424.3.3.ari/bin:/opt/cray/dmapp/7.0.1-1.0501.8315.8.4.ari/bin:/opt/cray/pmi/5.0.6-1.0000.10439.140.2.ari/bin:/opt/cray/ugni/5.0-1.0501.8253.10.22.ari/bin:/opt/cray/udreg/2.3.2-1.0501.7914.1.13.ari/bin:/opt/cray/cce/8.3.7/cray-binutils/x86_64-unknown-linux-gnu/bin:/opt/cray/cce/8.3.7/craylibs/x86-64/bin:/opt/cray/cce/8.3.7/cftn/bin:/opt/cray/cce/8.3.7/CC/bin:/opt/cray/craype/2.2.1/bin:/opt/cray/switch/1.0-1.0501.47124.1.93.ari/bin:/opt/cray/eslogin/eswrap/1.1.0-1.010400.915.0/bin:/opt/modules/3.2.10.2/bin:/home/e290/e290/e290ib/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib64/jvm/jre/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/sbin:/usr/sbin:.:/usr/lib/qt3/bin:/opt/cray/bin)
--------------------------------------------------------------------------------
This is a private computing facility. Access to this system is limited to those
who have been granted access by the operating service provider on behalf of the
issuing authority and use is restricted to the purposes for which access was
granted. All access and usage are governed by the terms and conditions of access
agreed to by all registered users and are thus subject to the provisions of the
Computer Misuse Act, 1990 under which unauthorised use is a criminal offence.

If you are not authorised to use this service you must disconnect immediately.
--------------------------------------------------------------------------------

python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
/fs4/e290/shared/shared_pilot_ve_20150429/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
mkdir: cannot create directory `////////radical//': Read-only file system
default_bootstrapper.sh: line 961: ////////radical//__init__.py: No such file or directory
default_bootstrapper.sh: line 962: ////////radical//__init__.py: No such file or directory
default_bootstrapper.sh: line 963: ////////radical//__init__.py: No such file or directory
default_bootstrapper.sh: line 964: ////////radical//__init__.py: No such file or directory
/fs4/e290/shared/shared_pilot_ve_20150429/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
/fs4/e290/shared/shared_pilot_ve_20150429/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
/fs4/e290/shared/shared_pilot_ve_20150429/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
/fs4/e290/shared/shared_pilot_ve_20150429/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory

vivek-bala commented 8 years ago

This is similar to https://github.com/radical-cybertools/radical.pilot/issues/780. Yes, I believe this is due to the changes in the python module.

marksantcroos commented 8 years ago

"Most likely cause is that there is no longer any default Python env on ARCHER"

Correct. This has been fixed already in devel and will be in the upcoming release.

PS: @vivek-bala, note that radical-cybertools/radical.pilot#780 is now more about the MongoDB URL.

ibethune commented 8 years ago

I tried upgrading RP to devel:

pip install --upgrade git+https://github.com/radical-cybertools/radical.pilot.git@devel#egg=radical.pilot

But then I got other errors when I ran the script. Did I miss a step?

(extasy-test-0.2)Iains-MBP:extasy-tutorial-0.2 ibethune$ RADICAL_PILOT_DBURL='mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot' RADICAL_ENMD_VERBOSE=info python multiple_simulations_single_analysis.py
Traceback (most recent call last):
  File "multiple_simulations_single_analysis.py", line 9, in <module>
    from radical.ensemblemd import Kernel
  File "/Users/ibethune/Desktop/extasy-test-0.2/lib/python2.7/site-packages/radical/ensemblemd/__init__.py", line 26, in <module>
    from radical.ensemblemd.single_cluster_environment import SingleClusterEnvironment
  File "/Users/ibethune/Desktop/extasy-test-0.2/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 14, in <module>
    import radical.pilot
  File "/Users/ibethune/Desktop/extasy-test-0.2/lib/python2.7/site-packages/radical/pilot/__init__.py", line 20, in <module>
    from .scheduler import *
  File "/Users/ibethune/Desktop/extasy-test-0.2/lib/python2.7/site-packages/radical/pilot/scheduler/__init__.py", line 14, in <module>
    from direct_submission import DirectSubmissionScheduler
  File "/Users/ibethune/Desktop/extasy-test-0.2/lib/python2.7/site-packages/radical/pilot/scheduler/direct_submission.py", line 16, in <module>
    from ..utils import logger
  File "/Users/ibethune/Desktop/extasy-test-0.2/lib/python2.7/site-packages/radical/pilot/utils/__init__.py", line 11, in <module>
    logger = ru.get_logger('radical.pilot')
AttributeError: 'module' object has no attribute 'get_logger'

marksantcroos commented 8 years ago

You need the devel branch of the whole RADICAL stack, in particular radical.utils in this case. (SAGA is the other component, but I think it is compatible now.)
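A quick way to tell whether the installed radical.utils is recent enough for RP devel is to check for the attribute named in the AttributeError before running the full script. This is a minimal sketch, not part of the RADICAL stack; the helper name `missing_attributes` is hypothetical, and `get_logger` is the only attribute known from the traceback to be required.

```python
import importlib


def missing_attributes(module_name, attribute_names):
    """Return the names from `attribute_names` that the module lacks.

    Returns None if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return [name for name in attribute_names if not hasattr(module, name)]


if __name__ == "__main__":
    # get_logger is the attribute RP devel looked up in the traceback.
    result = missing_attributes("radical.utils", ["get_logger"])
    if result is None:
        print("radical.utils is not installed in this environment")
    elif result:
        print("radical.utils is missing: " + ", ".join(result))
    else:
        print("radical.utils provides get_logger")
```

If the check reports a missing attribute, the installed radical.utils predates the one RP devel expects, which matches the mismatch described above.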

ibethune commented 8 years ago

I can verify that this is now fixed using the latest release stack with ExTASY devel on top.