Closed euhruska closed 6 years ago
The problem was a change in PyPI, where a new URL tree invalidated a link we use in the RP bootstrapper. This should be fixed in devel and release by now. You are in a detached HEAD state, so I can't see what branch you are using - let me know if you need any help merging the fix into your branch.
Oh, I'm on the feature/gpu branch of RP, and had hoped that this was already merged.
When I try to merge feature/gpu and devel in my fork, I get "everything up to date". Can you confirm? The bootstrap failed before, though.
All of feature/gpu is already merged into devel in preparation for the upcoming release - but not all the fixes in devel have been merged back.
Do I understand correctly that the devel branch should work?
yes, indeed.
Is the feature/gpu branch of radical.entk also merged into devel?
Hey Eugen, no, feature/gpu is not merged with devel in EnTK. Please use the feature/gpu branch in EnTK.
using all devel branches I get the same error as https://github.com/radical-collaboration/extasy-grlsd/issues/52
is this radical-stack correct?
radical-stack
python : 2.7.14
pythonpath :
virtualenv : extasy7
radical.analytics : v0.45.2-102-gaec2e1d@devel
radical.entk : 0.6.1-0.6.0-31-g19668b3@HEAD-detached-at-19668b3
radical.pilot : 0.47.13
radical.utils : 0.47.5
saga : 0.47.6
No - RP, RS and RU should be on devel or the feature/gpu* branches for a GPU workload to run. EnTK looks ok - that commit (19668b3) is the HEAD of the feature/gpu branch, as it should be.
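The branch requirement above can be checked mechanically against the radical-stack output. The following is only a minimal sketch: the helper names (`branch_of`, `check_stack`) are made up for illustration, and the accepted branch names are an assumption based on this thread (note that radical-stack prints feature/gpu as feature-gpu).

```python
# Minimal sketch: flag stack components that are not installed from a
# devel or feature/gpu* branch. Helper names are hypothetical.

REQUIRED = ("radical.pilot", "radical.utils", "saga")


def branch_of(version):
    """Return the branch suffix after '@', or None for a plain release."""
    return version.split("@", 1)[1] if "@" in version else None


def check_stack(text, required=REQUIRED):
    """Return the required components that are not on an accepted branch."""
    found = {}
    for line in text.splitlines():
        name, sep, version = line.partition(":")
        if sep:
            found[name.strip()] = version.strip()
    bad = []
    for name in required:
        branch = branch_of(found.get(name, ""))
        # radical-stack prints 'feature/gpu' as 'feature-gpu'
        ok = branch is not None and (
            branch == "devel" or branch.startswith("feature-gpu"))
        if not ok:
            bad.append(name)
    return bad


first_report = """\
radical.entk : 0.6.1-0.6.0-31-g19668b3@HEAD-detached-at-19668b3
radical.pilot : 0.47.13
radical.utils : 0.47.5
saga : 0.47.6
"""
print(check_stack(first_report))
```

For the first stack posted in this thread, all three of RP, RU and RS are plain releases and are therefore reported.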
To clarify the bootstrapper problem you encounter: this is fixed by this commit, which is in devel.
Reinstalled. Is this correct?
python : 2.7.14
pythonpath :
virtualenv : extasy7
radical.analytics : v0.45.2-102-gaec2e1d@devel
radical.entk : 0.6.1-0.6.0-31-g19668b3@feature-gpu
radical.pilot : 0.47.12-v0.47.12-169-gff598dd4@devel
radical.utils : 0.47.4-merge-pre_gpu-22-ga942c4b@devel
saga : 0.47.4-v0.47.4-32-g71a97659@devel
yes, this looks decent! How does it behave?
why does the bootstrap try and fail to use python3.5 on bw?
#
# Create virtualenv
# cmd: /sw/bw/bwpy/mnt/bin/python virtualenv-1.9/virtualenv.py /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
#
Failed to import the site module
Traceback (most recent call last):
File "/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site.py", line 67, in <module>
import os
File "/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/os.py", line 708, in <module>
from _collections_abc import MutableMapping
ImportError: No module named '_collections_abc'
Using base prefix '/mnt/bwpy/single/usr/lib/python-exec/python3.5/../../..'
New python executable in /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python
ERROR: The executable /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python is not functioning
ERROR: It thinks sys.prefix is '/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017688.0013/pilot.0000' (should be '/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12')
ERROR: virtualenv is not compatible with this system or executable
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
ERROR: Couldn't create virtualenv
Error on virtenv creation -- abort
removed `/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12.lock'
Uh, that is unexpected... Do you load any modules in your ~/.bashrc?
not on bw
Oh for christ sake, the BW python module changed again! Let me try to fix our configuration...
@euhruska, can you please use the RP branch fix/bw_python_interpreter? That ensures that the 2.7 python interpreter is used. I'll merge that into devel as soon as you confirm this fixes this specific problem for you. Thanks!
Got this bootstrap error:
# Running pre_bootstrap_1 command
# cmd: module switch PrgEnv-cray PrgEnv-gnu
#
#
# SUCCESS
#
# -------------------------------------------------------------------
# -------------------------------------------------------------------
#
# Running pre_bootstrap_1 command
# cmd: module load bwpy
#
#
# SUCCESS
#
# -------------------------------------------------------------------
# -------------------------------------------------------------------
# Touching output tarballs
# -------------------------------------------------------------------
create gtod
build gtod with cc... success
0.0071,bootstrap_1_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
VIRTENV : /scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12 (normalized)
PYTHON: python2.7
PIP : /sw/bw/bwpy/mnt/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017688.0017/pilot.0000/../cacert.pem
0.1089,ve_setup_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
virtenv_create : TRUE
virtenv_update : FALSE
rp install sources: radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/ saga-python-0.47.4-v0.47.4-32-g71a97659-devel/ radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/
rp install target : SANDBOX
rp install lock : FALSE
virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12 exists
0.8987,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017688.0017/bootstrap_0.sh: line 812: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/activate: No such file or directory
Loading of virtual env failed!
Any idea how to fix this bootstrap issue?
@euhruska, now I do. Alas, it requires starting over and recreating both the client and the agent virtualenvs. The procedure should be along these lines:
# load and activate bwpy
module load bwpy
bwpy-environ
# create and update the client virtualenv
VIRTENV_TGZ="virtualenv-1.9.tar.gz"
VIRTENV_TGZ_URL="https://pypi.python.org/packages/source/v/virtualenv/$VIRTENV_TGZ"
curl -k -L -O "$VIRTENV_TGZ_URL"
tar zxmf "$VIRTENV_TGZ"
python2.7 virtualenv-1.9/virtualenv.py ve
source ve/bin/activate
pip install --upgrade pip
# make sure you use the right RP/RS/RU branches
cd radical.pilot
pip install .
# create the agent virtualenv
cd ~/radical.pilot.sandbox/
radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw
cd -
# run a test
./examples/00_getting_started.py ncsa.bw_aprun
Please update the fix/bw_python_interpreter branch. Also, you will need to create an agent virtualenv for each resource target you want to use. Let me know how that goes.
Cheers, Andre.
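Since one agent virtualenv is needed per resource target, it may help to see how the directory names line up. This is a small sketch only: the `ve.<resource>.<version>` pattern is inferred from the names used in this thread (for example ve.ncsa.bw_aprun.0.47.12), and `agent_ve_path` is a hypothetical helper, not part of RP.

```python
import os


def agent_ve_path(sandbox, resource, rp_version):
    """Build the path of the static agent virtualenv for one resource label."""
    return os.path.join(sandbox, "ve.%s.%s" % (resource, rp_version))


sandbox = os.path.expanduser("~/radical.pilot.sandbox")
for resource in ["ncsa.bw_aprun"]:   # add one entry per resource target
    print(agent_ve_path(sandbox, resource, "0.47.12"))
```

Each printed path would then be the first argument to a separate radical-pilot-create-static-ve call.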
I'm installing the stack on bw, but radical.entk installation of devel with pip install . fails with:
12/hypothesis-3.57.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-TDa31Z/hypothesis/setup.py", line 34, in <module>
setuptools_version = tuple(map(int, setuptools.__version__.split('.')[:2]))
ValueError: invalid literal for int() with base 10: '6c11'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-TDa31Z/hypothesis/
radical-stack:
python : 2.7.14
pythonpath : /opt/xalt/0.7.6/sles11.3/libexec
virtualenv : /mnt/a/u/sciteam/hruska/ve
radical.pilot : 0.47.4-merge-pre_gpu-150-gec325c89@feature-gpu
radical.utils : 0.47.4-merge-pre_gpu-22-ga942c4b@devel
saga : 0.47.4-v0.47.4-33-g1a26dcbc@devel
Do I have to use a different setuptools version?
upgrading setuptools fixed this issue
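Why the upgrade helps can be reproduced in isolation: the failing line in the traceback converts the first two dot-separated fields of the setuptools version to integers, which only works for purely numeric fields. The sketch below reuses that expression; the version strings "0.6c11" and "39.2.0" are assumptions chosen to match the '6c11' in the error message and a newer numeric release.

```python
def major_minor(version):
    """Same parsing as the failing line in the hypothesis setup.py traceback."""
    return tuple(map(int, version.split(".")[:2]))


print(major_minor("39.2.0"))      # a purely numeric version parses fine

try:
    major_minor("0.6c11")         # assumed old-style version string
except ValueError as exc:
    print("fails:", exc)
```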
But it gets stuck creating the ve:
radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw
script : /mnt/a/u/sciteam/hruska/ve/bin/radical-pilot-create-static-ve
prefix : ve.ncsa.bw_aprun.0.47.12
arg : bw
invoke BW magic
script : /mnt/a/u/sciteam/hruska/ve/bin/radical-pilot-create-static-ve
prefix : /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
arg : bwpy
create bwpy ve [/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12]
create virtualenv ....
Any idea what causes radical-pilot-create-static-ve to get stuck, or how to make it verbose to debug this?
bugger, I have never seen it getting stuck :-( You can trace the script's activity by running it via:
/bin/sh -x radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw
The resulting output might be large. If that is not conclusive though, you may want to change line 97 in the script, from:
exec bwpy-environ -- /bin/sh "$script" "$prefix" bwpy
to
exec bwpy-environ -- /bin/sh -x "$script" "$prefix" bwpy
to carry the debug mode across that exec call. Again, the output is likely large, due to the shell magic done by the module and virtualenv stuff... :/
Added -x in front: createve.txt. Also added -x inside the script: createve2.txt. Not sure what the conclusion is.
That helps. Can you please do two things: run
which stdbuf
on your command line and send the output, and change these two lines in the script from
122: stdbuf -oL $VIRTENV_CMD "$prefix" | progress
133: stdbuf -oL pip install --upgrade $req | progress || exit 1
to
122: $VIRTENV_CMD "$prefix"
133: pip install --upgrade $req || exit 1
Thanks!
which stdbuf
/usr/bin/stdbuf
The lines were at different line numbers, but after I changed them the command radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw worked.
the test python 00_getting_started.py ncsa.bw_aprun
failed with
KeyError: 'ncsa.bw_aprun'
Ah, you may want to add an ncsa.bw_aprun section in examples/config.json. Sorry, we don't have all resource labels covered there. Otherwise any other test code (or your application) should be able to confirm the viability of the install, too!
I am not sure why stdbuf failed for you - I'll just take it out (it's only cosmetic anyway...)
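The KeyError on the resource label comes from a plain dictionary lookup on the parsed examples/config.json. The sketch below shows the mechanism with a stand-in config; the section keys ("project", "queue") and the `resource_config` helper are hypothetical, so check the real file for the fields it actually uses.

```python
import json

# Hypothetical stand-in for examples/config.json; the real keys may differ.
CONFIG_TEXT = '{"local.localhost": {"project": null, "queue": null}}'


def resource_config(config, label):
    """Return the section for a resource label, with a readable error if missing."""
    try:
        return config[label]
    except KeyError:
        known = ", ".join(sorted(config))
        raise KeyError("no section for %r in config (known: %s)" % (label, known))


config = json.loads(CONFIG_TEXT)
print(resource_config(config, "local.localhost"))

# Adding a section for the missing label makes the lookup succeed.
config["ncsa.bw_aprun"] = {"project": None, "queue": None}
print(resource_config(config, "ncsa.bw_aprun"))
```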
well, the bootstrap still fails
# -------------------------------------------------------------------
# -------------------------------------------------------------------
#
# Running pre_bootstrap_1 command
# cmd: module switch PrgEnv-cray PrgEnv-gnu
#
#
# SUCCESS
#
# -------------------------------------------------------------------
# -------------------------------------------------------------------
#
# Running pre_bootstrap_1 command
# cmd: module load bwpy
#
#
# SUCCESS
#
# -------------------------------------------------------------------
# -------------------------------------------------------------------
# Touching output tarballs
# -------------------------------------------------------------------
create gtod
build gtod with cc... success
0.0082,bootstrap_1_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
VIRTENV : /scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12 (normalized)
PYTHON: python2.7
PIP : /sw/bw/bwpy/mnt/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem
0.1091,ve_setup_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
virtenv_create : TRUE
virtenv_update : FALSE
rp install sources: radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/ saga-python-0.47.4-v0.47.4-32-g71a97659-devel/ radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/
rp install target : SANDBOX
rp install lock : FALSE
virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12 exists
1.0956,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
PYTHON: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python
PIP : /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem
File "<string>", line 1
import distutils.sysconfig as sc; print sc.get_python_version()
^
SyntaxError: invalid syntax
File "<string>", line 1
import distutils.sysconfig as sc; print sc.get_python_lib()
^
SyntaxError: invalid syntax
PYTHON INTERPRETER: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python
PYTHON_VERSION :
VE_MOD_PREFIX :
PIP installer : /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem
PIP version : pip 10.0.1 from /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site-packages/pip (python 3.5)
activated virtenv
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
VE_MOD_PREFIX: ///////
RP_MOD_PREFIX: ///////
PYTHONPATH : ///////:/opt/xalt/0.7.6/sles11.3/libexec:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/lib64/py
7.4731,ve_activate_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
do not update virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
7.4835,rp_install_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
Using RADICAL-Pilot install sources ' radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/ saga-python-0.47.4-v0.47.4-32-g71a97659-devel/ radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/'
VE_MOD_PREFIX: ///////
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
SANDBOX : /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000
VE_LOC_PREFIX:
using local install tree
PYTHONPATH: ///////::/opt/xalt/0.7.6/sles11.3/libexec:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/lib64/py
rp_install: ///////
radicalmod: ////////radical/
mkdir: cannot create directory `////////radical//': Read-only file system
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/bootstrap_0.sh: line 1237: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/bootstrap_0.sh: line 1238: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/bootstrap_0.sh: line 1239: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/bootstrap_0.sh: line 1240: ////////radical//__init__.py: No such file or directory
created radical namespace in ////////radical//__init__.py
# -------------------------------------------------------------------
#
# update radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/ via pip
# cmd: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem install --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install' --no-deps radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/
#
Processing ./radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-fvugd9z_/setup.py", line 201
def visit((prefix, strip, found), dirname, names):
^
SyntaxError: invalid syntax
----------------------------------------
/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site-packages/pip/_internal/commands/install.py:199: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
cmdoptions.check_install_build_global(options)
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-fvugd9z_/
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/! Lets see how far we get ...
purge install source at radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/
# -------------------------------------------------------------------
#
# update saga-python-0.47.4-v0.47.4-32-g71a97659-devel/ via pip
# cmd: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem install --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install' --no-deps saga-python-0.47.4-v0.47.4-32-g71a97659-devel/
#
Processing ./saga-python-0.47.4-v0.47.4-32-g71a97659-devel
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-vb_ikqh2/setup.py", line 202
def visit((prefix, strip, found), dirname, names):
^
SyntaxError: invalid syntax
----------------------------------------
/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site-packages/pip/_internal/commands/install.py:199: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
cmdoptions.check_install_build_global(options)
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-vb_ikqh2/
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install saga-python-0.47.4-v0.47.4-32-g71a97659-devel/! Lets see how far we get ...
purge install source at saga-python-0.47.4-v0.47.4-32-g71a97659-devel/
# -------------------------------------------------------------------
#
# update radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/ via pip
# cmd: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem install --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install' --no-deps radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/
#
Processing ./radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-wpgke7ut/setup.py", line 198
def visit((prefix, strip, found), dirname, names):
^
SyntaxError: invalid syntax
----------------------------------------
/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site-packages/pip/_internal/commands/install.py:199: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
cmdoptions.check_install_build_global(options)
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-wpgke7ut/
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/! Lets see how far we get ...
purge install source at radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/
20.6616,rp_install_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
20.6722,ve_setup_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
20.6827,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
which: no radical-pilot-agent in (/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/bin:/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin:/mnt/bwpy/single/bin:/mnt/bwpy/single/usr/bin:/sw/bw/bwpy/mnt/bin:/opt/bwpy/bin:/opt/cray/pmi/5.0.10-1.0000.11050.179.3.gem/bin:/opt/gcc/4.9.3/bin:/sw/xe/darshan/3.1.3/darshan-3.1.3/bin:/sw/EasyBuild/software/gnuplot/5.0.5/bin:/sw/EasyBuild/software/wget/1.19.4/bin:/sw/EasyBuild/software/git/2.17.0/bin:/sw/EasyBuild/software/cURL/7.59.0/bin:/sw/EasyBuild/software/OpenSSL/1.0.2m/bin:/sw/admin/scripts:/sw/user/scripts:/opt/xalt/0.7.6/sles11.3/libexec:/opt/xalt/0.7.6/sles11.3/bin:/opt/moab/9.1.2/sbin:/opt/cray/mpt/7.5.0/gni/bin:/opt/cray/craype/2.5.8/bin:/opt/cray/llm/default/bin:/opt/cray/llm/default/etc:/opt/cray/xpmem/0.1-2.0502.64982.7.19.gem/bin:/opt/cray/ugni/6.0-1.0502.10863.8.28.gem/bin:/opt/cray/udreg/2.3.2-1.0502.10518.2.17.gem/bin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.46.1_1.0502.8871.38.1-1.0502.21728.74.1/sbin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.46.1_1.0502.8871.38.1-1.0502.21728.74.1/bin:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/sbin:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/bin:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/bin:/opt/cray/nodestat/2.2-1.0502.60539.1.31.gem/bin:/opt/modules/3.2.10.5/bin:/opt/torque/6.1.2/bin:/opt/torque/6.1.2/sbin:/opt/moab/9.1.2/bin:/u/sciteam/hruska/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:.:/usr/lib/qt3/bin:/opt/cray/bin)
verify python viability: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python ... ok
verify module viability: saga ...Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: No module named 'saga'
failed
python installation cannot load module saga - abort
Any idea how to fix the failing bootstrap this time?
I am confused by that log... - it initially indicates that the correct Python version is used (2.7) - but the syntax errors indicate that Python 3.x is active at that point. I don't yet see how that can happen :( I'll try to reproduce this...
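For reference, both kinds of SyntaxError in the log come from constructs that exist only in Python 2: the print statement in the version probe, and tuple parameter unpacking in the setup.py files. The sketch below only compiles the snippets (it does not run them) to show which ones the running interpreter accepts; the "portable probe" is a suggested alternative for illustration, not what the bootstrapper uses.

```python
import sys


def compiles(source):
    """Return True if the running interpreter accepts the source text."""
    try:
        compile(source, "<string>", "exec")
        return True
    except SyntaxError:
        return False


# Probe as it appears in the bootstrap log: print statement, Python 2 only.
py2_probe = "import distutils.sysconfig as sc; print sc.get_python_version()"
# Tuple parameter unpacking, as in the failing setup.py: also Python 2 only.
py2_def = "def visit((prefix, strip, found), dirname, names):\n    pass\n"
# A form of the version probe that both major versions accept.
portable_probe = "import sys; print('%d.%d' % sys.version_info[:2])"

print("running on", sys.version_info[:2])
for label, src in [("print statement", py2_probe),
                   ("tuple parameters", py2_def),
                   ("portable probe", portable_probe)]:
    print(label, "->", "ok" if compiles(src) else "SyntaxError")
```

Running this with the python binary inside the agent virtualenv is a quick way to confirm which major version that binary really is.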
I have rerun it, and it's still showing both python 2.7 and 3.5: bootstrap_1out.txt
A question: should the ve environments be in /u/sciteam/hruska/scratch/radical.pilot.sandbox or /u/sciteam/hruska?
I assumed /u/sciteam/hruska/scratch/radical.pilot.sandbox and tried updating and reinstalling. Now I get something about not finding python in radical-pilot-create-static-ve, although which python reports python 2.7 and import radical.pilot works:
(ve)hruska@h2ologin3:~/scratch/radical.pilot.sandbox> radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw
script : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve/bin/radical-pilot-create-static-ve
prefix : ve.ncsa.bw_aprun.0.47.12
arg : bw
invoke BW magic
script : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve/bin/radical-pilot-create-static-ve
prefix : /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
arg : bwpy
create bwpy ve [/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12]
create virtualenv ....
update setuptools .
update pip .
install pymongo==2.8 ...
install python-hostlist ...
install netifaces==0.10.4 ...
install setproctitle ...
install ntplib ...
install pyzmq ...
install apache-libcloud . Cache entry deserialization failed, entry ignored
.. Cache entry deserialization failed, entry ignored
.. Cache entry deserialization failed, entry ignored
.. Cache entry deserialization failed, entry ignored
.. Cache entry deserialization failed, entry ignored
.. Cache entry deserialization failed, entry ignored
...
install colorama . Cache entry deserialization failed, entry ignored
...
install backports.ssl-match-hostname ...
install msgpack-python . Cache entry deserialization failed, entry ignored
..
install future . Cache entry deserialization failed, entry ignored
..
File "<string>", line 1
import distutils.sysconfig as sc; print sc.get_python_version()
^
SyntaxError: invalid syntax
File "<string>", line 1
import distutils.sysconfig as sc; print sc.get_python_lib()
^
SyntaxError: invalid syntax
fix bwpy ve
skip python
patch python2
mv: cannot stat `python2': No such file or directory
patch python2.7
mv: cannot stat `python2.7': No such file or directory
---------------------------------------------------------------------
PYTHONPATH: /opt/xalt/0.7.6/sles11.3/libexec
python: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python (Python 3.5.4)
---------------------------------------------------------------------
a question the ve environments should be in /u/sciteam/hruska/scratch/radical.pilot.sandbox or /u/sciteam/hruska?
In /u/sciteam/hruska/scratch/radical.pilot.sandbox, as it is the sandbox the pilot needs to start up. Usually, RP creates that on the fly during startup - but on BW that does not (reliably) work from the compute nodes.
Re the latest error: I am still unable to reproduce this, I'm afraid. Can you please send (either as attachments or by mail)
~/.bashrc
env
module list
Do you have a ~/.local/lib/python2.7/ directory? What is in ~/.local/bin/?
~/.bashrc:
alias sq='showq -u hruska'
alias ls='ls -latr'
test -s ~/.alias && . ~/.alias || true
I don't have ~/.local/
after
module load bwpy
bwpy-environ
source ve/bin/activate
module list
Currently Loaded Modulefiles:
1) modules/3.2.10.4 13) dvs/2.5_0.9.0-1.0502.2188.1.113.gem 25) xalt/0.7.6.local
2) eswrap/1.3.3-1.020200.1280.0 14) alps/5.2.4-2.0502.9774.31.12.gem 26) scripts
3) cce/8.4.6 15) rca/1.0.0-2.0502.60530.1.63.gem 27) OpenSSL/1.0.2m
4) craype-network-gemini 16) atp/2.0.4 28) cURL/7.59.0
5) craype/2.5.8 17) PrgEnv-cray/5.2.82 29) git/2.17.0
6) cray-libsci/16.11.1 18) cray-mpich/7.5.0 30) wget/1.19.4
7) udreg/2.3.2-1.0502.10518.2.17.gem 19) craype-interlagos 31) user-paths
8) ugni/6.0-1.0502.10863.8.28.gem 20) torque/6.1.2 32) gnuplot/5.0.5
9) pmi/5.0.10-1.0000.11050.179.3.gem 21) moab/9.1.2-sles11 33) darshan/3.1.3
10) dmapp/7.0.1-1.0502.11080.8.74.gem 22) java/jdk1.8.0_51 34) bwpy/1.1.0
11) gni-headers/4.0-1.0502.10859.7.8.gem 23) globus/5.2.5
12) xpmem/0.1-2.0502.64982.5.3.gem 24) gsissh/6.2p2
env: env.txt
Thanks Eugene! Alas, I don't see any significant differences to my setup :( Can you please also attach your ~/.aliases?
I'll try to write a standalone script today which is supposed to setup client and agent side on BW in a consistent procedure. Thanks for your patience with this...
~/.aliases doesn't exist on bw for me
@euhruska, below is a script which seems to get me from a plain BW shell (no extra modules loaded, radical.pilot.sandbox empty) to a functional RCT stack and pilot sandbox. I had to push some changes to the repo for this to work, but I do hope this is portable to your environment. Please do adjust the settings in the first couple of lines. Also, the test run in the last line will only succeed if the RP examples run out of the box for you - I am not sure, because of the account settings. If that's not the case, you may want to replace this with some different test code...
#!/bin/bash -l
rp_sandbox="/scratch/sciteam/$LOGNAME/radical.pilot.sandbox"
rp_resource="ncsa.bw_aprun"
rp_prefix="$HOME/rp_bw"
script="$0"
arg="$1"
if test -z "$arg"
then
# BW wants us to run all things python in its own process group (I assume
# a cgroup or something), so we spawn that here and continue the script at
# the same place
echo "invoke BW magic"
module load bwpy
set -x
exec bwpy-environ -- /bin/sh "$script" bwpy
fi
mkdir -p $rp_prefix
cd $rp_prefix
rm -rf radical.pilot; git clone git@github.com:radical-cybertools/radical.pilot.git radical.pilot
rm -rf radical.saga ; git clone git@github.com:radical-cybertools/saga-python.git radical.saga
rm -rf radical.utils; git clone git@github.com:radical-cybertools/radical.utils.git radical.utils
cd radical.pilot
git checkout fix/bw_python_interpreter
rm -rf $rp_prefix/ve
./bin/radical-pilot-create-static-ve $rp_prefix/ve bw
source $rp_prefix/ve/bin/activate
pip install --upgrade pip
cd ../radical.pilot
git checkout fix/bw_python_interpreter
pip install .
cd ../radical.saga
pip uninstall -y saga-python
pip install .
cd ../radical.utils
pip uninstall -y radical.utils
pip install .
cd ../radical.pilot
rm -rf $rp_sandbox/ve.$rp_resource.0.47.14
./bin/radical-pilot-create-static-ve $rp_sandbox/ve.$rp_resource.0.47.14 bw
export RADICAL_REPORT=True
./examples/00_getting_started.py $rp_resource
*edited script to remove an invalid cp command
What arguments did you use for $0 and $1?
I skipped the
if test -z "$arg"
then
# BW wants us to run all things python in its own process group (I assume
# a cgroup or something), so we spawn that here and continue the script at
# the same place
echo "invoke BW magic"
module load bwpy
set -x
exec bwpy-environ -- /bin/sh "$script" bwpy
fi
block and only ran module load bwpy. The test in the last line gave me an error:
.caught Exception: pymongo error: {'nModified': 0, 'nUpserted': 0, 'nMatched': 0, 'writeErrors': [{u'index': 1$
9, u'code': 12501, u'errmsg': u'quota exceeded', u'op': {'resource_sandbox': None, 'control': 'umgr', 'uid': 'unit.000159', 'stdout': None, '_id': 'unit.0001$
9', 'states': ['NEW'], 'name': None, 'client_sandbox': None, 'umgr': 'umgr.0000', 'description': {'kernel': None, 'cpu_thread_type': 'OpenMP', 'post_exec': [$
, 'gpu_process_type': None, 'executable': '/bin/date', 'stdout': None, 'pre_exec': [], 'environment': {}, 'cleanup': False, 'arguments': [], 'gpu_processes':
0, 'cpu_processes': 2, 'restartable': False, 'output_staging': [], 'gpu_thread_type': None, 'cpu_threads': 2, 'cpu_process_type': 'MPI', 'pilot': None, 'name$
: None, 'input_staging': [], 'stderr': None, 'gpu_threads': 1}, 'cmd': [], 'exit_code': None, 'state': 'NEW', 'stderr': None, 'pilot': None, 'type': 'unit', $
unit_sandbox': None, 'pilot_sandbox': None}}], 'upserted': [], 'writeConcernErrors': [], 'nRemoved': 0, 'nInserted': 159}
--------------
RADICAL Utils -- Stacktrace [2121] [MainThread]
hruska 2121 2112 20 11:59 pts/67 00:00:03 | \_ /mnt/a/u/sciteam/hruska/rp_bw/ve/bin/python2.7.rp ./examples/00_getting_started.py nc$
a.bw_aprun
hruska 3815 2121 0 11:59 pts/67 00:00:00 | \_ rp.control.pubsub.bridge.0000.child
hruska 3833 2121 0 11:59 pts/67 00:00:00 | \_ rp.state.pubsub.bridge.0000.child
hruska 3843 2121 0 11:59 pts/67 00:00:00 | \_ rp.log.pubsub.bridge.0000.child
hruska 3853 2121 0 11:59 pts/67 00:00:00 | \_ rp.update.0.child
hruska 3924 2121 0 11:59 pts/67 00:00:00 | \_ rp.pmgr.launching.queue.bridge.0000.child
hruska 3938 2121 3 11:59 pts/67 00:00:00 | \_ rp.pmgr.0000.launching.0.child
hruska 4035 2121 1 11:59 pts/70 00:00:00 | \_ /usr/local/gsi-openssh-6.2p2-2/bin/gsissh -t -o IdentityFile=/u/sciteam/hruska/.s$h/id_rsa -o ControlMaster=auto -o ControlPath=/tmp/saga_ssh_hruska_%h_%p.ctrl -o TCPKeepAlive=no -o ServerAliveInterval=10 -o ServerAliveCountMax=20 -o Conne$tTimeout=10 bw.ncsa.illinois.edu
hruska 5806 2121 0 11:59 pts/67 00:00:00 | \_ rp.umgr.reschedule.pubsub.bridge.0000.child
hruska 5821 2121 0 11:59 pts/67 00:00:00 | \_ rp.umgr.staging.input.queue.bridge.0000.child
hruska 5839 2121 0 11:59 pts/67 00:00:00 | \_ rp.umgr.staging.output.queue.bridge.0000.child
hruska 5859 2121 0 11:59 pts/67 00:00:00 | \_ rp.umgr.unschedule.pubsub.bridge.0000.child
hruska 5872 2121 0 11:59 pts/67 00:00:00 | \_ rp.umgr.scheduling.queue.bridge.0000.child
hruska 5908 2121 2 11:59 pts/67 00:00:00 | \_ rp.umgr.0000.staging.input.0.child
hruska 6004 2121 2 11:59 pts/67 00:00:00 | \_ rp.umgr.0000.staging.output.0.child
hruska 6055 2121 2 11:59 pts/67 00:00:00 | \_ rp.umgr.0000.scheduling.0.child
Traceback (most recent call last):
File "./examples/00_getting_started.py", line 101, in <module>
umgr.submit_units(cuds)
File "/mnt/a/u/sciteam/hruska/rp_bw/ve/lib/python2.7/site-packages/radical/pilot/unit_manager.py", line 748, in submit_units
self._session._dbs.insert_units(unit_docs)
File "/mnt/a/u/sciteam/hruska/rp_bw/ve/lib/python2.7/site-packages/radical/pilot/db/database.py", line 399, in insert_units
raise RuntimeError( 'pymongo error: %s' % e.details)
RuntimeError: pymongo error: {'nModified': 0, 'nUpserted': 0, 'nMatched': 0, 'writeErrors': [{u'index': 159, u'code': 12501, u'errmsg': u'quota exceeded', u'$p': {'resource_sandbox': None, 'control': 'umgr', 'uid': 'unit.000159', 'stdout': None, '_id': 'unit.000159', 'states': ['NEW'], 'name': None, 'client_sandbo$': None, 'umgr': 'umgr.0000', 'description': {'kernel': None, 'cpu_thread_type': 'OpenMP', 'post_exec': [], 'gpu_process_type': None, 'executable': '/bin/dat$', 'stdout': None, 'pre_exec': [], 'environment': {}, 'cleanup': False, 'arguments': [], 'gpu_processes': 0, 'cpu_processes': 2, 'restartable': False, 'outpu$_staging': [], 'gpu_thread_type': None, 'cpu_threads': 2, 'cpu_process_type': 'MPI', 'pilot': None, 'name': None, 'input_staging': [], 'stderr': None, 'gpu_t$reads': 1}, 'cmd': [], 'exit_code': None, 'state': 'NEW', 'stderr': None, 'pilot': None, 'type': 'unit', 'unit_sandbox': None, 'pilot_sandbox': None}}], 'ups$rted': [], 'writeConcernErrors': [], 'nRemoved': 0, 'nInserted': 159}
Does the radical.entk still contain the ResourceManager? I can't find it in https://github.com/radical-cybertools/radical.entk/blob/devel/src/radical/entk/__init__.py
File "extasy_grlsd.py", line 9, in <module>
from radical.entk import Pipeline, Stage, Task, AppManager, ResourceManager
ImportError: cannot import name ResourceManager
The script does not install entk, please try initially a test on the RP level. Also, the part you left out (the bwpy-environ branch) is essential for this to work. The script does not need any arguments - the arg check ($1) is used to shield the bwpy-environ branch in the second invocation (the script calls itself - the BW python setup makes this necessary). $0 is always automatically set to the name of the script currently executing:
rivendell merzky ~ 130 $ test(){
> echo "0: $0"
> echo "1: $1"
> }
rivendell merzky ~ $ test
0: /bin/bash
1:
rivendell merzky ~ $ test foo
0: /bin/bash
1: foo
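The self-invocation guard described above can also be illustrated outside the shell. The following is a sketch of the same pattern in Python, for illustration only: `reexec_command` is a hypothetical helper, and the wrapper command line simply mirrors the exec bwpy-environ call in the script. As in the shell version, any non-empty first argument marks the second invocation.

```python
import sys

MARKER = "bwpy"   # same marker argument the shell script passes to itself


def reexec_command(argv, wrapper=("bwpy-environ", "--")):
    """Return the command for the wrapped second invocation, or None if already inside it."""
    if len(argv) > 1 and argv[1]:
        return None                      # second invocation: carry on
    return list(wrapper) + [sys.executable, argv[0], MARKER]


print(reexec_command(["setup.py"]))
print(reexec_command(["setup.py", "bwpy"]))
# A real script would pass the returned list to os.execvp(cmd[0], cmd).
```

The first call yields the wrapped command line, the second yields None, which is the point where the script would continue with the actual installation steps.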
Hey Eugen, devel (and some other branches) is undergoing several changes (including API changes which is what you encountered). I created a separate branch for you, please use the fix/extasy branch of EnTK. This is already being used by others, so can be expected to be stable.
But before that please test at the RP level based on Andre's suggestions. Thanks.
I included the omitted part. The failing test is a radical.pilot example and still shows the same pymongo error, but it looks like extasy-grlsd starts running. Now I have to adjust my own bw gpu environment for running openmm and pyemma.
I had this or a similar error before, but what was the fix? The bootstrap fails on the gpu settings of extasy.
local env installed with
radical-stack