radical-collaboration / extasy-grlsd

Repository to hold the input data and scripts for the ExTASY gromacs-lsdmap work

bootstrap gpu bw fails #53

Closed euhruska closed 6 years ago

euhruska commented 6 years ago

I had this or a similar error before, but what was the fix? The bootstrap fails with the GPU settings of ExTASY.

obtained lock /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.4.lock
0.7734,ve_create_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,

# -------------------------------------------------------------------
#
# Download virtualenv tgz
# cmd: curl -k -O 'https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.9.tar.gz'
#
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   122  100   122    0     0   2595      0 --:--:-- --:--:-- --:--:--  2652
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# unpacking virtualenv tgz
# cmd: tar zxmf 'virtualenv-1.9.tar.gz'
#

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't unpack virtualenv! Using systemv version
ERROR: invalid or unusable virtenv_dist option
Error on virtenv creation -- abort
removed `/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.4.lock'

local env installed with

conda install -c conda-forge rabbitmq-server tmux pip git python=2.7.14
pip install git+https://github.com/radical-cybertools/radical.utils.git@devel
pip install git+https://github.com/radical-cybertools/saga-python.git@devel
pip install git+https://github.com/radical-cybertools/radical.pilot.git@feature/gpu
pip install git+https://github.com/radical-cybertools/radical.entk.git@feature/gpu
pip install git+https://github.com/radical-cybertools/radical.analytics@devel

radical-stack

  python               : 2.7.14
  pythonpath           :
  virtualenv           : extasy7

  radical.analytics    : v0.45.2-102-gaec2e1d@devel
  radical.entk         : 0.6.1-0.6.0-31-g19668b3@HEAD-detached-at-19668b3
  radical.pilot        : 0.47.4-merge-pre_gpu-150-gec325c89@HEAD-detached-at-ec325c89
  radical.utils        : 0.47.4-merge-pre_gpu-22-ga942c4b@devel
  saga                 : 0.47.4-v0.47.4-32-g71a97659@devel
andre-merzky commented 6 years ago

The problem was a change in PyPI, where a new URL tree invalidated a link we use in the RP bootstrapper. This should be fixed in devel and in the release by now. You are in a detached HEAD state, so I can't see what branch you are using - let me know if you need any help merging the fix into your branch.
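
The log above is consistent with this: curl received only 122 bytes, and tar then rejected the file as not being in gzip format. A minimal sketch of a check that could run before unpacking follows, assuming Python is available where the download happens. looks_like_gzip and demo are hypothetical helpers, not part of the RP bootstrapper, and the HTML payload is only a stand-in for whatever short response the old URL returned.

```python
import gzip
import os
import tempfile

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file starts with the two-byte gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def demo():
    """Create one real gzip file and one non-archive file, then check both."""
    tmpdir = tempfile.mkdtemp()
    good = os.path.join(tmpdir, "good.tar.gz")
    bad = os.path.join(tmpdir, "not_an_archive.tar.gz")

    with gzip.open(good, "wb") as fh:
        fh.write(b"payload")

    # Stand-in for a short redirect or error page saved under the tarball name.
    with open(bad, "wb") as fh:
        fh.write(b"<html>moved</html>")

    return looks_like_gzip(good), looks_like_gzip(bad)


if __name__ == "__main__":
    print(demo())
```

A failed check is a hint to retry the download with curl -L, which follows redirects; the manual procedure later in this thread uses that flag.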

euhruska commented 6 years ago

Oh, I'm on the feature/gpu branch of RP; I had hoped that this had already been merged.

euhruska commented 6 years ago

When I try to merge feature/gpu and devel in my fork, git reports that everything is up to date. Can you confirm? The bootstrap failed before that, though.

andre-merzky commented 6 years ago

All of feature/gpu is already merged into devel in preparation for the upcoming release - but not all the fixes in devel have been merged back.

euhruska commented 6 years ago

Do I understand correctly that the devel branch should work?

andre-merzky commented 6 years ago

yes, indeed.

euhruska commented 6 years ago

is the feature/gpu branch of radical.entk also merged into devel?

vivek-bala commented 6 years ago

Hey Eugen, no, feature/gpu is not merged with devel in EnTK. Please use the feature/gpu branch in EnTK.

euhruska commented 6 years ago

using all devel branches I get the same error as https://github.com/radical-collaboration/extasy-grlsd/issues/52

euhruska commented 6 years ago

is this radical-stack correct?


radical-stack
  python               : 2.7.14
  pythonpath           :
  virtualenv           : extasy7

  radical.analytics    : v0.45.2-102-gaec2e1d@devel
  radical.entk         : 0.6.1-0.6.0-31-g19668b3@HEAD-detached-at-19668b3
  radical.pilot        : 0.47.13
  radical.utils        : 0.47.5
  saga                 : 0.47.6
andre-merzky commented 6 years ago

No - RP, RS and RU should be on devel or the feature/gpu* branches for a GPU workload to run. EnTK looks ok - that commit (19668b3) is the HEAD of the feature/gpu branch, as it should be.
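
A quick way to spot this in radical-stack output is to look for the @branch suffix on each version string. The sketch below assumes that a version without such a suffix (for example 0.47.13) comes from a released package rather than from a git branch. parse_stack and released_only are hypothetical helpers, not part of radical-stack.

```python
# Sample copied from the radical-stack output posted above.
SAMPLE = """
  python               : 2.7.14
  pythonpath           :
  virtualenv           : extasy7

  radical.analytics    : v0.45.2-102-gaec2e1d@devel
  radical.entk         : 0.6.1-0.6.0-31-g19668b3@HEAD-detached-at-19668b3
  radical.pilot        : 0.47.13
  radical.utils        : 0.47.5
  saga                 : 0.47.6
"""


def parse_stack(text):
    """Map each RCT component to a (version, branch) pair; branch may be None."""
    stack = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Skip blank lines and the python/pythonpath/virtualenv entries.
        if not sep or not name.startswith(("radical.", "saga")):
            continue
        version, _, branch = value.partition("@")
        stack[name] = (version, branch or None)
    return stack


def released_only(stack, names=("radical.pilot", "radical.utils", "saga")):
    """Return the listed components whose version carries no branch suffix."""
    return sorted(n for n in names if n in stack and stack[n][1] is None)


if __name__ == "__main__":
    print(released_only(parse_stack(SAMPLE)))
```

For the stack above, all three of RP, RU and RS are reported, which matches the answer that they need to be reinstalled from devel or a feature/gpu branch.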

andre-merzky commented 6 years ago

To clarify the bootstrapper problem you encounter: this is fixed by this commit, which is in devel.

euhruska commented 6 years ago

reinstalled, is this correct?

  python               : 2.7.14
  pythonpath           :
  virtualenv           : extasy7

  radical.analytics    : v0.45.2-102-gaec2e1d@devel
  radical.entk         : 0.6.1-0.6.0-31-g19668b3@feature-gpu
  radical.pilot        : 0.47.12-v0.47.12-169-gff598dd4@devel
  radical.utils        : 0.47.4-merge-pre_gpu-22-ga942c4b@devel
  saga                 : 0.47.4-v0.47.4-32-g71a97659@devel
andre-merzky commented 6 years ago

yes, this looks decent! How does it behave?

euhruska commented 6 years ago

why does the bootstrap try and fail to use python3.5 on bw?

#
# Create virtualenv
# cmd: /sw/bw/bwpy/mnt/bin/python virtualenv-1.9/virtualenv.py /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
#
Failed to import the site module
Traceback (most recent call last):
  File "/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site.py", line 67, in <module>
    import os
  File "/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/os.py", line 708, in <module>
    from _collections_abc import MutableMapping
ImportError: No module named '_collections_abc'
Using base prefix '/mnt/bwpy/single/usr/lib/python-exec/python3.5/../../..'
New python executable in /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python
ERROR: The executable /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python is not functioning
ERROR: It thinks sys.prefix is '/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017688.0013/pilot.0000' (should be '/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12')
ERROR: virtualenv is not compatible with this system or executable
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
ERROR: Couldn't create virtualenv
Error on virtenv creation -- abort
removed `/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12.lock'
andre-merzky commented 6 years ago

Uh, that is unexpected... Do you load any modules in your ~/.bashrc?

euhruska commented 6 years ago

not on bw

andre-merzky commented 6 years ago

Oh for christ sake, the BW python module changed again! Let me try to fix our configuration...

andre-merzky commented 6 years ago

@euhruska , can you please use the RP branch fix/bw_python_interpreter? That ensures that the 2.7 python interpreter is used. I'll merge that into devel as soon as you confirm this fixes this specific problem for you. Thanks!

euhruska commented 6 years ago

Got this bootstrap error:

# Running pre_bootstrap_1 command
# cmd: module switch PrgEnv-cray PrgEnv-gnu
#
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# Running pre_bootstrap_1 command
# cmd: module load bwpy
#
#
# SUCCESS
#
# -------------------------------------------------------------------
# -------------------------------------------------------------------
# Touching output tarballs
# -------------------------------------------------------------------
create gtod
build gtod with cc... success
0.0071,bootstrap_1_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
VIRTENV : /scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12 (normalized)
PYTHON: python2.7
PIP   : /sw/bw/bwpy/mnt/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017688.0017/pilot.0000/../cacert.pem
0.1089,ve_setup_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
virtenv_create   : TRUE
virtenv_update   : FALSE
rp install sources:  radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/ saga-python-0.47.4-v0.47.4-32-g71a97659-devel/ radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/
rp install target : SANDBOX
rp install lock   : FALSE
virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12 exists
0.8987,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017688.0017/bootstrap_0.sh: line 812: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/activate: No such file or directory
Loading of virtual env failed!
euhruska commented 6 years ago

Any idea how to fix this bootstrap issue?

andre-merzky commented 6 years ago

@euhruska , now I do. Alas, it requires starting over and recreating both the client and the agent virtualenvs. The procedure should be along these lines:

# load and activate bwpy
module load bwpy
bwpy-environ

# create and update the client virtualenv
VIRTENV_TGZ="virtualenv-1.9.tar.gz"
VIRTENV_TGZ_URL="https://pypi.python.org/packages/source/v/virtualenv/$VIRTENV_TGZ"
curl -k -L -O "$VIRTENV_TGZ_URL"
tar zxmf "$VIRTENV_TGZ"
python2.7 virtualenv-1.9/virtualenv.py ve
source ve/bin/activate
pip install --upgrade pip

# make sure you use the right RP/RS/RU branches
cd  radical.pilot
pip install .

# create the agent  virtualenv
cd ~/radical.pilot.sandbox/
radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw
cd -

#  run a test
./examples/00_getting_started.py ncsa.bw_aprun

Please update the fix/bw_python_interpreter branch. Also, you will need to create an agent virtualenv for each resource target you want to use. Let me know how that goes.

Cheers, Andre.

euhruska commented 6 years ago

I'm installing the stack on bw, but radical.entk installation of devel with pip install . fails with:

12/hypothesis-3.57.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-TDa31Z/hypothesis/setup.py", line 34, in <module>
        setuptools_version = tuple(map(int, setuptools.__version__.split('.')[:2]))
    ValueError: invalid literal for int() with base 10: '6c11'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-TDa31Z/hypothesis/

radical-stack:

  python               : 2.7.14
  pythonpath           : /opt/xalt/0.7.6/sles11.3/libexec
  virtualenv           : /mnt/a/u/sciteam/hruska/ve

  radical.pilot        : 0.47.4-merge-pre_gpu-150-gec325c89@feature-gpu
  radical.utils        : 0.47.4-merge-pre_gpu-22-ga942c4b@devel
  saga                 : 0.47.4-v0.47.4-33-g1a26dcbc@devel

Do I have to use a different setuptools version?

euhruska commented 6 years ago

upgrading setuptools fixed this issue
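
The traceback shows why the upgrade helps: the hypothesis setup.py converts the first two dot-separated components of setuptools.__version__ to integers, and a component such as 6c11 is not a valid integer. The sketch below reproduces that, assuming the installed version string was something like 0.6c11 (the log only shows the failing component). strict_major_minor mirrors the expression from the traceback, while lenient_major_minor is a hypothetical, more tolerant variant.

```python
import re


def strict_major_minor(version):
    """Same expression as in the traceback: fails on components like '6c11'."""
    return tuple(map(int, version.split(".")[:2]))


def lenient_major_minor(version):
    """Keep only the leading digits of each of the first two components."""
    parts = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


if __name__ == "__main__":
    old = "0.6c11"  # assumed example of a pre-release style version string
    try:
        strict_major_minor(old)
    except ValueError as exc:
        print("strict parse failed:", exc)
    print("lenient parse:", lenient_major_minor(old))
```

A newer setuptools reports a purely numeric major and minor version, so the strict expression succeeds after the upgrade.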

euhruska commented 6 years ago

But it gets stuck creating the ve

radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw

script : /mnt/a/u/sciteam/hruska/ve/bin/radical-pilot-create-static-ve
prefix : ve.ncsa.bw_aprun.0.47.12
arg    : bw

invoke BW magic

script : /mnt/a/u/sciteam/hruska/ve/bin/radical-pilot-create-static-ve
prefix : /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
arg    : bwpy

create bwpy ve [/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12]
create  virtualenv ....
euhruska commented 6 years ago

Any idea what causes radical-pilot-create-static-ve to get stuck, or how to make it verbose to debug this?

andre-merzky commented 6 years ago

bugger, I have never seen it getting stuck :-( You can trace the script's activity by running it via:

/bin/sh -x radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw

The resulting output might be large. If that is not conclusive though, you may want to change line 97 in the script, from:

exec bwpy-environ -- /bin/sh "$script" "$prefix" bwpy

to

exec bwpy-environ -- /bin/sh -x "$script" "$prefix" bwpy

to carry the debug mode across that exec call. Again, the output is likely large, due to the shell magic done by the module and virtualenv stuff... :/

euhruska commented 6 years ago

added -x in front: createve.txt
also added -x inside the script: createve2.txt
Not sure what the conclusion is

andre-merzky commented 6 years ago

That helps. Can you please do two things:

from

122:  stdbuf -oL $VIRTENV_CMD "$prefix" | progress
133:      stdbuf -oL pip install --upgrade $req | progress   || exit 1

to

122:  $VIRTENV_CMD "$prefix"
133:      pip install --upgrade $req || exit 1

Thanks!

euhruska commented 6 years ago
which stdbuf
/usr/bin/stdbuf

The lines were at a different line number, but when I changed it the command radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw worked.

the test python 00_getting_started.py ncsa.bw_aprun failed with

KeyError: 'ncsa.bw_aprun'
andre-merzky commented 6 years ago

Ah, you may want to add an ncsa.bw_aprun section in examples/config.json. Sorry, we don't have all resource labels covered there. Otherwise any other test code (or your application) should be able to confirm the viability of the install, too!
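
The KeyError suggests that the example looks the resource label up in a dictionary loaded from config.json. A minimal sketch of such a lookup with a friendlier error follows. The section keys (project, queue, schema) and their values are hypothetical placeholders, and resource_section is not an RP function.

```python
import json

# Hypothetical config content; the real keys and values depend on the RP
# version and on the user's allocation.
CONFIG_TEXT = """
{
    "local.localhost": {"project": null, "queue": null, "schema": null},
    "ncsa.bw_aprun":   {"project": "my_project", "queue": "normal", "schema": "gsissh"}
}
"""


def resource_section(config, label):
    """Return the section for label, or raise a KeyError listing known labels."""
    if label not in config:
        known = ", ".join(sorted(config))
        raise KeyError("no section for %r in config (known: %s)" % (label, known))
    return config[label]


config = json.loads(CONFIG_TEXT)

if __name__ == "__main__":
    print(resource_section(config, "ncsa.bw_aprun")["queue"])
```

Listing the known labels in the error makes it obvious when a resource target simply has no section yet.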

I am not sure why stdbuf failed for you - I'll just take it out (it's only cosmetic anyway...)

euhruska commented 6 years ago

well, the bootstrap still fails

# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# Running pre_bootstrap_1 command
# cmd: module switch PrgEnv-cray PrgEnv-gnu
#
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# Running pre_bootstrap_1 command
# cmd: module load bwpy
#
#
# SUCCESS
#
# -------------------------------------------------------------------
# -------------------------------------------------------------------
# Touching output tarballs
# -------------------------------------------------------------------
create gtod
build gtod with cc... success
0.0082,bootstrap_1_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
VIRTENV : /scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12 (normalized)
PYTHON: python2.7
PIP   : /sw/bw/bwpy/mnt/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem
0.1091,ve_setup_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
virtenv_create   : TRUE
virtenv_update   : FALSE
rp install sources:  radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/ saga-python-0.47.4-v0.47.4-32-g71a97659-devel/ radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/
rp install target : SANDBOX
rp install lock   : FALSE
virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12 exists
1.0956,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
PYTHON: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python
PIP   : /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem
  File "<string>", line 1
    import distutils.sysconfig as sc; print sc.get_python_version()
                                             ^
SyntaxError: invalid syntax
  File "<string>", line 1
    import distutils.sysconfig as sc; print sc.get_python_lib()
                                             ^
SyntaxError: invalid syntax
PYTHON INTERPRETER: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python
PYTHON_VERSION    :
VE_MOD_PREFIX     :
PIP installer     : /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem
PIP version       : pip 10.0.1 from /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site-packages/pip (python 3.5)
activated virtenv
VIRTENV      : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
VE_MOD_PREFIX: ///////
RP_MOD_PREFIX: ///////
PYTHONPATH   : ///////:/opt/xalt/0.7.6/sles11.3/libexec:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/lib64/py
7.4731,ve_activate_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
do not update virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
7.4835,rp_install_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
Using RADICAL-Pilot install sources ' radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/ saga-python-0.47.4-v0.47.4-32-g71a97659-devel/ radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/'
VE_MOD_PREFIX: ///////
VIRTENV      : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
SANDBOX      : /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000
VE_LOC_PREFIX:
using local install tree
PYTHONPATH: ///////::/opt/xalt/0.7.6/sles11.3/libexec:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/lib64/py
rp_install: ///////
radicalmod: ////////radical/
mkdir: cannot create directory `////////radical//': Read-only file system
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/bootstrap_0.sh: line 1237: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/bootstrap_0.sh: line 1238: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/bootstrap_0.sh: line 1239: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/bootstrap_0.sh: line 1240: ////////radical//__init__.py: No such file or directory
created radical namespace in ////////radical//__init__.py

# -------------------------------------------------------------------
#
# update radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/ via pip
# cmd: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem install  --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install' --no-deps radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/
#
Processing ./radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-fvugd9z_/setup.py", line 201
        def visit((prefix, strip, found), dirname, names):
                  ^
    SyntaxError: invalid syntax

    ----------------------------------------
/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site-packages/pip/_internal/commands/install.py:199: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
  cmdoptions.check_install_build_global(options)
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-fvugd9z_/
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/! Lets see how far we get ...
purge install source at radical.utils-0.47.4-merge-pre-gpu-22-ga942c4b-devel/

# -------------------------------------------------------------------
#
# update saga-python-0.47.4-v0.47.4-32-g71a97659-devel/ via pip
# cmd: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem install  --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install' --no-deps saga-python-0.47.4-v0.47.4-32-g71a97659-devel/
#
Processing ./saga-python-0.47.4-v0.47.4-32-g71a97659-devel
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-vb_ikqh2/setup.py", line 202
        def visit((prefix, strip, found), dirname, names):
                  ^
    SyntaxError: invalid syntax

    ----------------------------------------
/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site-packages/pip/_internal/commands/install.py:199: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
  cmdoptions.check_install_build_global(options)
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-vb_ikqh2/
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install saga-python-0.47.4-v0.47.4-32-g71a97659-devel/! Lets see how far we get ...
purge install source at saga-python-0.47.4-v0.47.4-32-g71a97659-devel/

# -------------------------------------------------------------------
#
# update radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/ via pip
# cmd: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/../cacert.pem install  --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install' --no-deps radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/
#
Processing ./radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-wpgke7ut/setup.py", line 198
        def visit((prefix, strip, found), dirname, names):
                  ^
    SyntaxError: invalid syntax

    ----------------------------------------
/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/lib/python3.5/site-packages/pip/_internal/commands/install.py:199: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
  cmdoptions.check_install_build_global(options)
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-wpgke7ut/
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/! Lets see how far we get ...
purge install source at radical.pilot-0.47.12-v0.47.12-172-g6189cc97-fix-bw-python-interpreter/
20.6616,rp_install_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
20.6722,ve_setup_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
20.6827,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
which: no radical-pilot-agent in (/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017694.0001/pilot.0000/rp_install/bin:/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin:/mnt/bwpy/single/bin:/mnt/bwpy/single/usr/bin:/sw/bw/bwpy/mnt/bin:/opt/bwpy/bin:/opt/cray/pmi/5.0.10-1.0000.11050.179.3.gem/bin:/opt/gcc/4.9.3/bin:/sw/xe/darshan/3.1.3/darshan-3.1.3/bin:/sw/EasyBuild/software/gnuplot/5.0.5/bin:/sw/EasyBuild/software/wget/1.19.4/bin:/sw/EasyBuild/software/git/2.17.0/bin:/sw/EasyBuild/software/cURL/7.59.0/bin:/sw/EasyBuild/software/OpenSSL/1.0.2m/bin:/sw/admin/scripts:/sw/user/scripts:/opt/xalt/0.7.6/sles11.3/libexec:/opt/xalt/0.7.6/sles11.3/bin:/opt/moab/9.1.2/sbin:/opt/cray/mpt/7.5.0/gni/bin:/opt/cray/craype/2.5.8/bin:/opt/cray/llm/default/bin:/opt/cray/llm/default/etc:/opt/cray/xpmem/0.1-2.0502.64982.7.19.gem/bin:/opt/cray/ugni/6.0-1.0502.10863.8.28.gem/bin:/opt/cray/udreg/2.3.2-1.0502.10518.2.17.gem/bin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.46.1_1.0502.8871.38.1-1.0502.21728.74.1/sbin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.46.1_1.0502.8871.38.1-1.0502.21728.74.1/bin:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/sbin:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/bin:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/bin:/opt/cray/nodestat/2.2-1.0502.60539.1.31.gem/bin:/opt/modules/3.2.10.5/bin:/opt/torque/6.1.2/bin:/opt/torque/6.1.2/sbin:/opt/moab/9.1.2/bin:/u/sciteam/hruska/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:.:/usr/lib/qt3/bin:/opt/cray/bin)
verify python viability: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python ... ok
verify module viability: saga            ...Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'saga'
 failed
python installation cannot load module saga - abort
euhruska commented 6 years ago

Any idea how to fix the failing bootstrap this time?

andre-merzky commented 6 years ago

I am confused by that log... - it initially indicates that the correct Python version is used (2.7) - but the syntax errors indicate that Python 3.x is active at that point. I don't yet see how that can happen :( I'll try to reproduce this...
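
Both kinds of syntax error in the log are constructs that Python 2 accepts and Python 3 rejects: the print statement without parentheses, and tuple parameters in a function signature (removed by PEP 3113). The sketch below checks this with ast.parse. It is meant to be run under Python 3, parses is a hypothetical helper, and PORTABLE_VISIT is only one possible rewrite of the signature, not the fix applied in the RCT repositories.

```python
import ast

# Taken from the bootstrapper output: a Python 2 print statement.
PRINT_STATEMENT = "import distutils.sysconfig as sc; print sc.get_python_version()\n"

# Taken from the setup.py traceback: a tuple parameter in the signature.
TUPLE_PARAMETER = "def visit((prefix, strip, found), dirname, names):\n    pass\n"

# One possible rewrite that is valid on both Python 2 and Python 3.
PORTABLE_VISIT = (
    "def visit(state, dirname, names):\n"
    "    prefix, strip, found = state\n"
)


def parses(source):
    """Return True if the running interpreter can parse the source text."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    for label, source in [("print statement", PRINT_STATEMENT),
                          ("tuple parameter", TUPLE_PARAMETER),
                          ("portable visit", PORTABLE_VISIT)]:
        print(label, parses(source))
```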

euhruska commented 6 years ago

I have rerun it; it's still showing both Python 2.7 and 3.5: bootstrap_1out.txt
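
One way to see which version a given interpreter path really runs is to ask it with a one-liner that is valid on both Python 2 and Python 3, unlike the print statement in the bootstrapper log. A minimal sketch follows, assuming the probing script itself runs under some working Python. interpreter_version is a hypothetical helper, and here it only probes the interpreter that runs the script.

```python
import subprocess
import sys

# Valid on Python 2 and Python 3: no print statement, no f-strings.
PROBE = "import sys; sys.stdout.write('%d.%d' % sys.version_info[:2])"


def interpreter_version(executable):
    """Run the given interpreter and return its 'major.minor' version string."""
    out = subprocess.check_output([executable, "-c", PROBE])
    return out.decode("ascii").strip()


if __name__ == "__main__":
    # Replace sys.executable with the path to the virtualenv's bin/python
    # to check the interpreter the bootstrapper ends up using.
    print(interpreter_version(sys.executable))
```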

euhruska commented 6 years ago

a question: should the ve environments be in /u/sciteam/hruska/scratch/radical.pilot.sandbox or /u/sciteam/hruska?

euhruska commented 6 years ago

I assumed /u/sciteam/hruska/scratch/radical.pilot.sandbox

euhruska commented 6 years ago

and tried updating and reinstalling. Now I get something about not finding python in radical-pilot-create-static-ve, although which python points to python 2.7 and import radical.pilot works:

(ve)hruska@h2ologin3:~/scratch/radical.pilot.sandbox> radical-pilot-create-static-ve ve.ncsa.bw_aprun.0.47.12 bw

script : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve/bin/radical-pilot-create-static-ve
prefix : ve.ncsa.bw_aprun.0.47.12
arg    : bw

invoke BW magic

script : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve/bin/radical-pilot-create-static-ve
prefix : /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12
arg    : bwpy

create bwpy ve [/u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12]
create  virtualenv ....
update  setuptools .
update  pip .
install pymongo==2.8 ...
install python-hostlist ...
install netifaces==0.10.4 ...
install setproctitle ...
install ntplib ...
install pyzmq ...
install apache-libcloud .  Cache entry deserialization failed, entry ignored
..  Cache entry deserialization failed, entry ignored
..  Cache entry deserialization failed, entry ignored
..  Cache entry deserialization failed, entry ignored
..  Cache entry deserialization failed, entry ignored
..  Cache entry deserialization failed, entry ignored
...
install colorama .  Cache entry deserialization failed, entry ignored
...
install backports.ssl-match-hostname ...
install msgpack-python .  Cache entry deserialization failed, entry ignored
..
install future .  Cache entry deserialization failed, entry ignored
..
  File "<string>", line 1
    import distutils.sysconfig as sc; print sc.get_python_version()
                                             ^
SyntaxError: invalid syntax
  File "<string>", line 1
    import distutils.sysconfig as sc; print sc.get_python_lib()
                                             ^
SyntaxError: invalid syntax
fix bwpy ve
skip  python
patch python2
mv: cannot stat `python2': No such file or directory
patch python2.7
mv: cannot stat `python2.7': No such file or directory

---------------------------------------------------------------------

PYTHONPATH: /opt/xalt/0.7.6/sles11.3/libexec
python: /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.12/bin/python (Python 3.5.4)

---------------------------------------------------------------------
andre-merzky commented 6 years ago

a question: should the ve environments be in /u/sciteam/hruska/scratch/radical.pilot.sandbox or /u/sciteam/hruska?

in /u/sciteam/hruska/scratch/radical.pilot.sandbox, as it is the sandbox the pilot needs to start up. Usually, RP creates that on the fly during startup - but on BW that does not (reliably) work from the compute nodes.

andre-merzky commented 6 years ago

Re the latest error: I am still unable to reproduce this I'm afraid. Can you please send (either attach or per mail)

Do you have a ~/.local/lib/python2.7/ directory? What is in ~/.local/bin/?

euhruska commented 6 years ago

~/.bashrc:

alias sq='showq -u hruska'
alias ls='ls -latr'
test -s ~/.alias && . ~/.alias || true

I don't have ~/.local/

after

module load bwpy
bwpy-environ
source ve/bin/activate

module list

Currently Loaded Modulefiles:
  1) modules/3.2.10.4                      13) dvs/2.5_0.9.0-1.0502.2188.1.113.gem   25) xalt/0.7.6.local
  2) eswrap/1.3.3-1.020200.1280.0          14) alps/5.2.4-2.0502.9774.31.12.gem      26) scripts
  3) cce/8.4.6                             15) rca/1.0.0-2.0502.60530.1.63.gem       27) OpenSSL/1.0.2m
  4) craype-network-gemini                 16) atp/2.0.4                             28) cURL/7.59.0
  5) craype/2.5.8                          17) PrgEnv-cray/5.2.82                    29) git/2.17.0
  6) cray-libsci/16.11.1                   18) cray-mpich/7.5.0                      30) wget/1.19.4
  7) udreg/2.3.2-1.0502.10518.2.17.gem     19) craype-interlagos                     31) user-paths
  8) ugni/6.0-1.0502.10863.8.28.gem        20) torque/6.1.2                          32) gnuplot/5.0.5
  9) pmi/5.0.10-1.0000.11050.179.3.gem     21) moab/9.1.2-sles11                     33) darshan/3.1.3
 10) dmapp/7.0.1-1.0502.11080.8.74.gem     22) java/jdk1.8.0_51                      34) bwpy/1.1.0
 11) gni-headers/4.0-1.0502.10859.7.8.gem  23) globus/5.2.5
 12) xpmem/0.1-2.0502.64982.5.3.gem        24) gsissh/6.2p2

env: env.txt

andre-merzky commented 6 years ago

Thanks Eugene! Alas, I don't see any significant differences to my setup :( Can you please also attach your ~/.aliases?

I'll try to write a standalone script today which is supposed to set up the client and agent side on BW in a consistent procedure. Thanks for your patience with this...

euhruska commented 6 years ago

~/.aliases doesn't exist on bw for me

andre-merzky commented 6 years ago

@euhruska, below is a script which seems to get me from a plain BW shell (no extra modules loaded, radical.pilot.sandbox empty) to a functional RCT stack and pilot sandbox. I had to push some changes to the repo for this to work, but I do hope this is portable to your environment. Please do adjust the settings in the first couple of lines. Also, the test run in the last line will only succeed if the RP examples run out of the box for you. I am not sure they will, because of the account settings. If that's not the case, you may want to replace this with some different test code...

#!/bin/bash -l

rp_sandbox="/scratch/sciteam/$LOGNAME/radical.pilot.sandbox"
rp_resource="ncsa.bw_aprun"
rp_prefix="$HOME/rp_bw"

script="$0"
arg="$1"

if test -z "$arg"
then
    # BW wants us to run all things python in its own process group (I assume
    # a cgroup or something), so we spawn that here and continue the script at
    # the same place
    echo "invoke BW magic"
    module load bwpy
    set -x
    exec bwpy-environ -- /bin/sh "$script" bwpy
fi

mkdir -p $rp_prefix
cd $rp_prefix

rm -rf radical.pilot; git clone git@github.com:radical-cybertools/radical.pilot.git  radical.pilot
rm -rf radical.saga ; git clone git@github.com:radical-cybertools/saga-python.git    radical.saga
rm -rf radical.utils; git clone git@github.com:radical-cybertools/radical.utils.git  radical.utils

cd radical.pilot
git checkout fix/bw_python_interpreter
rm -rf $rp_prefix/ve
./bin/radical-pilot-create-static-ve $rp_prefix/ve bw
source $rp_prefix/ve/bin/activate
pip install --upgrade pip

cd ../radical.pilot
git checkout fix/bw_python_interpreter
pip install .

cd ../radical.saga
pip uninstall -y saga-python
pip install .

cd ../radical.utils
pip uninstall -y radical.utils
pip install .

cd ../radical.pilot
rm -rf $rp_sandbox/ve.$rp_resource.0.47.14
./bin/radical-pilot-create-static-ve $rp_sandbox/ve.$rp_resource.0.47.14 bw

export RADICAL_REPORT=True
./examples/00_getting_started.py $rp_resource

*edited script to remove an invalid cp command

euhruska commented 6 years ago

What arguments did you use for $0 and $1?

euhruska commented 6 years ago

I skipped the

if test -z "$arg"
then
    # BW wants us to run all things python in its own process group (I assume
    # a cgroup or something), so we spawn that here and continue the script at
    # the same place
    echo "invoke BW magic"
    module load bwpy
    set -x
    exec bwpy-environ -- /bin/sh "$script" bwpy
fi

and only did module load bwpy. The test in the last line gave me an error:

.caught Exception: pymongo error: {'nModified': 0, 'nUpserted': 0, 'nMatched': 0, 'writeErrors': [{u'index': 1$
9, u'code': 12501, u'errmsg': u'quota exceeded', u'op': {'resource_sandbox': None, 'control': 'umgr', 'uid': 'unit.000159', 'stdout': None, '_id': 'unit.0001$
9', 'states': ['NEW'], 'name': None, 'client_sandbox': None, 'umgr': 'umgr.0000', 'description': {'kernel': None, 'cpu_thread_type': 'OpenMP', 'post_exec': [$
, 'gpu_process_type': None, 'executable': '/bin/date', 'stdout': None, 'pre_exec': [], 'environment': {}, 'cleanup': False, 'arguments': [], 'gpu_processes':
0, 'cpu_processes': 2, 'restartable': False, 'output_staging': [], 'gpu_thread_type': None, 'cpu_threads': 2, 'cpu_process_type': 'MPI', 'pilot': None, 'name$
: None, 'input_staging': [], 'stderr': None, 'gpu_threads': 1}, 'cmd': [], 'exit_code': None, 'state': 'NEW', 'stderr': None, 'pilot': None, 'type': 'unit', $
unit_sandbox': None, 'pilot_sandbox': None}}], 'upserted': [], 'writeConcernErrors': [], 'nRemoved': 0, 'nInserted': 159}
--------------
RADICAL Utils -- Stacktrace [2121] [MainThread]

hruska    2121  2112 20 11:59 pts/67   00:00:03  |                   \_ /mnt/a/u/sciteam/hruska/rp_bw/ve/bin/python2.7.rp ./examples/00_getting_started.py nc$
a.bw_aprun
hruska    3815  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.control.pubsub.bridge.0000.child
hruska    3833  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.state.pubsub.bridge.0000.child
hruska    3843  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.log.pubsub.bridge.0000.child
hruska    3853  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.update.0.child
hruska    3924  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.pmgr.launching.queue.bridge.0000.child
hruska    3938  2121  3 11:59 pts/67   00:00:00  |                       \_ rp.pmgr.0000.launching.0.child
hruska    4035  2121  1 11:59 pts/70   00:00:00  |                       \_ /usr/local/gsi-openssh-6.2p2-2/bin/gsissh -t -o IdentityFile=/u/sciteam/hruska/.s$h/id_rsa -o ControlMaster=auto -o ControlPath=/tmp/saga_ssh_hruska_%h_%p.ctrl -o TCPKeepAlive=no -o ServerAliveInterval=10 -o ServerAliveCountMax=20 -o Conne$tTimeout=10 bw.ncsa.illinois.edu
hruska    5806  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.umgr.reschedule.pubsub.bridge.0000.child
hruska    5821  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.umgr.staging.input.queue.bridge.0000.child
hruska    5839  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.umgr.staging.output.queue.bridge.0000.child
hruska    5859  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.umgr.unschedule.pubsub.bridge.0000.child
hruska    5872  2121  0 11:59 pts/67   00:00:00  |                       \_ rp.umgr.scheduling.queue.bridge.0000.child
hruska    5908  2121  2 11:59 pts/67   00:00:00  |                       \_ rp.umgr.0000.staging.input.0.child
hruska    6004  2121  2 11:59 pts/67   00:00:00  |                       \_ rp.umgr.0000.staging.output.0.child
hruska    6055  2121  2 11:59 pts/67   00:00:00  |                       \_ rp.umgr.0000.scheduling.0.child
Traceback (most recent call last):
  File "./examples/00_getting_started.py", line 101, in <module>
    umgr.submit_units(cuds)
  File "/mnt/a/u/sciteam/hruska/rp_bw/ve/lib/python2.7/site-packages/radical/pilot/unit_manager.py", line 748, in submit_units
    self._session._dbs.insert_units(unit_docs)
  File "/mnt/a/u/sciteam/hruska/rp_bw/ve/lib/python2.7/site-packages/radical/pilot/db/database.py", line 399, in insert_units
    raise RuntimeError( 'pymongo error: %s' % e.details)
RuntimeError: pymongo error: {'nModified': 0, 'nUpserted': 0, 'nMatched': 0, 'writeErrors': [{u'index': 159, u'code': 12501, u'errmsg': u'quota exceeded', u'$p': {'resource_sandbox': None, 'control': 'umgr', 'uid': 'unit.000159', 'stdout': None, '_id': 'unit.000159', 'states': ['NEW'], 'name': None, 'client_sandbo$': None, 'umgr': 'umgr.0000', 'description': {'kernel': None, 'cpu_thread_type': 'OpenMP', 'post_exec': [], 'gpu_process_type': None, 'executable': '/bin/dat$', 'stdout': None, 'pre_exec': [], 'environment': {}, 'cleanup': False, 'arguments': [], 'gpu_processes': 0, 'cpu_processes': 2, 'restartable': False, 'outpu$_staging': [], 'gpu_thread_type': None, 'cpu_threads': 2, 'cpu_process_type': 'MPI', 'pilot': None, 'name': None, 'input_staging': [], 'stderr': None, 'gpu_t$reads': 1}, 'cmd': [], 'exit_code': None, 'state': 'NEW', 'stderr': None, 'pilot': None, 'type': 'unit', 'unit_sandbox': None, 'pilot_sandbox': None}}], 'ups$rted': [], 'writeConcernErrors': [], 'nRemoved': 0, 'nInserted': 159}

euhruska commented 6 years ago

Does the radical.entk still contain the ResourceManager? I can't find it in https://github.com/radical-cybertools/radical.entk/blob/devel/src/radical/entk/__init__.py

  File "extasy_grlsd.py", line 9, in <module>
    from radical.entk import Pipeline, Stage, Task, AppManager, ResourceManager
ImportError: cannot import name ResourceManager

andre-merzky commented 6 years ago

The script does not install entk; please first try a test at the RP level. Also, the part you left out (the bwpy-environ branch) is essential for this to work. The script does not need any arguments: the arg check ($1) is used to shield the bwpy-environ branch in the second invocation (the script calls itself, which the BW python setup makes necessary). $0 is always set automatically to the name of the script currently executing (or, in an interactive shell as below, to the shell itself):

 rivendell  merzky  ~  130   $ test(){
> echo "0: $0"
> echo "1: $1"
> }

 rivendell  merzky  ~   $ test
0: /bin/bash
1:

rivendell  merzky  ~   $ test foo
0: /bin/bash
1: foo
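The self-invocation guard itself can be tried without BW. The sketch below is only an illustration of the pattern: WRAPPER stands in for `bwpy-environ --` and defaults to plain `env` (an assumption so that it runs anywhere), and the names stage_of and run_staged are made up.

```shell
#!/bin/bash
# Sketch of the "script re-invokes itself inside a wrapper" pattern.
# WRAPPER is a stand-in (assumption) for `bwpy-environ --` on BW; the
# default `env` simply runs the command it is given.
WRAPPER="${WRAPPER:-env}"

stage_of() {
    # An empty first argument means: first invocation, not yet wrapped.
    if test -z "$1"; then echo "outer"; else echo "inner"; fi
}

run_staged() {
    # $1: path of the script ($0), $2: marker argument ($1 of the script)
    if test "$(stage_of "$2")" = "outer"; then
        # Replace this process with the wrapper, which restarts the script
        # with a marker argument so the second pass skips this branch.
        # WRAPPER is left unquoted on purpose so "cmd --" splits into words.
        exec $WRAPPER /bin/bash "$1" wrapped
    fi
    echo "running inside wrapper (marker: $2)"
}
```

A script using this would end with `run_staged "$0" "$1"`: called without arguments it re-executes itself through the wrapper, and the second pass, which now has a non-empty $1, falls through to the real work.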
vivek-bala commented 6 years ago

Hey Eugen, devel (and some other branches) is undergoing several changes (including API changes, which is what you encountered). I created a separate branch for you; please use the fix/extasy branch of EnTK. This is already being used by others, so it can be expected to be stable.

But before that please test at the RP level based on Andre's suggestions. Thanks.
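Once the RP-level test passes, switching EnTK to that branch could look roughly like this. It is a sketch that follows the `pip install git+...@branch` form used in the opening post; the helper name rct_pip_spec is made up, and the command is only printed so it can be reviewed before running it inside the activated ve.

```shell
# Sketch: build the pip requirement for one RCT repository at a branch.
# rct_pip_spec is a hypothetical helper, not an RCT tool.
rct_pip_spec() {
    # $1: repository name under radical-cybertools, $2: branch name
    echo "git+https://github.com/radical-cybertools/$1.git@$2"
}

# Print the install command instead of running it.
echo pip install "$(rct_pip_spec radical.entk fix/extasy)"
```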

euhruska commented 6 years ago

I included the omitted part. The failing test is a radical.pilot example and still gives the same pymongo error, but it looks like extasy-grlsd starts running. Now I have to adjust my own BW GPU environment for running openmm and pyemma.