Closed euhruska closed 6 years ago
This is likely not a memory issue, but a process limit. It seems you are using the resource tag ncsa.bw_aprun, is that correct? Please give ncsa.bw a try, which will use the ORTE backend.
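To check whether a process limit is plausibly the culprit, the limits of the shell that launches the units can be inspected. This is a minimal sketch using the shell builtin ulimit. Where it needs to be run (login node or MOM node) is an assumption, and the helper name show_limits is purely illustrative and not part of RP:

```shell
#!/bin/bash
# Print the soft limits that are most likely relevant here.
# show_limits is a hypothetical helper name, not part of RADICAL-Pilot.
show_limits() {
    echo "max user processes : $(ulimit -u)"
    echo "max open files     : $(ulimit -n)"
    echo "max virtual memory : $(ulimit -v)"
}

show_limits
```

A small "max user processes" value compared to the number of concurrently launched units would point towards a process limit rather than memory.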
I got an error in bootstrap_1.out (nothing in bootstrap_1.err):
################################################################################
## Searching for available TCP port for tunnel in range 23000..23100.
## Found available port: 23000
0.0557,tunnel_setup_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
PYTHON: /sw/bw/bwpy/mnt/bin/python
PIP : /sw/bw/bwpy/mnt/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/../cacert.pem
0.1232,ve_setup_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
virtenv_create : TRUE
virtenv_update : FALSE
rp install sources: radical.utils-0.47/ saga-python-0.47/ radical.pilot-0.47.1/
rp install target : SANDBOX
rp install lock : FALSE
virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1 exists
2.7280,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
PYTHON: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python
PIP : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/../cacert.pem
/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
PYTHON INTERPRETER: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python
PYTHON_VERSION :
VE_MOD_PREFIX :
PIP installer : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/../cacert.pem
/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
PIP version :
activated virtenv
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1
VE_MOD_PREFIX: ///////
RP_MOD_PREFIX: ///////
PYTHONPATH : ///////:/opt/xalt/0.7.6/sles11.3/libexec:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/lib64/py
2.9590,ve_activate_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
do not update virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1
2.9863,rp_install_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
Using RADICAL-Pilot install sources ' radical.utils-0.47/ saga-python-0.47/ radical.pilot-0.47.1/'
VE_MOD_PREFIX: ///////
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1
SANDBOX : /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000
VE_LOC_PREFIX:
using local install tree
PYTHONPATH: ///////::/opt/xalt/0.7.6/sles11.3/libexec:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/lib64/py
rp_install: ///////
radicalmod: ////////radical/
mkdir: cannot create directory `////////radical//': Read-only file system
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/bootstrap_1.sh: line 1225: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/bootstrap_1.sh: line 1226: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/bootstrap_1.sh: line 1227: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/bootstrap_1.sh: line 1228: ////////radical//__init__.py: No such file or directory
created radical namespace in ////////radical//__init__.py
# -------------------------------------------------------------------
#
# update radical.utils-0.47/ via pip
# cmd: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/../cacert.pem install --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install' --no-deps radical.utils-0.47/
#
/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.utils-0.47/! Lets see how far we get ...
purge install source at radical.utils-0.47/
# -------------------------------------------------------------------
#
# update saga-python-0.47/ via pip
# cmd: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/../cacert.pem install --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install' --no-deps saga-python-0.47/
#
/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
#
# ERROR
# no fallback command available
#
Couldn't install saga-python-0.47/! Lets see how far we get ...
purge install source at saga-python-0.47/
# -------------------------------------------------------------------
#
# update radical.pilot-0.47.1/ via pip
# cmd: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/../cacert.pem install --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install' --no-deps radical.pilot-0.47.1/
#
/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.pilot-0.47.1/! Lets see how far we get ...
purge install source at radical.pilot-0.47.1/
4.6689,rp_install_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
4.6797,ve_setup_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
4.6905,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
which: no radical-pilot-agent in (/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017560.0001/pilot.0000/rp_install/bin:/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin:/mnt/bwpy/single/bin:/mnt/bwpy/single/usr/bin:/sw/bw/bwpy/mnt/bin:/opt/bwpy/bin:/opt/cray/pmi/5.0.10-1.0000.11050.179.3.gem/bin:/opt/gcc/4.9.3/bin:/sw/xe/darshan/3.1.3/darshan-3.1.3/bin:/sw/EasyBuild/software/gnuplot/5.0.5/bin:/sw/admin/scripts:/sw/user/scripts:/opt/xalt/0.7.6/sles11.3/libexec:/opt/xalt/0.7.6/sles11.3/bin:/opt/moab/9.0.2/sbin:/opt/torque/6.0.4/sbin:/opt/torque/6.0.4/bin:/opt/cray/mpt/7.5.0/gni/bin:/opt/cray/craype/2.5.8/bin:/opt/cray/llm/default/bin:/opt/cray/llm/default/etc:/opt/cray/xpmem/0.1-2.0502.64982.5.3.gem/bin:/opt/cray/ugni/6.0-1.0502.10863.8.28.gem/bin:/opt/cray/udreg/2.3.2-1.0502.10518.2.17.gem/bin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.46.1_1.0502.8871.24.1-1.0502.21704.63.1/sbin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.46.1_1.0502.8871.24.1-1.0502.21704.63.1/bin:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/sbin:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/bin:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/bin:/opt/cray/nodestat/2.2-1.0502.60539.1.31.gem/bin:/opt/modules/3.2.10.5/bin:/opt/moab/9.0.2/bin:/u/sciteam/hruska/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:.:/usr/lib/qt3/bin:/opt/cray/bin)
verify python viability: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python .../mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
failed
python installation (/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python) is not usable - abort
kill: no process ID specified
Try `kill --help' for more information.
Can you remove /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1 and try again?
Is this with ncsa.bw?
@andre-merzky: doesn't using ORTE mean that the execution kernels that use MPI need to be recompiled with the OpenMPI that RP uses? Or is that not the case anymore?
It failed with:
# unpacking virtualenv tgz
# cmd: tar zxmf 'virtualenv-1.9.tar.gz'
#
#
# SUCCESS
#
# -------------------------------------------------------------------
# -------------------------------------------------------------------
#
# Create virtualenv
# cmd: /sw/bw/bwpy/mnt/bin/python virtualenv-1.9/virtualenv.py /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1
#
New python executable in /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/bin/python
Installing setuptools............done.
Installing pip...............done.
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
ERROR: Couldn't create virtualenv
Error on virtenv creation -- abort
removed `/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1.lock'
kill: no process ID specified
Try `kill --help' for more information.
Hmmm. The logs might be similar to the ones you posted initially. But just in case, could you upload the client and remote logs again?
Can you please remove /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw.0.47.1/ and try again? Seems like a binary-incompatible python update got activated on BW, which screwed up the VE...
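One way to confirm that kind of breakage is to ask the dynamic loader which shared libraries the VE's python binary can no longer resolve. This is a minimal sketch, assuming ldd is available on the node. The helper name missing_libs is hypothetical, and the path in the usage comment is a placeholder:

```shell
#!/bin/bash
# List shared libraries that the dynamic loader cannot resolve.
# Reads `ldd` output on stdin and prints one library name per line.
# missing_libs is a hypothetical helper name.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage on the machine itself (the path is a placeholder):
#   ldd /path/to/ve/bin/python | missing_libs
```

If that lists libpython2.7.so.1.0, the VE's interpreter was built against a python installation that is no longer visible in the current environment.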
@andre-merzky that's what I suggested above :) The last failure reported by Eugen is just after that. Heads-up on a similar ticket from Srinivas.
Ah, sorry, I missed that - thanks! Let's see where removing the VE gets us, I'll check Srinivas' ticket after that.
now even 'ncsa.bw_aprun' generates the same error
Quick note that this is being worked on, see radical-cybertools/radical.pilot/issues/1546. The culprit seems to be a mixture of the BW python update and apache-libcloud not liking our version of setuptools anymore.
any progress?
The fix (or rather workaround) waits for confirmation from Srinivas. If you have the time, can you give the instructions in radical-cybertools/radical.pilot#1546 a try?
https://github.com/radical-cybertools/radical.pilot/issues/1546 reports that it works, but I still get the same error as before; bootstrap_1.out fails, see: rp.session.leonardo.rice.edu.eh22.017571.0001-remote.zip
This is after you patched python2.7 in /scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.1/bin? Can you please run these commands and send the output:
$ cd /scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.1/bin
$ cat python2.7
$ module load bwpy
$ source activate
$ ./python -V
Thanks!
I got some missing libraries, how do I load them?
>>>which python
/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/python
>>>./python -V
./python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
Hmm, but those are different commands :-) Did you create the new python2.7 script?
Are 0.47 and 0.47.1 different? I see these two versions mixed in https://github.com/radical-cybertools/radical.pilot/issues/1546 . I did:
source ve.ncsa.bw_aprun.0.47/bin/activate
cd ve.ncsa.bw_aprun.0.47/bin/
mv python2.7 python2.7-exe
pwd
cat > python2.7
#!/bin/bash
exec /sw/bw/bwpy/mnt/bin/bwpy-environ -- /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/python2.7-exe "$@"
^C
chmod 0755 python2.7
You will need to re-create and then patch the VE for the RP version you intend to use. The above commands are indeed the commands to patch the VE.
hm, I did it for 0.47.1; which python gives /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.1/bin/python
but ./python -V still gives
./python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
Could you please make the VE on BW readable? I'd like to have a look, if you don't mind...
here: /scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.1
sorry:
h2ologin4 merzky //scratch/sciteam 2 $ l -d hruska/
drwx------ 20 hruska PRAC_bamm 4096 Feb 2 16:27 hruska//
better now?
better, yes, but:
h2ologin4 merzky …/sciteam/hruska/radical.pilot.sandbox $ cd ve.ncsa.bw.0.47.1/
-bash: cd: ve.ncsa.bw.0.47.1/: Permission denied
:-)
I changed "aprun" - ve.ncsa.bw_aprun.0.47.1
Thanks - that explains things... Seems like your VE was different from what I and srinivas got. We ended up with this link chain:
python -> python2 -> python2.7
where the last one was the binary which got then swapped out by the patch. You seem to have the opposite:
python2.7 -> python
and thus the fix did not do much. I have no idea why that was different.
Please try the following:
$ rm python2.7-exe python2
$ mv python python2.7-exe
$ ln -s python2.7 python2
$ ln -s python2 python
but also, the script in python2.7 misses the python executable. Please change from
exec /sw/bw/bwpy/mnt/bin/bwpy-environ -- /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.1/bin "$@"
to
exec /sw/bw/bwpy/mnt/bin/bwpy-environ -- /u/sciteam/hruska/scratch/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.1/bin/python2.7-exe "$@"
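The intended end state (link chain plus wrapper script) can be tried out safely in a throw-away directory before touching the real VE. This is a minimal sketch: the "interpreter" is a stand-in shell script, the bwpy-environ call is left out, and all file names live in a temp dir, so it only illustrates the layout and is not the actual patch:

```shell
#!/bin/bash
# Recreate the expected layout in a throw-away directory:
#   python -> python2 -> python2.7 (wrapper script) -> python2.7-exe (binary)
# All files here are stand-ins; nothing touches a real VE.
set -e
demo="$(mktemp -d)"
cd "$demo"

# Stand-in for the real interpreter binary.
printf '#!/bin/sh\necho "fake python called with: $*"\n' > python2.7-exe
chmod 0755 python2.7-exe

# Wrapper: on BW this line would exec bwpy-environ first; here it only
# forwards all arguments, quoted as "$@", to the renamed binary.
printf '#!/bin/sh\nexec "%s/python2.7-exe" "$@"\n' "$demo" > python2.7
chmod 0755 python2.7

ln -s python2.7 python2
ln -s python2 python

# Show the chain and call through it.
ls -l python python2
./python -V
```

The point of the chain is that every entry point (python, python2, python2.7) ends up in the wrapper, and only the wrapper knows about the renamed binary.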
looks ok now
now it fails in bootstrap_1.out with
purge install source at radical.pilot-0.47.1/
18.3512,rp_install_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
18.3618,ve_setup_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
18.3722,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
verify python viability: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.1/bin/python ... ok
verify module viability: saga ...Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017571.0002/pilot.0000/rp_install/lib/python2.7/site-packages/saga/__init__.py", line 8, in <module>
import radical.utils as ru
File "/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017571.0002/pilot.0000/rp_install/lib/python2.7/site-packages/radical/utils/__init__.py", line 11, in <module>
from .plugin_manager import PluginManager
File "/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017571.0002/pilot.0000/rp_install/lib/python2.7/site-packages/radical/utils/plugin_manager.py", line 14, in <module>
from .logger import get_logger
File "/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017571.0002/pilot.0000/rp_install/lib/python2.7/site-packages/radical/utils/logger.py", line 118, in <module>
import colorama
ImportError: No module named colorama
failed
python installation cannot load module saga - abort
This I don't understand: the VE should have installed colorama if you used the script from https://github.com/radical-cybertools/radical.pilot/blob/devel/bin/radical-pilot-create-static-ve - line 6 lists colorama as a dependency. Can you please check if you used the right script, and if that worked without error message? Can you confirm that it used the virtualenv you patched?
yes, says verify python viability: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.1/bin/python ... ok
and
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47.1 (normalized)
And is colorama available in that ve? (python -c "import colorama")
no
Eugen,
I updated the VE creation script to do all the patching. Please download it from https://github.com/radical-cybertools/radical.pilot/blob/dbb141745330590d4bb30190dfcfeee2d3bcc07c/bin/radicalpilot-create-static-ve . Remove all the VEs from the radical.pilot.sandbox dir (they won't work anymore anyway), and create the one you need with
radicalpilot-create-static-ve "/path/to/ve" bw
You may want to verify that the resulting ve is valid, with
$ module load bwpy
$ source "/path/to/ve/bin/activate"
$ which python
$ python -V
If that gives the expected results, a pilot agent should be able to use that VE.
Let me know how it goes!
fails with radicalpilot-create-static-ve: line 94: exec: bwpy-environment: not found
any idea why?
Yes - please try again with https://raw.githubusercontent.com/radical-cybertools/radical.pilot/e849fcf33b3b7d6b50976507d60e4613b1002fbe/bin/radicalpilot-create-static-ve - thanks!
creating the environment works now, but when I run extasy I get Cannot mount ext3 image on /dev/loop0
Details:
# -------------------------------------------------------------------
# Touching output tarballs
# -------------------------------------------------------------------
create gtod
build gtod with cc... success
0.0081,bootstrap_1_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
VIRTENV : /scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47 (normalized)
PYTHON: /sw/bw/bwpy/mnt/bin/python
PIP : /sw/bw/bwpy/mnt/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/../cacert.pem
0.0723,ve_setup_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
virtenv_create : TRUE
virtenv_update : FALSE
rp install sources: radical.utils-0.47-v0.47-4-gcca43d5-devel/ saga-python-0.47-v0.46-53-gb342c0c3-feature-gpu/ radical.pilot-0.47-0.47-118-gf66e2f6d-feature-gpu/
rp install target : SANDBOX
rp install lock : FALSE
virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47 exists
0.9033,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
PYTHON: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/python
PIP : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/../cacert.pem
Error: Cannot mount ext3 image on /dev/loop0 (/mnt/a/sw/xe_xk_cle5.2UP02_pe2.3.0/images/bwpy/bwpy-0.3.2-20180213.img): Invalid argument!
Error: Error disassociating image from loop device: Device or resource busy!
Error: Cannot mount ext3 image on /dev/loop1 (/mnt/a/sw/xe_xk_cle5.2UP02_pe2.3.0/images/bwpy/bwpy-0.3.2-20180213.img): Invalid argument!
PYTHON INTERPRETER: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/python
PYTHON_VERSION :
VE_MOD_PREFIX :
PIP installer : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/../cacert.pem
Error: Cannot mount ext3 image on /dev/loop0 (/mnt/a/sw/xe_xk_cle5.2UP02_pe2.3.0/images/bwpy/bwpy-0.3.2-20180213.img): Invalid argument!
PIP version :
activated virtenv
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47
VE_MOD_PREFIX: ///////
RP_MOD_PREFIX: ///////
PYTHONPATH : ///////:/opt/xalt/0.7.6/sles11.3/libexec:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/lib64/py
1.9275,ve_activate_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
do not update virtenv /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47
1.9390,rp_install_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
Using RADICAL-Pilot install sources ' radical.utils-0.47-v0.47-4-gcca43d5-devel/ saga-python-0.47-v0.46-53-gb342c0c3-feature-gpu/ radical.pilot-0.47-0.47-118-gf66e2f6d-feature-gpu/'
VE_MOD_PREFIX: ///////
VIRTENV : /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47
SANDBOX : /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000
VE_LOC_PREFIX:
using local install tree
PYTHONPATH: ///////::/opt/xalt/0.7.6/sles11.3/libexec:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/lib64/py
rp_install: ///////
radicalmod: ////////radical/
mkdir: cannot create directory `////////radical//': Read-only file system
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/bootstrap_1.sh: line 1225: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/bootstrap_1.sh: line 1226: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/bootstrap_1.sh: line 1227: ////////radical//__init__.py: No such file or directory
/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/bootstrap_1.sh: line 1228: ////////radical//__init__.py: No such file or directory
created radical namespace in ////////radical//__init__.py
# -------------------------------------------------------------------
#
# update radical.utils-0.47-v0.47-4-gcca43d5-devel/ via pip
# cmd: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/../cacert.pem install --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install' --no-deps radical.utils-0.47-v0.47-4-gcca43d5-devel/
#
Error: Cannot mount ext3 image on /dev/loop0 (/mnt/a/sw/xe_xk_cle5.2UP02_pe2.3.0/images/bwpy/bwpy-0.3.2-20180213.img): Invalid argument!
Error: Error disassociating image from loop device: Device or resource busy!
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.utils-0.47-v0.47-4-gcca43d5-devel/! Lets see how far we get ...
purge install source at radical.utils-0.47-v0.47-4-gcca43d5-devel/
# -------------------------------------------------------------------
#
# update saga-python-0.47-v0.46-53-gb342c0c3-feature-gpu/ via pip
# cmd: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/../cacert.pem install --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install' --no-deps saga-python-0.47-v0.46-53-gb342c0c3-feature-gpu/
#
Error: Cannot mount ext3 image on /dev/loop0 (/mnt/a/sw/xe_xk_cle5.2UP02_pe2.3.0/images/bwpy/bwpy-0.3.2-20180213.img): Invalid argument!
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install saga-python-0.47-v0.46-53-gb342c0c3-feature-gpu/! Lets see how far we get ...
purge install source at saga-python-0.47-v0.46-53-gb342c0c3-feature-gpu/
# -------------------------------------------------------------------
#
# update radical.pilot-0.47-0.47-118-gf66e2f6d-feature-gpu/ via pip
# cmd: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/pip --cert /scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/../cacert.pem install --src '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install/src' --build '/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install/build' --install-option='--prefix=/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install' --no-deps radical.pilot-0.47-0.47-118-gf66e2f6d-feature-gpu/
#
Error: Cannot mount ext3 image on /dev/loop0 (/mnt/a/sw/xe_xk_cle5.2UP02_pe2.3.0/images/bwpy/bwpy-0.3.2-20180213.img): Invalid argument!
Error: Error disassociating image from loop device: Device or resource busy!
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.pilot-0.47-0.47-118-gf66e2f6d-feature-gpu/! Lets see how far we get ...
purge install source at radical.pilot-0.47-0.47-118-gf66e2f6d-feature-gpu/
3.6720,rp_install_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
3.6838,ve_setup_stop,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
3.6952,ve_activate_start,bootstrap_1,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
which: no radical-pilot-agent in (/scratch/sciteam/hruska/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017575.0001/pilot.0000/rp_install/bin:/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin:/mnt/bwpy/single/bin:/mnt/bwpy/single/usr/bin:/sw/bw/bwpy/mnt/bin:/opt/bwpy/bin:/opt/cray/pmi/5.0.10-1.0000.11050.179.3.gem/bin:/opt/gcc/4.9.3/bin:/sw/xe/darshan/3.1.3/darshan-3.1.3/bin:/sw/EasyBuild/software/gnuplot/5.0.5/bin:/sw/admin/scripts:/sw/user/scripts:/opt/xalt/0.7.6/sles11.3/libexec:/opt/xalt/0.7.6/sles11.3/bin:/opt/moab/9.0.2/sbin:/opt/torque/6.0.4/sbin:/opt/torque/6.0.4/bin:/opt/cray/mpt/7.5.0/gni/bin:/opt/cray/craype/2.5.8/bin:/opt/cray/llm/default/bin:/opt/cray/llm/default/etc:/opt/cray/xpmem/0.1-2.0502.64982.5.3.gem/bin:/opt/cray/ugni/6.0-1.0502.10863.8.28.gem/bin:/opt/cray/udreg/2.3.2-1.0502.10518.2.17.gem/bin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.46.1_1.0502.8871.24.1-1.0502.21704.63.1/sbin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.46.1_1.0502.8871.24.1-1.0502.21704.63.1/bin:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/sbin:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/bin:/opt/cray/sdb/1.1-1.0502.63652.4.27.gem/bin:/opt/cray/nodestat/2.2-1.0502.60539.1.31.gem/bin:/opt/modules/3.2.10.5/bin:/opt/moab/9.0.2/bin:/u/sciteam/hruska/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:.:/usr/lib/qt3/bin:/opt/cray/bin)
verify python viability: /mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/python ...Error: Cannot mount ext3 image on /dev/loop0 (/mnt/a/sw/xe_xk_cle5.2UP02_pe2.3.0/images/bwpy/bwpy-0.3.2-20180213.img): Invalid argument!
failed
python installation (/mnt/c/scratch/sciteam/hruska/radical.pilot.sandbox/ve.ncsa.bw_aprun.0.47/bin/python) is not usable - abort
Oh for christ sake... I will have to open a BW ticket for this one I'm afraid...
From BW support: Can you get them to try again? I accidentally forgot to switch back to 20180201 while updating that image, so the image was momentarily invalid. They may have run it at just the wrong time.
So, please do try again :-)
bootstrap is ok
I increased the number of units to 1000 and I get the following error message:
Local logfile: rp.session.leonardo.rice.edu.eh22.017558.0002.zip. Remote logfile (zipped with only one unit, to reduce the size from 2G): rp.session.leonardo.rice.edu.eh22.017558.0002-remote.zip
I found the same error message on page 5 of this PDF: https://bluewaters.ncsa.illinois.edu/c/document_library/get_file?uuid=7013c401-80ba-4c52-b377-50d2fa4da8e1&groupId=10157 . It claims there is a memory limit on the MOM node on Blue Waters.
My question now is: am I able to launch 1000 units on Blue Waters, each generating 16M of data?
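For scale, the aggregate output volume implied by that question is simple arithmetic. This is a small sketch, assuming "16M" means 16 MiB per unit (an assumption). It says nothing about the MOM node memory limit itself:

```shell
#!/bin/bash
# Aggregate output volume for the run described above.
units=1000
mib_per_unit=16                          # assumed meaning of "16M"
total_mib=$(( units * mib_per_unit ))
total_gib=$(( total_mib / 1024 ))        # integer division, rounds down
echo "total output: ${total_mib} MiB (about ${total_gib} GiB)"
```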