radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

v0.43@master: 00_getting_started.py fails on ncsa.bw #1115

Closed antonst closed 7 years ago

antonst commented 8 years ago
2016-09-09 20:33:58,715: radical.pilot       : MainProcess                     : PilotLauncherWorker-1: INFO    : pilot pilot.0000 seems alive and well
2016-09-09 20:34:14,710: radical.saga.cpi    : MainProcess                     : Thread-8       : INFO    : Job monitoring thread updating Job [torque+gsissh://bw.ncsa.illinois.edu]-[5438820] (old state: Pending, new state: Running)
2016-09-09 20:34:55,626: radical.saga.cpi    : MainProcess                     : Thread-9       : INFO    : Job monitoring thread updating Job [torque+gsissh://bw.ncsa.illinois.edu]-[5438820] (old state: Running, new state: Failed)
2016-09-09 20:34:59,647: radical.pilot       : MainProcess                     : PilotLauncherWorker-1: INFO    : Performing periodical health check for pilot.0000 (SAGA job id [torque+gsissh://bw.ncsa.illinois.edu]-[5438820])
2016-09-09 20:34:59,757: radical.pilot       : MainProcess                     : PilotLauncherWorker-1: WARNING : pilot pilot.0000 declared dead
2016-09-09 20:35:00,788: radical.pilot       : MainProcess                     : Thread-1       : INFO    : ComputePilot 'pilot.0000' state changed from 'PendingActive' to 'Failed'.
2016-09-09 20:35:00,788: radical.pilot       : MainProcess                     : Thread-1       : INFO    : [Callback]: ComputePilot 'pilot.0000' state: Failed.
2016-09-09 20:35:00,788: radical.pilot       : MainProcess                     : Thread-1       : ERROR   : [Callback]: ComputePilot 'pilot.0000' failed -- calling exit
2016-09-09 20:35:00,788: radical.pilot       : MainProcess                     : Thread-1       : ERROR   : sys.exit from callback
Traceback (most recent call last):
  File "/home/antons/ve/local/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 236, in call_callbacks
    cb_func(self._shared_data[pilot_id]['facade_object'](), new_state)
  File "/home/antons/ve/local/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 251, in _default_pilot_error_cb
    sys.exit(1)
SystemExit: 1
exit requested

--------------------------------------------------------------------------------
finalize                                                                        

closing session rp.session.radical.antons.017053.0001                          \
close pilot manager2016-09-09 20:35:01,087: radical.pilot       : MainProcess                     : MainThread     : INFO    : Sent 'COMMAND_CANCEL_PILOT' command to pilots ['pilot.0000'].
                                                            \
wait for 1 pilot(s) -                                                         ok
                                                                              ok
close unit manager2016-09-09 20:35:02,207: radical.pilot       : MainProcess                     : MainThread     : INFO    : Closed UnitManager umgr.0000.
                                                            ok
session lifetime: 429.0s                                                      ok

--------------------------------------------------------------------------------

In bootstrap_1.err:

MPI functionality is now available through bwpy-mpi.
To enable MPI packages `module load bwpy-mpi` after bwpy
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
^M  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0^M100 1957k  100 1957k    0     0  10.8M      0 --:--:-- --:--:-- --:--:-- 10.9M
no previously-included directories found matching 'libcloud/test/secrets.py'
Python 2.7.11
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named pilot

No units started executing. Should I use a different resource tag?

antonst commented 8 years ago

and in bootstrap_1.out:

Successfully installed saga-python
Cleaning up...
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# update radical.pilot-v0.43-master/ via pip
# cmd: /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/bin/pip --cert cacert.pem install  --src '/scratch/sciteam/treikali/radical.pilot.sandbox/rp.session.radical.antons.017053.0000-pilot.0000/rp_install/src' --build '/scratch/sciteam/treikali/radical.pilot.sandbox/rp.session.radical.antons.017053.0000-pilot.0000/rp_install/build' --install-option='--prefix=/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install' radical.pilot-v0.43-master/
#
Unpacking ./radical.pilot-v0.43-master
  Running setup.py egg_info for package from file:///mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/rp.session.radical.antons.017053.0000-pilot.0000/radical.pilot-v0.43-master
    version: v0.43@master (v0.43@master)

    warning: no files found matching '*.json'
    warning: no files found matching '*.sh'
Requirement already satisfied (use --upgrade to upgrade): saga-python in /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/lib/python2.7/site-packages (from radical.pilot==v0.43-master)
Requirement already satisfied (use --upgrade to upgrade): radical.utils in /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/lib/python2.7/site-packages (from radical.pilot==v0.43-master)
Requirement already satisfied (use --upgrade to upgrade): pymongo==2.8 in /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/lib/python2.7/site-packages (from radical.pilot==v0.43-master)
Requirement already satisfied (use --upgrade to upgrade): python-hostlist in /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/lib/python2.7/site-packages (from radical.pilot==v0.43-master)
Downloading/unpacking netifaces (from radical.pilot==v0.43-master)
  Downloading netifaces-0.10.5.tar.gz
  Running setup.py egg_info for package netifaces

Downloading/unpacking setproctitle (from radical.pilot==v0.43-master)
  Downloading setproctitle-1.1.10.tar.gz
  Running setup.py egg_info for package setproctitle

Requirement already satisfied (use --upgrade to upgrade): ntplib in /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/lib/python2.7/site-packages (from radical.pilot==v0.43-master)
Requirement already satisfied (use --upgrade to upgrade): pyzmq in /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/lib/python2.7/site-packages (from radical.pilot==v0.43-master)
Requirement already satisfied (use --upgrade to upgrade): apache-libcloud in /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/lib/python2.7/site-packages/apache_libcloud-1.1.0-py2.7.egg (from saga-python->radical.pilot==v0.43-master)
Requirement already satisfied (use --upgrade to upgrade): colorama in /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/lib/python2.7/site-packages (from radical.utils->radical.pilot==v0.43-master)
Installing collected packages: netifaces, setproctitle, radical.pilot
  Running setup.py install for netifaces
    checking for getifaddrs...found.
    checking for getnameinfo...found.
    checking for IPv6 socket IOCTLs...not found.
    checking for optional header files...netash/ash.h netatalk/at.h netax25/ax25.h neteconet/ec.h netipx/ipx.h netpacket/packet.h linux/irda.h linux/atm.h linux/llc.h linux/tipc.h linux/dn.h.
    checking whether struct sockaddr has a length field...no.
    checking which sockaddr_xxx structs are defined...at ax25 in in6 ipx un ash ec ll atmpvc atmsvc dn irda llc.
    checking for routing socket support...no.
    checking for sysctl(CTL_NET...) support...no.
    checking for netlink support...yes.
    will use netlink to read routing table
    building 'netifaces' extension
    gcc -fPIC -DNETIFACES_VERSION=0.10.5 -DHAVE_GETIFADDRS=1 -DHAVE_GETNAMEINFO=1 -DHAVE_NETASH_ASH_H=1 -DHAVE_NETATALK_AT_H=1 -DHAVE_NETAX25_AX25_H=1 -DHAVE_NETECONET_EC_H=1 -DHAVE_NETIPX_IPX_H=1 -DHAVE_NETPACKET_PACKET_H=1 -DHAVE_LINUX_IRDA_H=1 -DHAVE_LINUX_ATM_H=1 -DHAVE_LINUX_LLC_H=1 -DHAVE_LINUX_TIPC_H=1 -DHAVE_LINUX_DN_H=1 -DHAVE_SOCKADDR_AT=1 -DHAVE_SOCKADDR_AX25=1 -DHAVE_SOCKADDR_IN=1 -DHAVE_SOCKADDR_IN6=1 -DHAVE_SOCKADDR_IPX=1 -DHAVE_SOCKADDR_UN=1 -DHAVE_SOCKADDR_ASH=1 -DHAVE_SOCKADDR_EC=1 -DHAVE_SOCKADDR_LL=1 -DHAVE_SOCKADDR_ATMPVC=1 -DHAVE_SOCKADDR_ATMSVC=1 -DHAVE_SOCKADDR_DN=1 -DHAVE_SOCKADDR_IRDA=1 -DHAVE_SOCKADDR_LLC=1 -DHAVE_PF_NETLINK=1 -I/sw/xe/bwpy/0.2.0/usr/include/python2.7 -c netifaces.c -o build/temp.linux-x86_64-2.7/netifaces.o
    netifaces.c: In function 'gateways':
    netifaces.c:1704: error: 'RTNL_FAMILY_MAX' undeclared (first use in this function)
    netifaces.c:1704: error: (Each undeclared identifier is reported only once
    netifaces.c:1704: error: for each function it appears in.)
    error: command 'gcc' failed with exit status 1
    Complete output from command /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/bin/python2.7 -c "import setuptools;__file__='/scratch/sciteam/treikali/radical.pilot.sandbox/rp.session.radical.antons.017053.0000-pilot.0000/rp_install/build/netifaces/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-w4gG8x-record/install-record.txt --single-version-externally-managed --install-headers /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/include/site/python2.7 --prefix=/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install:
    running install
    running build
    running build_ext
    checking for getifaddrs...found.
    checking for getnameinfo...found.
    checking for IPv6 socket IOCTLs...not found.
    checking for optional header files...netash/ash.h netatalk/at.h netax25/ax25.h neteconet/ec.h netipx/ipx.h netpacket/packet.h linux/irda.h linux/atm.h linux/llc.h linux/tipc.h linux/dn.h.
    checking whether struct sockaddr has a length field...no.
    checking which sockaddr_xxx structs are defined...at ax25 in in6 ipx un ash ec ll atmpvc atmsvc dn irda llc.
    checking for routing socket support...no.
    checking for sysctl(CTL_NET...) support...no.
    checking for netlink support...yes.
    will use netlink to read routing table
    building 'netifaces' extension
    gcc -fPIC -DNETIFACES_VERSION=0.10.5 -DHAVE_GETIFADDRS=1 -DHAVE_GETNAMEINFO=1 -DHAVE_NETASH_ASH_H=1 -DHAVE_NETATALK_AT_H=1 -DHAVE_NETAX25_AX25_H=1 -DHAVE_NETECONET_EC_H=1 -DHAVE_NETIPX_IPX_H=1 -DHAVE_NETPACKET_PACKET_H=1 -DHAVE_LINUX_IRDA_H=1 -DHAVE_LINUX_ATM_H=1 -DHAVE_LINUX_LLC_H=1 -DHAVE_LINUX_TIPC_H=1 -DHAVE_LINUX_DN_H=1 -DHAVE_SOCKADDR_AT=1 -DHAVE_SOCKADDR_AX25=1 -DHAVE_SOCKADDR_IN=1 -DHAVE_SOCKADDR_IN6=1 -DHAVE_SOCKADDR_IPX=1 -DHAVE_SOCKADDR_UN=1 -DHAVE_SOCKADDR_ASH=1 -DHAVE_SOCKADDR_EC=1 -DHAVE_SOCKADDR_LL=1 -DHAVE_SOCKADDR_ATMPVC=1 -DHAVE_SOCKADDR_ATMSVC=1 -DHAVE_SOCKADDR_DN=1 -DHAVE_SOCKADDR_IRDA=1 -DHAVE_SOCKADDR_LLC=1 -DHAVE_PF_NETLINK=1 -I/sw/xe/bwpy/0.2.0/usr/include/python2.7 -c netifaces.c -o build/temp.linux-x86_64-2.7/netifaces.o
    netifaces.c: In function 'gateways':
    netifaces.c:1704: error: 'RTNL_FAMILY_MAX' undeclared (first use in this function)
    netifaces.c:1704: error: (Each undeclared identifier is reported only once
    netifaces.c:1704: error: for each function it appears in.)
    error: command 'gcc' failed with exit status 1

----------------------------------------
Cleaning up...
Command /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/bin/python2.7 -c "import setuptools;__file__='/scratch/sciteam/treikali/radical.pilot.sandbox/rp.session.radical.antons.017053.0000-pilot.0000/rp_install/build/netifaces/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-w4gG8x-record/install-record.txt --single-version-externally-managed --install-headers /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/include/site/python2.7 --prefix=/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install failed with error code 1 in /scratch/sciteam/treikali/radical.pilot.sandbox/rp.session.radical.antons.017053.0000-pilot.0000/rp_install/build/netifaces
Storing complete log in /u/sciteam/treikali/.pip/pip.log
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install radical.pilot-v0.43-master/! Lets see how far we get ...
removed `/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw.lock'

---------------------------------------------------------------------

 (/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/bin/python)
PYTHONPATH: /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/lib/python2.7/site-packages:/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/lib/python2.7/site-packages:/u/sciteam/treikali/amber14/lib/python2.6/site-packages:/u/sciteam/treikali/amber14/lib/python2.6/site-packages:/u/sciteam/treikali/amber14/lib/python2.6/site-packages:/sw/xe_xk_cle5.2UP02_pe2.3.0/xalt/0.7.5/sles11.3/libexec
utils :  0.41.1 /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/lib/python2.7/site-packages/radical/utils/__init__.pyc
saga  :  0.41.3 /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/lib/python2.7/site-packages/saga/__init__.pyc
pilot :
install failed!

antonst commented 8 years ago

I have now replaced `module use --append /projects/sciteam/gkd/modules` with `module use --append /projects/sciteam/gk4/modules` in configs/resource_ncsa.json, since the project is no longer gkd. I will report the outcome...

antonst commented 8 years ago

Andre: what do you mean by "for BW you can use the default resource"?

antonst commented 8 years ago

still no success...

andre-merzky commented 8 years ago

"Run examples from userguide (for BW you can use the default resource)":

For the tutorial release, we had the default target for all examples set to BW, so no command line argument was needed. That change is reverted by now, so you will need to specify ncsa.bw as target resource.

As for the error: it seems that the installation of some packages in the pilot sandbox indeed fails -- possibly because the loaded python module does not contain the python development libs or does not find a compiler, or has some version screwup or whatever...

Could you please try the following on the command line on BW:

module switch PrgEnv-cray PrgEnv-gnu
module load bwpy
virtualenv ve
. ve/bin/activate
pip install netifaces

Does that work?

Thanks, Andre.

antonst commented 8 years ago

pip install netifaces fails with:

(ve)treikali@h2ologin1:~/scratch/radical.pilot.sandbox> pip install netifaces
Collecting netifaces
  Downloading netifaces-0.10.5.tar.gz
Building wheels for collected packages: netifaces
  Running setup.py bdist_wheel for netifaces
  Complete output from command /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve/bin/python2.7 -c "import setuptools;__file__='/tmp/pip-build-jOs_pl/netifaces/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmp0YNKiOpip-wheel-:
  running bdist_wheel
  running build
  running build_ext
  checking for getifaddrs...found.
  checking for getnameinfo...found.
  checking for IPv6 socket IOCTLs...not found.
  checking for optional header files...netash/ash.h netatalk/at.h netax25/ax25.h neteconet/ec.h netipx/ipx.h netpacket/packet.h linux/irda.h linux/atm.h linux/llc.h linux/tipc.h linux/dn.h.
  checking whether struct sockaddr has a length field...no.
  checking which sockaddr_xxx structs are defined...at ax25 in in6 ipx un ash ec ll atmpvc atmsvc dn irda llc.
  checking for routing socket support...no.
  checking for sysctl(CTL_NET...) support...no.
  checking for netlink support...yes.
  will use netlink to read routing table
  building 'netifaces' extension
  gcc -fPIC -DNETIFACES_VERSION=0.10.5 -DHAVE_GETIFADDRS=1 -DHAVE_GETNAMEINFO=1 -DHAVE_NETASH_ASH_H=1 -DHAVE_NETATALK_AT_H=1 -DHAVE_NETAX25_AX25_H=1 -DHAVE_NETECONET_EC_H=1 -DHAVE_NETIPX_IPX_H=1 -DHAVE_NETPACKET_PACKET_H=1 -DHAVE_LINUX_IRDA_H=1 -DHAVE_LINUX_ATM_H=1 -DHAVE_LINUX_LLC_H=1 -DHAVE_LINUX_TIPC_H=1 -DHAVE_LINUX_DN_H=1 -DHAVE_SOCKADDR_AT=1 -DHAVE_SOCKADDR_AX25=1 -DHAVE_SOCKADDR_IN=1 -DHAVE_SOCKADDR_IN6=1 -DHAVE_SOCKADDR_IPX=1 -DHAVE_SOCKADDR_UN=1 -DHAVE_SOCKADDR_ASH=1 -DHAVE_SOCKADDR_EC=1 -DHAVE_SOCKADDR_LL=1 -DHAVE_SOCKADDR_ATMPVC=1 -DHAVE_SOCKADDR_ATMSVC=1 -DHAVE_SOCKADDR_DN=1 -DHAVE_SOCKADDR_IRDA=1 -DHAVE_SOCKADDR_LLC=1 -DHAVE_PF_NETLINK=1 -I/sw/xe/bwpy/0.2.0/usr/include/python2.7 -c netifaces.c -o build/temp.linux-x86_64-2.7/netifaces.o
  netifaces.c: In function 'gateways':
  netifaces.c:1704:22: error: 'RTNL_FAMILY_MAX' undeclared (first use in this function)
     int def_priorities[RTNL_FAMILY_MAX];
                        ^
  netifaces.c:1704:22: note: each undeclared identifier is reported only once for each function it appears in
  error: command 'gcc' failed with exit status 1

  ----------------------------------------
  Failed building wheel for netifaces
Failed to build netifaces
Installing collected packages: netifaces
  Running setup.py install for netifaces
    Complete output from command /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve/bin/python2.7 -c "import setuptools, tokenize;__file__='/tmp/pip-build-jOs_pl/netifaces/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-pKbX9b-record/install-record.txt --single-version-externally-managed --compile --install-headers /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve/include/site/python2.7/netifaces:
    running install
    running build
    running build_ext
    checking for getifaddrs...found. (cached)
    checking for getnameinfo...found. (cached)
    checking for IPv6 socket IOCTLs...not found. (cached)
    checking for optional header files...netash/ash.h netatalk/at.h netax25/ax25.h neteconet/ec.h netipx/ipx.h netpacket/packet.h linux/irda.h linux/atm.h linux/llc.h linux/tipc.h linux/dn.h. (cached)
    checking whether struct sockaddr has a length field...no. (cached)
    checking which sockaddr_xxx structs are defined...at ax25 in in6 ipx un ash ec ll atmpvc atmsvc dn irda llc. (cached)
    checking for routing socket support...no. (cached)
    checking for sysctl(CTL_NET...) support...no. (cached)
    checking for netlink support...yes. (cached)
    will use netlink to read routing table
    building 'netifaces' extension
    gcc -fPIC -DNETIFACES_VERSION=0.10.5 -DHAVE_GETIFADDRS=1 -DHAVE_GETNAMEINFO=1 -DHAVE_NETASH_ASH_H=1 -DHAVE_NETATALK_AT_H=1 -DHAVE_NETAX25_AX25_H=1 -DHAVE_NETECONET_EC_H=1 -DHAVE_NETIPX_IPX_H=1 -DHAVE_NETPACKET_PACKET_H=1 -DHAVE_LINUX_IRDA_H=1 -DHAVE_LINUX_ATM_H=1 -DHAVE_LINUX_LLC_H=1 -DHAVE_LINUX_TIPC_H=1 -DHAVE_LINUX_DN_H=1 -DHAVE_SOCKADDR_AT=1 -DHAVE_SOCKADDR_AX25=1 -DHAVE_SOCKADDR_IN=1 -DHAVE_SOCKADDR_IN6=1 -DHAVE_SOCKADDR_IPX=1 -DHAVE_SOCKADDR_UN=1 -DHAVE_SOCKADDR_ASH=1 -DHAVE_SOCKADDR_EC=1 -DHAVE_SOCKADDR_LL=1 -DHAVE_SOCKADDR_ATMPVC=1 -DHAVE_SOCKADDR_ATMSVC=1 -DHAVE_SOCKADDR_DN=1 -DHAVE_SOCKADDR_IRDA=1 -DHAVE_SOCKADDR_LLC=1 -DHAVE_PF_NETLINK=1 -I/sw/xe/bwpy/0.2.0/usr/include/python2.7 -c netifaces.c -o build/temp.linux-x86_64-2.7/netifaces.o
    netifaces.c: In function 'gateways':
    netifaces.c:1704:22: error: 'RTNL_FAMILY_MAX' undeclared (first use in this function)
       int def_priorities[RTNL_FAMILY_MAX];
                          ^
    netifaces.c:1704:22: note: each undeclared identifier is reported only once for each function it appears in
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve/bin/python2.7 -c "import setuptools, tokenize;__file__='/tmp/pip-build-jOs_pl/netifaces/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-pKbX9b-record/install-record.txt --single-version-externally-managed --compile --install-headers /mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve/include/site/python2.7/netifaces" failed with error code 1 in /tmp/pip-build-jOs_pl/netifaces
You are using pip version 7.1.2, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

andre-merzky commented 8 years ago

Right, thanks Antons -- that is what the bootstrapper also stumbles over. Do you happen to know what changed in the system configuration, by any chance? If not, would you mind opening a support ticket about this? I assume that a change in the module load will help - but I have no immediate idea what change :)

Thanks!

antonst commented 8 years ago

"Do you happen to know what changed in the system configuration, by any chance?":

This is not caused by the system configuration but by the version update of `netifaces` from 0.10.4 to 0.10.5 (Aug 23, 2016).

antonst commented 8 years ago

If I run `pip install netifaces==0.10.4` instead of `pip install netifaces`, it works.

andre-merzky commented 8 years ago

Good catch! It still seems to be specific to BW, though: we don't see netifaces install problems elsewhere. But I'll pin the dependency to 0.10.4.
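
As a rough illustration of what pinning the dependency means, here is a minimal sketch of a requirements list with an exact version for netifaces. The package names are taken from the pip log above; the list name `INSTALL_REQUIRES` and the helper `pinned_version` are hypothetical and are not taken from RADICAL-Pilot's actual setup.py.

```python
# Hypothetical excerpt in the style of a setup.py dependency list. Package
# names come from the pip log in this thread; the variable and helper names
# are made up for illustration.
INSTALL_REQUIRES = [
    "saga-python",
    "radical.utils",
    "pymongo==2.8",
    "python-hostlist",
    "netifaces==0.10.4",  # pinned: 0.10.5 fails to compile on BW
    "setproctitle",
    "ntplib",
    "pyzmq",
]


def pinned_version(requirements, name):
    """Return the exact version `name` is pinned to, or None if unpinned."""
    for req in requirements:
        pkg, sep, version = req.partition("==")
        if pkg.strip() == name:
            return version.strip() if sep else None
    return None


print(pinned_version(INSTALL_REQUIRES, "netifaces"))
```

With an exact `==` pin, pip no longer picks up a newer release automatically, which is the behaviour that broke the install when 0.10.5 appeared.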

antonst commented 8 years ago

OK, I will open the ticket then.

antonst commented 8 years ago

Is this where I should make the change in my local installation?

https://github.com/radical-cybertools/radical.pilot/blob/master/src/radical/pilot/bootstrapper/bootstrap_1.sh#L61

andre-merzky commented 8 years ago

I have now pinned the version in master, please give it a try. Yes, a support reply would still be interesting I think. Thanks!

antonst commented 8 years ago

Thanks Andre! Should I wait for a reply from support before closing this one?

andre-merzky commented 8 years ago

Yes, let's leave it open till then.

antonst commented 8 years ago

reply from bw support:

There are probably dependencies which are too old. The bwpy build environment is set up to find some newer libraries, so I went ahead and installed netifaces into bwpy for you.

Will report if this works now.

antonst commented 8 years ago

All examples succeeded except 09_mpi_units.py, it fails with:

Traceback (most recent call last):
  File "helloworld_mpi.py", line 8, in <module>
Traceback (most recent call last):
  File "helloworld_mpi.py", line 8, in <module>
    from mpi4py import MPI
ImportError: No module named mpi4py
    from mpi4py import MPI
ImportError: No module named mpi4py

To avoid this error, `module load bwpy-mpi` should be run after `module load bwpy`.

antonst commented 8 years ago

btw the update to bwpy by admins didn't solve the previous problem...

antonst commented 8 years ago

Admins indeed updated bwpy and after loading it you get netifaces:

treikali@h2ologin1:~> pip freeze
alabaster==0.7.7
AmberTools==15.0
ansi2html==1.1.0
appdirs==1.4.0
apptools==4.3.0
argcomplete==0.9.0
astroid==1.4.3
astropy==1.0.6
astropy-helpers==1.0.6
atom==0.3.10
Babel==2.2.0
backports==1.0
backports-abc==0.4
backports.ssl-match-hostname==3.5.0.1
basemap==1.0.7
bitarray==0.8.1
boto==2.38.0
boto3==1.2.3
botocore==1.3.14
Bottleneck==1.0.0
CacheControl==0.11.5
certifi==2015.11.20.1
cffi==1.4.2
chaco==4.5.0
chainer==1.9.0
characteristic==14.3.0
chardet==2.3.0
cloudpickle==0.2.1
colorama==0.3.5
configobj==5.0.6
coverage==4.0.3
cryptography==1.1.2
cvxopt==1.1.8
cycler==0.9.0
cymem==1.31.2
Cython==0.23.4
cytoolz==0.7.3
dap==2.2.6.7
decorator==4.0.6
distlib==0.2.1
docutils==0.12
enable==4.5.1
enum34==1.0.4
filelock==2.0.6
funcsigs==0.4
functools32==3.2.3.post2
futures==3.0.3
gentoolkit==0.3.1
gevent==1.1b6
gmpy2==2.0.6
google-apputils==0.4.2
greenlet==0.4.9
h5py==2.5.0
headers-workaround==0.18
html5lib==0.9999999
httplib2==0.9.2
idna==2.0
ipaddress==1.0.16
ipykernel==4.2.2
ipyparallel==4.1.0
ipython==4.0.1
ipython-genutils==0.1.0
ipywidgets==4.1.1
Jinja2==2.8
jmespath==0.9.0
joblib==0.9.3
jsonschema==2.5.1
jupyter-client==4.1.1
jupyter-console==4.0.3
jupyter-core==4.0.6
kiwisolver==0.1.3
Lasagne==0.1
lazy-object-proxy==1.2.1
llvmlite==0.8.0
lockfile==0.12.2
logilab-common==1.1.0
lru-dict==1.1.3
Magic-file-extensions==0.2
Mako==1.0.3
MarkupSafe==0.23
matplotlib==1.5.0
meldplugin==1.0
mistune==0.7.1
MMPBSA.py==15.0
mock==1.3.0
mpmath==0.19
murmurhash==0.26.4
natsort==4.0.3
nbconvert==4.1.0
nbformat==4.0.1
ndg-httpsclient==0.4.0
netCDF4==1.2.4
netifaces==0.10.4
...

But if you do this in a virtualenv, netifaces is not available, e.g.:

(ve5)treikali@h2ologin1:~> pip freeze
AmberTools==15.0
MMPBSA.py==15.0
ParmEd==15.0b0
pyMSMT==15.0
sander==15.0
wheel==0.24.0
You are using pip version 7.1.2, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

andre-merzky commented 8 years ago

Yeah, I did not expect this to solve the problem, as compiled modules will not be made available via virtualenv -- for that to help we would need to use virtualenv --system-site-packages, which we can't use for a number of reasons... But pinning the version should work all right, and since we now understand what happened on BW, I'd be happy to consider the ticket solved.
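
To see which copy of a module an interpreter would pick up, inside or outside a virtualenv, a small check along these lines may help. It is only a sketch and assumes Python 3 (the thread itself uses Python 2.7); the helper names `in_virtualenv` and `module_origin` are made up here.

```python
import importlib.util
import sys


def in_virtualenv():
    """True if the interpreter runs inside a venv or virtualenv."""
    # venv and recent virtualenv set sys.base_prefix; older virtualenv
    # releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("virtualenv:", in_virtualenv())
    print("netifaces :", module_origin("netifaces"))
```

Run once with only bwpy loaded and once inside the activated virtualenv: if the second run reports `None` for netifaces, the system-wide copy is not visible from the virtualenv.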

Re mpi4py: RP does not guarantee that mpi4py is available to the CUs, and the respective module load commands need to go into the CU pre_exec. So I consider that not a bug of RP, but rather an error in the example configuration. I don't think we want to address that right now, the only sane approach would be to use application kernels, and for that we have a long standing ticket already...

Can you please confirm if MPI is working if you adjust the CUs' pre_exec correspondingly? Thanks!
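
A sketch of what such a pre_exec adjustment could look like follows. It uses a plain dict so that it runs without RADICAL-Pilot installed; the key names are assumed to mirror the unit description attributes (`executable`, `arguments`, `pre_exec`, `cores`, `mpi`), the core count is a guess, and `check_module_order` is a hypothetical helper.

```python
# Plain-dict stand-in for a compute unit description. Key names are assumed
# to mirror the description attributes; the core count is illustrative.
mpi_unit = {
    "executable": "python",
    "arguments": ["helloworld_mpi.py"],
    "pre_exec": [
        "module load bwpy",
        "module load bwpy-mpi",  # must come after bwpy
    ],
    "cores": 2,
    "mpi": True,
}


def check_module_order(pre_exec, first, second):
    """True if `module load first` precedes `module load second`."""
    cmds = [c.strip() for c in pre_exec]
    try:
        return cmds.index("module load " + first) < cmds.index("module load " + second)
    except ValueError:
        return False  # one of the two loads is missing


print(check_module_order(mpi_unit["pre_exec"], "bwpy", "bwpy-mpi"))
```

The order check reflects the hint in bootstrap_1.err that `bwpy-mpi` has to be loaded after `bwpy`.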

antonst commented 8 years ago

"RP does not guarantee that mpi4py is available to the CUs, and the respective module load commands need to go into the CU pre_exec":

No, specifying module loads in pre_exec does not work. Maybe it has something to do with the fact that orte-submit is using Mark's openmpi version? The loaded module has no chance in hell of finding that openmpi version, so I am not sure how this should work.

marksantcroos commented 7 years ago

On the netifaces issue also see https://bitbucket.org/al45tair/netifaces/pull-requests/9/coping-with-old-linux-kernels
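
To check whether a given machine is affected, one can look for the missing macro in the kernel headers. The sketch below assumes that `RTNL_FAMILY_MAX` would normally be defined in `/usr/include/linux/rtnetlink.h`; that path and the helper name `header_defines` are assumptions and are not taken from the thread.

```python
import re


def header_defines(path, macro):
    """True if the header at `path` contains a `#define <macro>` line."""
    pattern = re.compile(
        r"^\s*#\s*define\s+" + re.escape(macro) + r"\b", re.MULTILINE
    )
    try:
        with open(path, errors="replace") as fh:
            return bool(pattern.search(fh.read()))
    except OSError:
        return False  # header missing or unreadable


if __name__ == "__main__":
    # Assumed location of the rtnetlink header on a typical Linux system.
    print(header_defines("/usr/include/linux/rtnetlink.h", "RTNL_FAMILY_MAX"))
```

If this prints `False`, building netifaces 0.10.5 from source would be expected to fail with the same "undeclared" error as in the logs above.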

ibethune commented 7 years ago

Antons, is there still an issue here? If not please close.

andre-merzky commented 7 years ago

BW tested ok during release testing.