radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Bootstrap Error Wrangler #949

Closed iparask closed 7 years ago

iparask commented 8 years ago

This is a new one. It appears from time to time.

Python 2.7.9
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/rp_install/lib/python2.7/site-packages/radical/pilot/__init__.py", line 20, in <module>
    from .scheduler  import *
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/rp_install/lib/python2.7/site-packages/radical/pilot/scheduler/__init__.py", line 14, in <module>
    from direct_submission import DirectSubmissionScheduler
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/rp_install/lib/python2.7/site-packages/radical/pilot/scheduler/direct_submission.py", line 16, in <module>
    from ..utils    import logger
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/rp_install/lib/python2.7/site-packages/radical/pilot/utils/__init__.py", line 17, in <module>
    from queue              import *
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/rp_install/lib/python2.7/site-packages/radical/pilot/utils/queue.py", line 3, in <module>
    import zmq
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/lib/python2.7/site-packages/zmq/__init__.py", line 66, in <module>
    from zmq import backend
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/lib/python2.7/site-packages/zmq/backend/__init__.py", line 40, in <module>
    reraise(*exc_info)
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/lib/python2.7/site-packages/zmq/backend/__init__.py", line 27, in <module>
    _ns = select_backend(first)
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/lib/python2.7/site-packages/zmq/backend/select.py", line 27, in select_backend
    mod = __import__(name, fromlist=public_api)
  File "/work/03170/tg824689/wrangler/radical.pilot.sandbox/ve_wrangler/lib/python2.7/site-packages/zmq/backend/cython/__init__.py", line 6, in <module>
    from . import (constants, error, message, context,
ImportError: libzmq.so.3: cannot open shared object file: No such file or directory
marksantcroos commented 8 years ago

It looks like the bootstrapper "thought" that zmq was already present, but the library can't be found at import time. Can you provide the bootstrapper logs?

iparask commented 8 years ago

Yes, here.

andre-merzky commented 8 years ago
Requirement already satisfied (use --upgrade to upgrade): pyzmq in /opt/apps/intel15/python/2.7.9/lib/python2.7/site-packages (from radical.pilot==v0.38-386-g33e4a37-tmp-ioannis)

So it does find the pyzmq module, as Mark suspected, but then cannot use it: some environment module is not loaded, and thus libzmq is not available.

Giannis, could you please check in module avail if some zmq or libzmq module exists? Thanks!
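To tell apart the two cases discussed here (pyzmq missing altogether versus pyzmq installed but libzmq not loadable), a small probe can be run with the same interpreter the bootstrapper uses. This is only a sketch: the function names and status labels are made up for illustration, and the classification relies on the "cannot open shared object file" text seen in the traceback above.

```python
import importlib


def classify_import_error(message):
    """Map an ImportError message to a coarse cause (labels are illustrative)."""
    if "cannot open shared object file" in message:
        # The Python package was found, but a native library it needs was not.
        return "missing-shared-lib"
    # Anything else is treated as the Python package itself being unavailable.
    return "missing-module"


def probe_import(name):
    """Try to import `name` and return a (status, detail) tuple."""
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return classify_import_error(str(exc)), str(exc)
    return "ok", ""
```

Calling `probe_import("zmq")` inside the pilot's virtualenv, with and without the python/2.7.9 module loaded, should show whether the failure depends on the loaded modules.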

iparask commented 8 years ago

I did not see it, but just in case here is the output:

-------------------------------------------------------------------------------- /opt/apps/intel15/mvapich2_2_1/modulefiles --------------------------------------------------------------------------------
   Rstats/3.2.1        parallel-netcdf/4.3.2     petsc/3.5-cxx                petsc/3.5-cxxdebug    petsc/3.5-uni             phdf5/1.8.14
   big-data-r/3.2.1    petsc/3.5-complex         petsc/3.5-cxxcomplex         petsc/3.5-debug       petsc/3.5-unidebug        python/2.7.9 (D)
   fftw3/3.3.4         petsc/3.5-complexdebug    petsc/3.5-cxxcomplexdebug    petsc/3.5-single      petsc/3.5          (D)

-------------------------------------------------------------------------------------- /opt/apps/intel15/modulefiles ---------------------------------------------------------------------------------------
   boost/1.55.0    cxx11/4.9.1    gsl/1.16 (m)    hdf5/1.8.14    impi/5.0.3    mvapich2/2.1    netcdf/4.3.2    python/2.7.9

--------------------------------------------------------------------------------------- /opt/apps/xsede/modulefiles ----------------------------------------------------------------------------------------
   CTSSV4            TERAGRID-DEV        cue-build        cue-tg                             globus-5.0        pacman/3.29       tginfo/1.1.3  (D)    xdresourceid/1.0
   GLOBUS-5.0        TERAGRID-paths      cue-comm         gateway-usage-reporting/1.0        globus-6.0        teragrid-basic    tgproxy/0.9.1        xdusage/1.0
   GLOBUS-6.0        apache-ant/1.6.5    cue-login-env    globus/5.2.5                       gsissh/4.3        teragrid-dev      tgresid/2.3.4        xdusage/1.1-2    (D)
   TERAGRID-BASIC    ctssv4              cue-math         globus/6.0                  (D)    gsissh/5.7 (D)    tginfo/1.1.1      uberftp/2.8

------------------------------------------------------------------------------------------ /opt/apps/modulefiles -------------------------------------------------------------------------------------------
   Rstudio/0.99.473    gcc/4.9.1 (D)    hadoop-paths/2.5.0    idl/8.4         irods/4             java64/1.8.0    launcher/2.0    luatools/1.1        matlab/2015a    pylauncher/2.1    spark-paths/1.2.0
   autotools/1.1       git/2.4.1        idev/1.0              intel/15.0.3    java-paths/1.7.0    jdk64/1.8.0     lmod/5.9.4      mathematica/10.2    mcr/v85         settarg/5.9.4     tacc_tips/0.4

--------------------------------------------------------------------------------------------- /opt/modulefiles ---------------------------------------------------------------------------------------------
   Linux    TACC    TACC-paths    cluster    cluster-paths
iparask commented 8 years ago

I did an ls in the intel15 folder and zmq is there as well as pyzmq-14.5.0.

I was also able to import it when I loaded python/2.7.9.

marksantcroos commented 8 years ago

So is this a conflicting requirement or a missing requirement?

iparask commented 8 years ago

It is a conflicting one.

andre-merzky commented 8 years ago

By that, do you mean that you can't load the intel15 module because it conflicts with other parts? What is it conflicting with? Can we set LD_LIBRARY_PATH manually in pre-exec to point to libzmq, or would that break things just like loading intel15?
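As a rough sketch of the LD_LIBRARY_PATH idea: the helper below builds a shell line that prepends a library directory to LD_LIBRARY_PATH without leaving a trailing colon when the variable is unset. The helper name and the directory used in the usage note are placeholders; the actual libzmq location on Wrangler is not given in this thread, and how the line gets into pre-exec is left open.

```python
def ld_library_path_line(lib_dir):
    """Return a shell 'export' line that prepends lib_dir to LD_LIBRARY_PATH.

    The ${VAR:+...} expansion appends ':$LD_LIBRARY_PATH' only if the
    variable is already set and non-empty.
    """
    return 'export LD_LIBRARY_PATH="%s${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"' % lib_dir
```

For example, `ld_library_path_line("/path/to/libzmq/lib")` (a placeholder path) gives one line that could be added to a pre-exec list.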

I am not really sure what to do at this point... We can force the bootstrapper's pip install to ignore existing modules, but that would create a significant overhead for all pilots (we can't do that per resource)...
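For reference, the "ignore existing modules" behaviour corresponds to pip's `--ignore-installed` flag. The sketch below only builds the command and limits it to a single package so that the extra cost stays with that package; whether the bootstrapper could pass the flag per package is an assumption, not something settled in this thread, and the helper name is made up.

```python
import sys


def pip_reinstall_command(package, python=sys.executable):
    """Build a pip command that installs `package` into the active environment
    even if a copy is already visible elsewhere on the path."""
    return [python, "-m", "pip", "install", "--ignore-installed", package]
```

The resulting list can be handed to `subprocess.check_call`, e.g. `pip_reinstall_command("pyzmq")`.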

iparask commented 8 years ago

In order for the virtualenv to update correctly every time and not hit errors like the above, I had to delete the ve on Wrangler and remove the dependency from the setup file of RADICAL-Pilot.

andre-merzky commented 8 years ago

Does that mean we have a working config on wrangler now?

andre-merzky commented 8 years ago

ping?

andre-merzky commented 8 years ago

@iparask ping? Last call before closing.

andre-merzky commented 7 years ago

closing as (hopefully) fixed.