htcondor / htmap

High-Throughput Computing in Python, powered by HTCondor
https://htmap.readthedocs.io
Apache License 2.0

ModuleNotFound in node where job lands #236

Closed maxgalli closed 3 years ago

maxgalli commented 3 years ago

This issue is partially related to https://github.com/htcondor/htmap/issues/234. Thanks to @luisfdez, htmap now seems to work on lxplus. However, I'm getting an error that I don't fully understand when I try to run this script.

As you can see, the function I pass to htmap.map is dummy_extractor, which calls uproot.open internally. When I run the script, I get the following:

$ python3 htmap_dummy_samples.py --base-dir /eos/home-g/gallim/samples/dummy_samples
Found 1 files in /eos/home-g/gallim/samples/dummy_samples
[CREDD] Adding user credentials to credd daemon
ripe-quick-robe:   0%|                                                                                                                                | 0/1 [11:34<?, ?component/s]
ripe-quick-robe:   0%|                                                                                                                                | 0/1 [11:37<?, ?component/s]
ripe-quick-robe:   0%|                                                                                                                                | 0/1 [16:46<?, ?component/s]
Traceback (most recent call last):
  File "htmap_dummy_samples.py", line 65, in <module>
    main(args)
  File "htmap_dummy_samples.py", line 53, in main
    mapped_arrays.wait(show_progress_bar = True)
  File "/afs/cern.ch/user/g/gallim/.local/lib/python3.6/site-packages/htmap-0.6.1-py3.6.egg/htmap/maps.py", line 45, in _protect
    return method(self, *args, **kwargs)
  File "/afs/cern.ch/user/g/gallim/.local/lib/python3.6/site-packages/htmap-0.6.1-py3.6.egg/htmap/maps.py", line 279, in wait
    f"Component {component} of map {self.tag} encountered error while executing. Error report:\n{self._load_error(component).report()}"
htmap.exceptions.MapComponentError: Component 0 of map ripe-quick-robe encountered error while executing. Error report:
=========  Start error report for component 0 of map ripe-quick-robe  ==========
Landed on execute node gallim-3085709.0-b7g28n2868.cern.ch (172.17.0.2) at 2021-06-02 13:35:20.042379

Python executable is /opt/conda/bin/python3 (version 3.7.3)
with installed packages
  alabaster==0.7.12
  anaconda-client==1.7.2
  anaconda-navigator==1.9.7
  anaconda-project==0.8.3
  asn1crypto==0.24.0
  astroid==2.2.5
  astropy==3.2.1
  atomicwrites==1.3.0
  attrs==19.1.0
  Babel==2.7.0
  backcall==0.1.0
  backports.functools-lru-cache==1.5
  backports.os==0.1.1
  backports.shutil-get-terminal-size==1.0.0
  backports.tempfile==1.0
  backports.weakref==1.0.post1
  beautifulsoup4==4.7.1
  bitarray==0.9.3
  bkcharts==0.2
  bleach==3.1.0
  bokeh==1.2.0
  boto==2.49.0
  Bottleneck==1.2.1
  certifi==2019.6.16
  cffi==1.12.3
  chardet==3.0.4
  Click==7.0
  click-didyoumean==0.0.3
  cloudpickle==1.5.0
  clyent==1.2.2
  colorama==0.4.1
  conda==4.7.10
  conda-build==3.18.8
  conda-package-handling==1.3.11
  conda-verify==3.4.2
  contextlib2==0.5.5
  cryptography==2.7
  cycler==0.10.0
  Cython==0.29.12
  cytoolz==0.10.0
  dask==2.1.0
  decorator==4.4.0
  defusedxml==0.6.0
  distributed==2.1.0
  docutils==0.14
  entrypoints==0.3
  et-xmlfile==1.0.1
  fastcache==1.1.0
  filelock==3.0.12
  Flask==1.1.1
  future==0.17.1
  gevent==1.4.0
  glob2==0.7
  gmpy2==2.0.8
  greenlet==0.4.15
  h5py==2.9.0
  halo==0.0.30
  heapdict==1.0.0
  htcondor==8.9.8
  htmap==0.6.1
  html5lib==1.0.1
  idna==2.8
  imageio==2.5.0
  imagesize==1.1.0
  importlib-metadata==1.7.0
  ipykernel==5.1.1
  ipython==7.6.1
  ipython-genutils==0.2.0
  ipywidgets==7.5.0
  isort==4.3.21
  itsdangerous==1.1.0
  jdcal==1.4.1
  jedi==0.13.3
  jeepney==0.4
  Jinja2==2.10.1
  joblib==0.13.2
  json5==0.8.4
  jsonschema==3.0.1
  jupyter==1.0.0
  jupyter-client==5.3.1
  jupyter-console==6.0.0
  jupyter-core==4.5.0
  jupyterlab==1.0.2
  jupyterlab-server==1.0.0
  keyring==18.0.0
  kiwisolver==1.1.0
  lazy-object-proxy==1.4.1
  libarchive-c==2.8
  lief==0.9.0
  llvmlite==0.29.0
  locket==0.2.0
  log-symbols==0.0.14
  lxml==4.3.4
  MarkupSafe==1.1.1
  matplotlib==3.1.0
  mccabe==0.6.1
  mistune==0.8.4
  mkl-fft==1.0.12
  mkl-random==1.0.2
  mkl-service==2.0.2
  mock==3.0.5
  more-itertools==7.0.0
  mpmath==1.1.0
  msgpack==0.6.1
  multipledispatch==0.6.0
  navigator-updater==0.2.1
  nbconvert==5.5.0
  nbformat==4.4.0
  networkx==2.3
  nltk==3.4.4
  nose==1.3.7
  notebook==6.0.0
  numba==0.44.1
  numexpr==2.6.9
  numpy==1.16.4
  numpydoc==0.9.1
  olefile==0.46
  openpyxl==2.6.2
  packaging==19.0
  pandas==0.24.2
  pandocfilters==1.4.2
  parso==0.5.0
  partd==1.0.0
  path.py==12.0.1
  pathlib2==2.3.4
  patsy==0.5.1
  pep8==1.7.1
  pexpect==4.7.0
  pickleshare==0.7.5
  Pillow==6.1.0
  pkginfo==1.5.0.1
  pluggy==0.12.0
  ply==3.11
  prometheus-client==0.7.1
  prompt-toolkit==2.0.9
  psutil==5.6.3
  ptyprocess==0.6.0
  py==1.8.0
  pycodestyle==2.5.0
  pycosat==0.6.3
  pycparser==2.19
  pycrypto==2.6.1
  pycurl==7.43.0.3
  pyflakes==2.1.1
  Pygments==2.4.2
  pylint==2.3.1
  pyodbc==4.0.26
  pyOpenSSL==19.0.0
  pyparsing==2.4.0
  pyrsistent==0.14.11
  PySocks==1.7.0
  pytest==5.0.1
  pytest-arraydiff==0.3
  pytest-astropy==0.5.0
  pytest-doctestplus==0.3.0
  pytest-openfiles==0.3.2
  pytest-remotedata==0.3.1
  python-dateutil==2.8.0
  pytz==2019.1
  PyWavelets==1.0.3
  PyYAML==5.1.1
  pyzmq==18.0.0
  QtAwesome==0.5.7
  qtconsole==4.5.1
  QtPy==1.8.0
  requests==2.22.0
  rope==0.14.0
  ruamel-yaml==0.15.46
  scikit-image==0.15.0
  scikit-learn==0.21.2
  scipy==1.3.0
  seaborn==0.9.0
  SecretStorage==3.1.1
  Send2Trash==1.5.0
  simplegeneric==0.8.1
  singledispatch==3.4.0.3
  six==1.12.0
  snowballstemmer==1.9.0
  sortedcollections==1.1.2
  sortedcontainers==2.1.0
  soupsieve==1.8
  Sphinx==2.1.2
  sphinxcontrib-applehelp==1.0.1
  sphinxcontrib-devhelp==1.0.1
  sphinxcontrib-htmlhelp==1.0.2
  sphinxcontrib-jsmath==1.0.1
  sphinxcontrib-qthelp==1.0.2
  sphinxcontrib-serializinghtml==1.1.3
  sphinxcontrib-websupport==1.1.2
  spinners==0.0.24
  spyder==3.3.6
  spyder-kernels==0.5.1
  SQLAlchemy==1.3.5
  statsmodels==0.10.0
  sympy==1.4
  tables==3.5.2
  tblib==1.4.0
  termcolor==1.1.0
  terminado==0.8.2
  testpath==0.4.2
  toml==0.10.1
  toolz==0.10.0
  tornado==6.0.3
  tqdm==4.48.2
  traitlets==4.3.2
  unicodecsv==0.14.1
  urllib3==1.24.2
  wcwidth==0.1.7
  webencodings==0.5.1
  Werkzeug==0.15.4
  widgetsnbextension==3.5.0
  wrapt==1.11.2
  wurlitzer==1.0.2
  xlrd==1.2.0
  XlsxWriter==1.1.8
  xlwt==1.3.0
  zict==1.0.0
  zipp==0.5.1

Scratch directory contents are
  /pool/condor/dir_32234/_htmap_transfer_plugin_cache
  /pool/condor/dir_32234/func
  /pool/condor/dir_32234/condor_exec.exe
  /pool/condor/dir_32234/var
  /pool/condor/dir_32234/_condor_stdout
  /pool/condor/dir_32234/pool
  /pool/condor/dir_32234/.job.ad
  /pool/condor/dir_32234/tmp
  /pool/condor/dir_32234/_condor_stderr
  /pool/condor/dir_32234/.machine.ad
  /pool/condor/dir_32234/gallim.cc
  /pool/condor/dir_32234/docker_stderror
  /pool/condor/dir_32234/.docker_sock
  /pool/condor/dir_32234/_htmap_transfer
  /pool/condor/dir_32234/0.in
  /pool/condor/dir_32234/.chirp.config
  /pool/condor/dir_32234/_htmap_do_output_transfer
  /pool/condor/dir_32234/.update.ad
  /pool/condor/dir_32234/_htmap_user_transfer

Exception and traceback (most recent call last):
  File "./condor_exec.exe", line 163, in load_func
    return load_object(Path("func"))

    Local variables:

  File "./condor_exec.exe", line 159, in load_object
    return cloudpickle.load(file)

    Local variables:
      cloudpickle = <module 'cloudpickle ... pickle/__init__.py'>
      file = <gzip on 0x7f268aa800b8>
      path = PosixPath('func')

  File "/opt/conda/lib/python3.7/site-packages/cloudpickle/cloudpickle.py", line 562, in subimport
    __import__(name)

    Local variables:
      name = 'uproot'

  ModuleNotFoundError: No module named 'uproot'

==========  End error report for component 0 of map ripe-quick-robe  ===========

Setup

My setup is a patched version of htmap, installed with /usr/bin/python3 setup.py install --user after cloning this branch. uproot was also installed manually (it is not available in the default directories and I don't have the privileges to install it there); its path is:

>>> import uproot
>>> uproot.__file__
'/afs/cern.ch/user/g/gallim/.local/lib/python3.6/site-packages/uproot/__init__.py'
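
The error report shows the component running under /opt/conda/bin/python3 (version 3.7.3), while uproot sits in a Python 3.6 user site-packages directory on the submit side. A minimal sketch for comparing the two environments, using only the standard library; module_report is a hypothetical helper name and not part of htmap:

```python
import importlib.util
import sys


def module_report(name):
    """Describe where `name` would be imported from in the current interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(name)
    return {
        "python": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # "uproot" is the module from this issue; any top-level name works.
    print(module_report("uproot"))
```

Running it on the submit machine and then as the mapped function (in place of dummy_extractor) should show whether the interpreter on the execute node can locate uproot at all.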

What I tried

As you can see, uproot does not appear in the list of installed packages in the error report. However, python3 -m pip freeze --disable-pip-version-check does list it. I also tried the same procedure inside a conda environment, with htmap installed the same way, and hit the same problem. Finally, I modified the function I pass to htmap.map so that it no longer calls uproot.open and simply returns an int (while keeping import uproot at the top of the script). In that case the error does not occur.
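
One plausible explanation for that last observation, offered as an assumption about how cloudpickle behaves rather than a confirmed diagnosis: a function pickled by value carries references only to the globals its body actually uses, and any modules among them are re-imported by name when the function is loaded (the subimport frame in the traceback). The sketch below uses json as a stand-in for uproot, with hypothetical helper names, to show which module-level globals a function body refers to:

```python
import json  # stand-in for a third-party module such as uproot
import types


def uses_module(path):
    # The body refers to the global name `json`, so loading this function
    # elsewhere would require `json` to be importable there.
    return json.dumps({"path": path})


def returns_int(path):
    # The script imports json, but this body never refers to it.
    return 1


def referenced_modules(func):
    """Return the names used in func's body that are bound to modules in its globals."""
    return sorted(
        name
        for name in func.__code__.co_names
        if isinstance(func.__globals__.get(name), types.ModuleType)
    )


if __name__ == "__main__":
    print(referenced_modules(uses_module))
    print(referenced_modules(returns_int))
```

Under that assumption, the int-returning variant never triggers an import of uproot on the execute node, which would match the behavior described above.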

Do you have any idea what I could try? Please let me know if you need more details.

maxgalli commented 3 years ago

@luisfdez's suggestion to add htmap.settings["DELIVERY_METHOD"] = "assume" worked, so I'm closing this.