Closed mturilli closed 5 years ago
Sorry for the delay.
I have updated the package, but I get the following error when I run entk-version:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/entk/__init__.py", line 4, in <module>
from radical.entk.pipeline.pipeline import Pipeline
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/entk/pipeline/pipeline.py", line 1, in <module>
import radical.utils as ru
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/__init__.py", line 14, in <module>
from .plugin_manager import PluginManager
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/plugin_manager.py", line 14, in <module>
from .logger import Logger
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/logger.py", line 45, in <module>
from .misc import get_env_ns as ru_get_env_ns
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/misc.py", line 13, in <module>
from .ru_regex import ReString
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/ru_regex.py", line 7, in <module>
import regex
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/__init__.py", line 1, in <module>
from .regex import *
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/regex.py", line 391, in <module>
import _regex_core
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex_core.py", line 21, in <module>
import _regex
ImportError: /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so: undefined symbol: _intel_fast_memcpy
I also tried deleting my old virtual environment and creating a new one, but the same error persists.
Running radical-stack fails with the same error:
ImportError: /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so: undefined symbol: _intel_fast_memcpy
This is surprising. Could it be that the Python version used when installing the virtualenv is different from the one used when running the commands above? For example, a different Python module or a different compiler module loaded?
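The undefined symbol is itself a clue: _intel_fast_memcpy is provided by the Intel compiler runtime (libirc), so an extension that references it was most likely compiled with icc and is now being loaded without that runtime available. The sketch below extracts the library and symbol from such an ImportError message. It assumes Python 3; the helper name and the "_intel_" prefix heuristic are my own assumptions, not something EnTK or radical.utils provides.

```python
import re

# Matches dynamic-linker errors such as
# "/path/to/_regex.so: undefined symbol: _intel_fast_memcpy"
_UNDEFINED_RE = re.compile(r"^(?P<lib>.+?): undefined symbol: (?P<sym>\S+)\s*$")


def diagnose_undefined_symbol(message):
    """Return (library, symbol, hint) for an 'undefined symbol' error, or None.

    The hint is a heuristic (hypothetical): symbols starting with '_intel_'
    come from the Intel compiler runtime, which suggests a compiler mismatch.
    """
    match = _UNDEFINED_RE.match(message.strip())
    if match is None:
        return None
    lib, sym = match.group("lib"), match.group("sym")
    if sym.startswith("_intel_"):
        hint = "built with the Intel compiler; Intel runtime not available at load time"
    else:
        hint = "symbol not provided by any library loaded at import time"
    return lib, sym, hint


if __name__ == "__main__":
    try:
        import regex  # noqa: F401  (the import that fails in the traceback above)
    except ImportError as exc:
        print(diagnose_undefined_symbol(str(exc)))
    else:
        print("regex imported fine")
```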
But I purged my modules and recreated the virtual environment.
This is the script I used to install the packages.
Thank you.
Do you also run module purge && module load python/2.7.15 when you use that virtualenv and run the code?
No. After activating the virtualenv, I did not purge the modules and load Python again. I just tried purging and reloading after activating the virtualenv, and now Python cannot find any packages, including the EnTK packages.
This is also unexpected. My guess is that the installation possibly did not go into that VE. Can you check whether it ended up somewhere under $HOME/.local/lib/?
Either way, I would recommend starting from scratch and including the EnTK installation and its verification in your deployment script:
# prepare VE
module purge && module load python/2.7.15
virtualenv venv
source venv/bin/activate
# Install entk and dependencies
pip install radical.entk pyyaml netcdf4
# replace RP version
git clone https://github.com/radical-cybertools/radical.pilot.git
cd radical.pilot
git checkout fix/cheyenne
pip uninstall -y radical.pilot
pip install .
# verify installation
python -V
radical-stack
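To complement python -V and radical-stack in the verification step, a small Python check can print which interpreter is actually running and whether it is inside a virtualenv. This is a sketch, not part of EnTK: it assumes Python 3 for sys.base_prefix and also checks sys.real_prefix, which the legacy virtualenv tool of the Python 2.7 era sets instead. The function name is hypothetical.

```python
import sys


def describe_interpreter():
    """Summarise which Python is running and whether it is inside a virtualenv."""
    # Legacy virtualenv sets sys.real_prefix; the stdlib venv module (and
    # newer virtualenv releases) make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "real_prefix", getattr(sys, "base_prefix", sys.prefix))
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "prefix": sys.prefix,
        "base_prefix": base_prefix,
        "in_virtualenv": sys.prefix != base_prefix,
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%-14s: %s" % (key, value))
```

Running it once right after installation and once in the shell that later runs the workflow should make an interpreter mismatch easy to spot.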
It looks like I don't have a $HOME/.local/lib/ folder.
I started a new session and used the script to create a new virtualenv, but I still get the error:
(venv) wuh20@cheyenne2:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> python -V
Python 2.7.15
(venv) wuh20@cheyenne2:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> radical-stack
Traceback (most recent call last):
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/bin/radical-stack", line 3, in <module>
import radical.utils as ru
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/__init__.py", line 14, in <module>
from .plugin_manager import PluginManager
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/plugin_manager.py", line 14, in <module>
from .logger import Logger
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/logger.py", line 45, in <module>
from .misc import get_env_ns as ru_get_env_ns
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/misc.py", line 13, in <module>
from .ru_regex import ReString
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/ru_regex.py", line 7, in <module>
import regex
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/__init__.py", line 1, in <module>
from .regex import *
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/regex.py", line 391, in <module>
import _regex_core
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex_core.py", line 21, in <module>
import _regex
ImportError: /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so: undefined symbol: _intel_fast_memcpy
Again, after the installation, I still don't have a $HOME/.local/lib/ folder.
Thank you.
I'll be back at my computer in about 30 minutes. Can you send me the output of module list after a fresh login, and possibly also after the module load in the install script you use? I will try to reproduce this.
Thank you very much.
wuh20@sapphire:~$ cheyenne
Last login: Wed May 1 09:56:40 2019 from 128.118.54.223
******************************************************************************
* Welcome to Cheyenne - April 30, 2019
******************************************************************************
Today in the Daily Bulletin (dailyb.cisl.ucar.edu)
- Reminder: Cheyenne compute nodes down May 6-11 during NWSC electrical repairs
- Alternative HPC login nodes now available
- Default Cheyenne and Casper libraries will be updated May 6
- Tutorial for new Cheyenne and Casper users
- Best practice: Use scratch space for temporary files
Quick Start: www2.cisl.ucar.edu/resources/cheyenne/quick-start-cheyenne
User environment: www2.cisl.ucar.edu/resources/cheyenne/user-environment
Key module commands: module list, module avail, module spider, module help
CISL Help: cislhelp@ucar.edu -- 303-497-2400
------------------------------------------------------------------------------------
Restoring modules from user's default, for system: "ch"
wuh20@cheyenne2:~> module list
Currently Loaded Modules:
1) ncarenv/1.2 2) intel/17.0.1 3) ncarcompilers/0.4.1 4) mpt/2.19 5) netcdf/4.6.1
wuh20@cheyenne2:~> module purge && module load python/2.7.15
wuh20@cheyenne2:~> module list
Currently Loaded Modules:
1) python/2.7.15
Thank you!
I think something odd is going on with your account. This is what your procedure looks like for me (I shortened unsuspicious output):
$ ssh cheyenne
Token_Response:
Last login: Mon May 7 03:25:26 2018 from 138.201.86.166
...
Resetting modules to system default
cheyenne4 amerzky ~ $ module list
Currently Loaded Modules:
1) ncarenv/1.2 2) intel/17.0.1 3) ncarcompilers/0.4.1 4) mpt/2.19 5) netcdf/4.6.1
cheyenne4 amerzky ~ $ module purge && module load python/2.7.15
cheyenne4 amerzky ~ $ module list
Currently Loaded Modules:
1) python/2.7.15
cheyenne4 amerzky ~ $ virtualenv ve > /dev/null
cheyenne4 amerzky ~ $ source ve/bin/activate
(ve) cheyenne4 amerzky ~ $ pip install radical.entk > /dev/null
...
(ve) cheyenne4 amerzky ~ $ entk-version
0.7.16
(ve) cheyenne4 amerzky ~ $ radical-stack
python : 2.7.15
pythonpath :
virtualenv : /gpfs/u/home/amerzky/ve
radical.entk : 0.7.16
radical.pilot : 0.60.1
radical.saga : 0.60.0
radical.utils : 0.60.1
Installing the RP branch does not make a difference either:
(ve) cheyenne4 amerzky ~ $ git clone https://github.com/radical-cybertools/radical.pilot.git
Cloning into 'radical.pilot'...
...
(ve) cheyenne4 amerzky ~ $ cd radical.pilot
(ve) cheyenne4 amerzky ~/radical.pilot [devel] $ git checkout fix/cheyenne
Branch fix/cheyenne set up to track remote branch fix/cheyenne from origin.
Switched to a new branch 'fix/cheyenne'
(ve) cheyenne4 amerzky ~/radical.pilot [fix/cheyenne] $ pip install . --upgrade
...
Successfully installed radical.pilot-0.60.1
(ve) cheyenne4 amerzky ~/radical.pilot [fix/cheyenne] $ radical-stack
python : 2.7.15
pythonpath :
virtualenv : /gpfs/u/home/amerzky/ve
radical.entk : 0.7.16
radical.pilot : 0.60.1-v0.60.1-7-g25bcc08@fix-cheyenne
radical.saga : 0.60.0
radical.utils : 0.60.1
Can you please send the result of:
$ ldd /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so
Can you please also send me the output of the following command:
(ve) cheyenne4 amerzky $ python -v -c 'import regex' 2>&1 | grep -C 3 regex
Python 2.7.15 (default, Jan 11 2019, 15:22:07)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
import regex # directory /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex
# /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/__init__.pyc matches /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/__init__.py
import regex # precompiled from /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/__init__.pyc
# /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/regex.pyc matches /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/regex.py
import regex.regex # precompiled from /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/regex.pyc
# /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/_regex_core.pyc matches /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/_regex_core.py
import regex._regex_core # precompiled from /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/_regex_core.pyc
# /glade/u/apps/ch/opt/python/2.7.15/gnu/7.3.0/lib/python2.7/string.pyc matches /glade/u/apps/ch/opt/python/2.7.15/gnu/7.3.0/lib/python2.7/string.py
import string # precompiled from /glade/u/apps/ch/opt/python/2.7.15/gnu/7.3.0/lib/python2.7/string.pyc
# /gpfs/u/home/amerzky/ve/lib/python2.7/re.pyc matches /gpfs/u/home/amerzky/ve/lib/python2.7/re.py
--
dlopen("/gpfs/u/home/amerzky/ve/lib/python2.7/lib-dynload/_heapq.so", 2);
import _heapq # dynamically loaded from /gpfs/u/home/amerzky/ve/lib/python2.7/lib-dynload/_heapq.so
import thread # builtin
dlopen("/gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/_regex.so", 2);
import regex._regex # dynamically loaded from /gpfs/u/home/amerzky/ve/lib/python2.7/site-packages/regex/_regex.so
# /glade/u/apps/ch/opt/python/2.7.15/gnu/7.3.0/lib/python2.7/threading.pyc matches /glade/u/apps/ch/opt/python/2.7.15/gnu/7.3.0/lib/python2.7/threading.py
import threading # precompiled from /glade/u/apps/ch/opt/python/2.7.15/gnu/7.3.0/lib/python2.7/threading.pyc
dlopen("/gpfs/u/home/amerzky/ve/lib/python2.7/lib-dynload/time.so", 2);
--
...
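As a programmatic alternative to grepping the python -v output, importlib can report where a top-level module would be loaded from without executing it. That matters here, because executing regex is exactly what fails. A sketch assuming Python 3.4+; the helper name is hypothetical:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    find_spec() locates the module on sys.path without executing it, so this
    works even when actually importing the module would raise ImportError.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # The module that failed to import in the tracebacks above.
    print("regex -> %s" % module_origin("regex"))
```

If the printed path is not under the virtualenv's site-packages, the package is coming from somewhere else on sys.path.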
Here is the output.
(venv) wuh20@cheyenne6:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> ldd /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so
linux-vdso.so.1 (0x00007fffedb05000)
libm.so.6 => /glade/u/apps/ch/os/lib64/libm.so.6 (0x00007fffed554000)
libdl.so.2 => /glade/u/apps/ch/os/lib64/libdl.so.2 (0x00007fffed350000)
librt.so.1 => /glade/u/apps/ch/os/lib64/librt.so.1 (0x00007fffed147000)
libpthread.so.0 => /glade/u/apps/ch/os/lib64/libpthread.so.0 (0x00007fffecf2a000)
libc.so.6 => /glade/u/apps/ch/os/lib64/libc.so.6 (0x00007fffecb82000)
/gpfs/u/home/wuh20/.linuxbrew/Cellar/glibc/2.23/lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
(venv) wuh20@cheyenne6:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> python -v -c 'import regex' 2>&1 | grep -C 3 regex
Python 2.7.15 (default, Jan 11 2019, 15:22:07)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
import regex # directory /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex
# /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/__init__.pyc matches /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/__init__.py
import regex # precompiled from /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/__init__.pyc
# /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/regex.pyc matches /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/regex.py
import regex.regex # precompiled from /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/regex.pyc
# /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex_core.pyc matches /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex_core.py
import regex._regex_core # precompiled from /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex_core.pyc
# /glade/u/apps/ch/opt/python/2.7.15/gnu/7.3.0/lib/python2.7/string.pyc matches /glade/u/apps/ch/opt/python/2.7.15/gnu/7.3.0/lib/python2.7/string.py
import string # precompiled from /glade/u/apps/ch/opt/python/2.7.15/gnu/7.3.0/lib/python2.7/string.pyc
# /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/re.pyc matches /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/re.py
--
dlopen("/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/lib-dynload/_heapq.so", 2);
import _heapq # dynamically loaded from /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/lib-dynload/_heapq.so
import thread # builtin
dlopen("/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so", 2);
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/__init__.py", line 1, in <module>
from .regex import *
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/regex.py", line 391, in <module>
import _regex_core
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex_core.py", line 21, in <module>
import _regex
ImportError: /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so: undefined symbol: _intel_fast_memcpy
# clear __builtin__._
# clear sys.path
# clear sys.argv
Any idea what this is:
/gpfs/u/home/wuh20/.linuxbrew/Cellar/glibc/2.23/lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
This is likely the culprit. My ldd output is:
(ve) cheyenne4 amerzky ~ $ ldd ve/lib/python2.7/site-packages/regex/_regex.so
linux-vdso.so.1 (0x00007fffedb05000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fffed63d000)
libc.so.6 => /lib64/libc.so.6 (0x00007fffed294000)
/lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
The glibc used in your case was likely not compiled with the default Intel compiler toolchain.
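A stray loader like this is easy to overlook in a long ldd listing. Below is a rough sketch (Python 3.7+, hypothetical helper names) that parses ldd output and flags every library resolved outside an allow-list of prefixes. The prefixes are an assumption taken from the ldd outputs in this thread, not an authoritative list for Cheyenne.

```python
import subprocess

# Assumed "trusted" prefixes, taken from the ldd outputs in this thread.
SYSTEM_PREFIXES = ("/lib64/", "/usr/lib64/", "/glade/u/apps/ch/os/")


def parse_ldd(text):
    """Return a list of (name, resolved_path) pairs from ldd output.

    Handles "libc.so.6 => /lib64/libc.so.6 (0x...)" lines as well as bare
    lines such as "/lib64/ld-linux-x86-64.so.2 (0x...)" for the loader.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            path = rest.split("(")[0].strip()
            entries.append((name.strip(), path or None))
        else:
            path = line.split("(")[0].strip()
            entries.append((path, path))
    return entries


def suspicious_libraries(text, prefixes=SYSTEM_PREFIXES):
    """Libraries whose resolved absolute path lies outside the trusted prefixes."""
    return [
        (name, path)
        for name, path in parse_ldd(text)
        if path and path.startswith("/") and not path.startswith(prefixes)
    ]


def check_shared_object(so_path):
    """Run ldd on a shared object and return the suspicious entries."""
    out = subprocess.run(["ldd", so_path], capture_output=True, text=True, check=True)
    return suspicious_libraries(out.stdout)
```

Entries without an absolute path (linux-vdso.so.1, or "not found") are skipped by design, so this only catches libraries that were found in an unexpected place.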
If your workload needs that glibc, you could try:
module purge
module load gcc
module load python/2.7.15
and see whether the deployment is more forgiving of that ld-linux loader.
I installed linuxbrew a while ago. I guess it built some libraries and packages that mess up my environment. I have just removed it. The ldd output changed, but I still have the problem:
(venv) wuh20@cheyenne4:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/radical.pilot> ldd /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so
linux-vdso.so.1 (0x00007fffedb05000)
libm.so.6 => /glade/u/apps/ch/os/lib64/libm.so.6 (0x00007fffed554000)
libdl.so.2 => /glade/u/apps/ch/os/lib64/libdl.so.2 (0x00007fffed350000)
librt.so.1 => /glade/u/apps/ch/os/lib64/librt.so.1 (0x00007fffed148000)
libpthread.so.0 => /glade/u/apps/ch/os/lib64/libpthread.so.0 (0x00007fffecf2a000)
libc.so.6 => /glade/u/apps/ch/os/lib64/libc.so.6 (0x00007fffecb82000)
/lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
(venv) wuh20@cheyenne4:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/radical.pilot> entk-version
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/entk/__init__.py", line 4, in <module>
from radical.entk.pipeline.pipeline import Pipeline
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/entk/pipeline/pipeline.py", line 1, in <module>
import radical.utils as ru
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/__init__.py", line 14, in <module>
from .plugin_manager import PluginManager
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/plugin_manager.py", line 14, in <module>
from .logger import Logger
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/logger.py", line 45, in <module>
from .misc import get_env_ns as ru_get_env_ns
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/misc.py", line 13, in <module>
from .ru_regex import ReString
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/ru_regex.py", line 7, in <module>
import regex
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/__init__.py", line 1, in <module>
from .regex import *
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/regex.py", line 391, in <module>
import _regex_core
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex_core.py", line 21, in <module>
import _regex
ImportError: /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so: undefined symbol: _intel_fast_memcpy
(venv) wuh20@cheyenne4:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/radical.pilot>
I plan to work with the Cheyenne sysadmins to sort out my environment setup first. Hopefully that will resolve some of these issues.
Could you send me the value of $LD_LIBRARY_PATH just before you run the radical-stack command?
(venv) wuh20@cheyenne3:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> echo $LD_LIBRARY_PATH
/glade/u/apps/ch/opt/mpt_fmods/2.19/intel/17.0.1:/glade/u/apps/ch/opt/mpt/2.19/opt/hpe/hpc/mpt/mpt-2.19/lib:/glade/u/apps/opt/intel/2017u1/compilers_and_libraries/linux/lib/intel64_lin:/ncar/opt/slurm/latest/lib::/glade/u/apps/ch/os/usr/lib64:/glade/u/apps/ch/os/usr/lib:/glade/u/apps/ch/os/lib64:/glade/u/apps/ch/os/lib
:-P
(ve) cheyenne4 amerzky ~ $ echo $LD_LIBRARY_PATH
/usr/local/lib
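Comparing the two values is easier entry by entry. Note that the longer value contains an empty element (the :: after the slurm entry). As with PATH, the dynamic loader interprets an empty LD_LIBRARY_PATH element as the current working directory. Whether that plays any role in this error is not established here. A sketch with hypothetical helper names:

```python
import os


def ld_path_entries(value):
    """Split an LD_LIBRARY_PATH value on ':' into entries, keeping empty ones.

    Empty entries (from "::" or a leading/trailing ":") are reported as "."
    because the dynamic loader searches the current directory for them.
    """
    if not value:
        return []
    return [entry if entry else "." for entry in value.split(":")]


def diff_ld_paths(mine, theirs):
    """Return (only_in_mine, only_in_theirs), preserving search order."""
    a, b = ld_path_entries(mine), ld_path_entries(theirs)
    return [e for e in a if e not in b], [e for e in b if e not in a]


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for position, entry in enumerate(ld_path_entries(current)):
        print(position, entry)
```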
(ve) cheyenne4 amerzky ~ $ radical-stack
python : 2.7.15
pythonpath :
virtualenv : /gpfs/u/home/amerzky/ve
radical.entk : 0.7.16
radical.pilot : 0.60.1-v0.60.1-7-g25bcc08@fix-cheyenne
radical.saga : 0.60.0
radical.utils : 0.60.1
I have resolved the default module issue. My default file in .lmod.d had been mysteriously changed; I have reverted it to the correct default. This should take care of it.
wuh20@sapphire:~$ cheyenne
Last login: Wed May 1 15:32:33 2019 from 128.118.54.223
...
Resetting modules to system default
wuh20@cheyenne4:~> module list
Currently Loaded Modules:
1) ncarenv/1.2 2) intel/17.0.1 3) ncarcompilers/0.4.1 4) mpt/2.19 5) netcdf/4.6.1
But when I repeat the process of creating the virtual environment, I still get the error; nothing seems to change. My LD_LIBRARY_PATH is still different from yours, and even if I set it to yours, I get the error anyway:
(venv) wuh20@cheyenne4:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> export LD_LIBRARY_PATH=/usr/local/lib
(venv) wuh20@cheyenne4:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> echo $LD_LIBRARY_PATH
/usr/local/lib
(venv) wuh20@cheyenne4:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> radical-stack
Traceback (most recent call last):
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/bin/radical-stack", line 3, in <module>
import radical.utils as ru
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/__init__.py", line 14, in <module>
from .plugin_manager import PluginManager
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/plugin_manager.py", line 14, in <module>
from .logger import Logger
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/logger.py", line 45, in <module>
from .misc import get_env_ns as ru_get_env_ns
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/misc.py", line 13, in <module>
from .ru_regex import ReString
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/radical/utils/ru_regex.py", line 7, in <module>
import regex
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/__init__.py", line 1, in <module>
from .regex import *
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/regex.py", line 391, in <module>
import _regex_core
File "/gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex_core.py", line 21, in <module>
import _regex
ImportError: /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv/lib/python2.7/site-packages/regex/_regex.so: undefined symbol: _intel_fast_memcpy
I'm kind of running out of ideas here...
This is fascinating... Would you mind posting the following files, if you have them: .bashrc, .profile, .login? And what is in your .lmod.d?
Here it is.
wuh20@cheyenne4:~> cat .bash_profile
# CAnEn
export PATH=/glade/u/home/wuh20/github/AnalogsEnsemble/output/bin:$PATH
export PATH=/glade/u/home/wuh20/packages/grib2/wgrib2:$PATH
export PATH=/glade/u/home/wuh20/github/AnalogsEnsemble/dependency/install/bin:$PATH
export LANG=en_US
# CMake
alias cmake=/glade/u/home/wuh20/packages/cmake-3.10.1/bin/cmake
export TMPDIR=/glade/scratch/wuh20
wuh20@cheyenne4:~> cat .profile
# eval $(/glade/u/home/wuh20/.linuxbrew/bin/brew shellenv)
wuh20@cheyenne4:~> cat .login
cat: .login: No such file or directory
wuh20@cheyenne4:~> ls .lmod.d/
jasper.ch R.ch
Hmm, your jasper.ch still refers to the GNU compiler. What happens if you retry after `mv .lmod.d .lmod.d.bak`?
OK, I think I have fixed this, brutally. I moved all the questionable hidden files and folders in my home directory somewhere else, since I was almost sure something there was messing up my environment, and then tried again. It works now. I guess the more important step was removing the .local folder where all the Python packages were. The virtual environment kept reusing the regex package from this local folder, since it had been built there before. Once I removed the folder, the virtual environment had to build regex again, which resolved the linking issue, though this is mere conjecture on my part.
(venv) wuh20@cheyenne3:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> ldd venv/lib/python2.7/site-packages/regex/_regex.so
linux-vdso.so.1 (0x00007fffedb05000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fffed63d000)
libc.so.6 => /lib64/libc.so.6 (0x00007fffed294000)
/lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
(venv) wuh20@cheyenne3:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> entk-version
0.7.16
(venv) wuh20@cheyenne3:~/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node> radical-stack
python : 2.7.15
pythonpath :
virtualenv : /gpfs/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/venv
radical.entk : 0.7.16
radical.pilot : 0.60.1-v0.60.1-7-g25bcc08@fix-cheyenne
radical.saga : 0.60.0
radical.utils : 0.60.1
Thank you so much for your help! I'm going to give this new release a try shortly.
Oh, I am sorry we did not find this earlier! Can you remember where regex lived under ~/.local, for the next time a user stumbles over this? IIRC, you did not have a ~/.local/lib where I expected it...
But yeah, the interference between ~/.local and pip / virtualenv is annoying; I have stumbled over it a couple of times. Nowadays I do:
rm -rf ~/.local; ln -s /dev/null ~/.local
which I consider a very polite way to tell pip to fuck off... ;-)
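To check whether ~/.local can interfere at all, you can ask the site module directly: site.getusersitepackages() returns the per-user site-packages path, and site.ENABLE_USER_SITE says whether it is active for the running interpreter. Setting PYTHONNOUSERSITE=1 disables that directory for a single run, which is a less drastic alternative to the symlink above. Separately, pip keeps a cache of built wheels (by default under ~/.cache/pip) that pip install --no-cache-dir bypasses; which hidden directory mattered in this case was not established. A sketch assuming Python 3, with a hypothetical helper name:

```python
import os
import site
import sys


def user_site_status():
    """Report whether the per-user site-packages directory can affect imports."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        "exists": os.path.isdir(user_site),
        # ENABLE_USER_SITE may be None when the safety check fails; treat as off.
        "enabled": bool(site.ENABLE_USER_SITE),
        "on_sys_path": user_site in sys.path,
        "PYTHONNOUSERSITE": os.environ.get("PYTHONNOUSERSITE"),
    }


if __name__ == "__main__":
    for key, value in sorted(user_site_status().items()):
        print("%-16s: %s" % (key, value))
```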
EnTK is running now, but the process seems to hang for some reason:
EnTK session: re.session.cheyenne2.wuh20.018031.0001
Creating AppManager ok
Validating and assigning resource manager ok
Creating analog generation task task-anen-gen-00000
Adding task 1: task-anen-gen-00000
Creating analog generation task task-anen-gen-00001
Adding task 2: task-anen-gen-00001
Creating analog generation task task-anen-gen-00002
Adding task 3: task-anen-gen-00002
Creating analog generation task task-anen-gen-00003
Adding task 4: task-anen-gen-00003
Creating analog generation task task-anen-gen-00004
Adding task 5: task-anen-gen-00004
Creating analog generation task task-anen-gen-00005
Adding task 6: task-anen-gen-00005
Creating analog generation task task-anen-gen-00006
Adding task 7: task-anen-gen-00006
Creating analog generation task task-anen-gen-00007
Adding task 8: task-anen-gen-00007
Creating analog generation task task-anen-gen-00008
Adding task 9: task-anen-gen-00008
Creating analog generation task task-anen-gen-00009
Adding task 10: task-anen-gen-00009
Creating analog generation task task-anen-gen-00010
Adding task 11: task-anen-gen-00010
Creating analog generation task task-anen-gen-00011
Adding task 12: task-anen-gen-00011
Creating analog generation task task-anen-gen-00012
Adding task 13: task-anen-gen-00012
Creating analog generation task task-anen-gen-00013
Adding task 14: task-anen-gen-00013
Creating analog generation task task-anen-gen-00014
Adding task 15: task-anen-gen-00014
Creating analog generation task task-anen-gen-00015
Adding task 16: task-anen-gen-00015
Creating analog generation task task-anen-gen-00016
Adding task 17: task-anen-gen-00016
Creating analog generation task task-anen-gen-00017
Adding task 18: task-anen-gen-00017
Creating analog generation task task-anen-gen-00018
Adding task 19: task-anen-gen-00018
Creating analog generation task task-anen-gen-00019
Adding task 20: task-anen-gen-00019
Creating analog generation task task-anen-gen-00020
Adding task 21: task-anen-gen-00020
Creating analog generation task task-anen-gen-00021
Adding task 22: task-anen-gen-00021
Creating analog generation task task-anen-gen-00022
Adding task 23: task-anen-gen-00022
Creating analog generation task task-anen-gen-00023
Adding task 24: task-anen-gen-00023
Creating analog generation task task-anen-gen-00024
Adding task 25: task-anen-gen-00024
Creating analog generation task task-anen-gen-00025
Adding task 26: task-anen-gen-00025
Creating analog generation task task-anen-gen-00026
Adding task 27: task-anen-gen-00026
Creating analog generation task task-anen-gen-00027
Adding task 28: task-anen-gen-00027
Creating analog generation task task-anen-gen-00028
Adding task 29: task-anen-gen-00028
Creating analog generation task task-anen-gen-00029
Adding task 30: task-anen-gen-00029
Adding stage stage-anen-gen.
Setting up RabbitMQ system ok
ok
create pilot manager ok
submit 1 pilot(s)
[ncar.cheyenne:72]
ok
I waited here for a long time, but it never gets past this point.
Thanks for the feedback! @Weiming-Hu, can you share the script you are running, and if possible also give us access to the client and the pilot sandbox on Cheyenne?
Of course.
The script I'm running is /glade/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node/runme.py.
You should also have access to the folder /glade/u/home/wuh20/github/hpc-workflows/scripts/application_AnEn/year_2/multi-node, which contains some log files.
The pilot sandbox is generated at /glade/scratch/wuh20/radical.pilot.sandbox.
Thank you.
Hi @andre-merzky, sorry for not following up on this earlier. The process hangs for me after I invoke EnTK. Maybe it would be more convenient for both of us to have a chat sometime?
Hey @Weiming-Hu: yeah, sorry from my end as well for not following up earlier. I am in the process of reproducing this problem. I had to recreate the pilot virtualenv on Cheyenne, and hope to have more info by the time of the call.
Good news: the pilot now gets submitted and runs again.
Not so good news: the client segfaults in Python because we hit some stack limit. This may be the same problem we see on other machines, where the default stack size for Python threads is very large. I will have to look into this to confirm: if this is the problem, we can mitigate it. If we are hitting a new system limit, it is less certain to be easily fixable.
According to Vivek, this requires a support ticket with NCAR to increase thread and process limits.
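For reference when checking these limits from Python, here is a sketch (Linux, Python 3, hypothetical helper names). resource.getrlimit reports the stack and process-count limits (RLIMIT_STACK and RLIMIT_NPROC; on Linux the latter also counts threads), and threading.stack_size() sets the stack size for threads created after the call. Per the pthread_create(3) man page, glibc derives the default thread stack size from the RLIMIT_STACK soft limit unless it is unlimited, so a large ulimit -s makes every thread expensive. The 8 MiB value below is an arbitrary assumption, not a recommendation from the RADICAL team.

```python
import resource
import threading


def describe_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def report_limits():
    """Return soft/hard stack and process-count limits as printable strings."""
    report = {}
    for label, which in (("stack", resource.RLIMIT_STACK), ("nproc", resource.RLIMIT_NPROC)):
        soft, hard = resource.getrlimit(which)
        report[label] = (describe_limit(soft), describe_limit(hard))
    return report


def start_workers(count, stack_bytes=8 * 1024 * 1024):
    """Start `count` trivial threads with an explicit stack size and join them.

    threading.stack_size(n) only affects threads created after the call and
    returns the previous setting, which is restored afterwards.
    """
    previous = threading.stack_size(stack_bytes)
    try:
        results = []
        threads = [threading.Thread(target=results.append, args=(i,)) for i in range(count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sorted(results)
    finally:
        threading.stack_size(previous)


if __name__ == "__main__":
    print(report_limits())
    print(len(start_workers(16)), "threads finished")
```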
Thank you. Are you suggesting that I should submit a ticket to the Cheyenne admins? I remember that I have already asked them to increase my thread limit. Should I do it again?
I did submit a ticket, but have not gotten a reply yet. If your limit was already raised and it still doesn't work, you are likely hung up on something different, or the limit was reset for some reason. Either way, I won't be able to reproduce the problem until support replies... :/
Hey @Weiming-Hu, I don't think any action is needed from you. It is for @andre-merzky and myself to open that ticket asking for our thread limits to be increased.
Thank you for the clarification.
Closing because the workflow is not used anymore
I pushed some changes to the fix/cheyenne branch (in the mpirun_dplace launch method) that address core pinning for multithreaded tasks. Can you give this a try, please?