radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Failed `prepare_env` due to missing env setup variables (`conda`-related) #2952

Closed mtitov closed 1 year ago

mtitov commented 1 year ago

RADICAL stack: 1.34-devel. Using a conda environment for runs on Polaris.

agent.0.log

...
1685564806.938 : agent.0              : 20197 : 22862698084096 : ERROR    : worker thread initialization failed
Traceback (most recent call last):
  File "/home/matitov/.conda/envs/rct/lib/python3.9/site-packages/radical/pilot/utils/component.py", line 488, in _worker_thread
    self._initialize()
  File "/home/matitov/.conda/envs/rct/lib/python3.9/site-packages/radical/pilot/utils/component.py", line 632, in _initialize
    self.initialize()
  File "/home/matitov/.conda/envs/rct/lib/python3.9/site-packages/radical/pilot/agent/agent_0.py", line 228, in initialize
    self._prepare_env('rp', env_spec)
  File "/home/matitov/.conda/envs/rct/lib/python3.9/site-packages/radical/pilot/agent/agent_0.py", line 864, in _prepare_env
    raise RuntimeError('prepare_env failed: \n%s\n%s\n' % (out, err))
RuntimeError: prepare_env failed: 

env.log

[/home/matitov/.conda/envs/rct/bin/radical-pilot-create-static-ve -d -p /home/matitov/.conda/envs/rct -t conda -P . env/bs0_pre_0.sh -P export PYTHONPATH=/home/matitov/.conda/envs/rct/rp_install/lib/python3.9/site-packages:/home/matitov/.conda/envs/rct/lib/python3.9/site-packages: -P export PATH=/home/matitov/.conda/envs/rct/rp_install/bin:/home/matitov/.conda/envs/rct/bin:/home/matitov/.conda/envs/rct/bin:/soft/datascience/conda/2022-09-08/mconda3/condabin:/soft/compilers/cudatoolkit/cuda-11.6.2/bin:/soft/libraries/nccl/nccl_2.14.3-1+cuda11.6_x86_64/include:/opt/cray/pe/pals/1.1.7/bin:/opt/cray/pe/craype/2.7.15/bin:/opt/cray/pe/gcc/11.2.0/bin:/opt/cray/pe/perftools/22.05.0/bin:/opt/cray/pe/papi/6.0.0.14/bin:/opt/cray/libfabric/1.11.0.4.125/bin:/opt/clmgr/sbin:/opt/clmgr/bin:/opt/sgi/sbin:/opt/sgi/bin:/usr/local/bin:/usr/bin:/bin:/opt/c3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/pbs/bin:/sbin:/home/matitov/.local/bin:/home/matitov/bin:/opt/cray/pe/bin]
pre exec: . env/bs0_pre_0.sh
pre exec done
pre exec: export PYTHONPATH=/home/matitov/.conda/envs/rct/rp_install/lib/python3.9/site-packages:/home/matitov/.conda/envs/rct/lib/python3.9/site-packages:
pre exec done
pre exec: export PATH=/home/matitov/.conda/envs/rct/rp_install/bin:/home/matitov/.conda/envs/rct/bin:/home/matitov/.conda/envs/rct/bin:/soft/datascience/conda/2022-09-08/mconda3/condabin:/soft/compilers/cudatoolkit/cuda-11.6.2/bin:/soft/libraries/nccl/nccl_2.14.3-1+cuda11.6_x86_64/include:/opt/cray/pe/pals/1.1.7/bin:/opt/cray/pe/craype/2.7.15/bin:/opt/cray/pe/gcc/11.2.0/bin:/opt/cray/pe/perftools/22.05.0/bin:/opt/cray/pe/papi/6.0.0.14/bin:/opt/cray/libfabric/1.11.0.4.125/bin:/opt/clmgr/sbin:/opt/clmgr/bin:/opt/sgi/sbin:/opt/sgi/bin:/usr/local/bin:/usr/bin:/bin:/opt/c3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/pbs/bin:/sbin:/home/matitov/.local/bin:/home/matitov/bin:/opt/cray/pe/bin
pre exec done
/soft/datascience/conda/2022-09-08/mconda3/condabin/conda
env/bs0_pre_0.sh: line 313: __conda_exe: command not found
env/bs0_pre_0.sh: line 296: __conda_exe: command not found

Attached are bs0_pre_0.sh and the current environment after activating conda (`module load conda; eval "$(conda shell.posix hook)"; conda activate rct`): env.txt, bs0_pre_0.sh.txt

mtitov commented 1 year ago

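The issue is that on Polaris the bash function `__conda_exe` is not exported. It is defined only in the current shell (it shows up in the output of `set`), so it is missing when `bs0_pre_0.sh` runs and `__conda_activate` fails when it calls it: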

__conda_activate () 
{ 
    if [ -n "${CONDA_PS1_BACKUP:+x}" ]; then
        PS1="$CONDA_PS1_BACKUP";
        \unset CONDA_PS1_BACKUP;
    fi;
    \local ask_conda;
    ask_conda="$(PS1="${PS1:-}" __conda_exe shell.posix "$@")" || \return;
    \eval "$ask_conda";
    __conda_hashr
}
__conda_exe () 
{ 
    ( __add_sys_prefix_to_path;
    "$CONDA_EXE" $_CE_M $_CE_CONDA "$@" )
}
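
The failure mode can be reproduced without conda. The following is a minimal sketch, assuming bash is available; `demo_helper` is a hypothetical stand-in and not one of conda's real functions. It shows that a function defined in the parent shell is invisible to a child `bash` until it is exported with `export -f`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a conda helper such as __conda_exe.
demo_helper() { echo "helper ran"; }

# Not exported yet: the child bash cannot resolve the function and
# reports an error (captured here instead of aborting the script).
before=$(bash -c 'demo_helper' 2>&1) || true

# After export -f the child bash inherits the function definition.
export -f demo_helper
after=$(bash -c 'demo_helper' 2>&1)

echo "before export: $before"
echo "after export:  $after"
```

The first call ends in the same kind of "command not found" message seen in env.log, while the second one succeeds.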

For comparison, on Summit `__conda_activate` does not call the bash function `__conda_exe`; it invokes the executable through the `CONDA_EXE` environment variable directly:

__conda_activate () 
{ 
    if [ -n "${CONDA_PS1_BACKUP:+x}" ]; then
        PS1="$CONDA_PS1_BACKUP";
        \unset CONDA_PS1_BACKUP;
    fi;
    \local cmd="$1";
    shift;
    \local ask_conda;
    CONDA_INTERNAL_OLDPATH="${PATH}";
    __add_sys_prefix_to_path;
    ask_conda="$(PS1="$PS1" "$CONDA_EXE" $_CE_M $_CE_CONDA shell.posix "$cmd" "$@")" || \return $?;
    rc=$?;
    PATH="${CONDA_INTERNAL_OLDPATH}";
    \eval "$ask_conda";
    if [ $rc != 0 ]; then
        \export PATH;
    fi;
    __conda_hashr
}

(*) The difference may come from the conda version and from the way the system loads conda via `module`.

@andre-merzky proposed ensuring that all conda-related bash functions are exported (see the corresponding PR):

# export every shell function whose name contains "conda"
for name in $(set | grep -e '^[^ ]*conda[^ ]* ()' | cut -f 1 -d ' ')
do
    export -f "$name"
done
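
To illustrate what the loop does, here is a self-contained sketch that runs it against two dummy functions. The names `__demo_conda_exe` and `__demo_conda_activate` are assumptions made up for this example and only mimic the call chain from the issue; the sketch also assumes bash is not running in POSIX mode, since `set` lists function definitions only in that case.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for conda's shell functions.
__demo_conda_exe() { echo "exe: $*"; }
__demo_conda_activate() { __demo_conda_exe activate "$@"; }

# Same loop as proposed above: find function definitions in the output
# of `set` whose names contain "conda" and export each one.
for name in $(set | grep -e '^[^ ]*conda[^ ]* ()' | cut -f 1 -d ' ')
do
    export -f "$name"
done

# A child bash can now resolve both the caller and its helper.
result=$(bash -c '__demo_conda_activate rct')
echo "$result"
```

Both functions have to be exported for the child call to work: exporting only `__demo_conda_activate` would leave its helper undefined in the child shell, which matches the error reported for `__conda_exe`.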