You are doing the right thing, but you need to figure out how to initialize Dotkit properly. One thing you can try is to source ~/.bashrc, ~/.profile, ~/.bash_profile, etc. in the PROLOGUE instead.
Yeah, getting that PROLOGUE section right on HPC platforms is tricky. You may want to try running hit build --debug to drop into the build shell, manually initialize Dotkit there, and then run /bin/bash _hashdist/build.sh.
The script to initialize DotKit checks if the env variable $HOME exists, but it seems that it's not passed to the shell of the build.
if [ -n "$HOME" ]; then
    if [ ! -f $HOME/.nodotkit ]; then
        if [ -f /usr/local/tools/dotkit/dotkit/ksh/.dk_init ]; then
            export DK_ROOT=/usr/local/tools/dotkit/dotkit
            . $DK_ROOT/ksh/.dk_init
            # unalter DK_NODE /usr/global/tools/dotkit
            # alter DK_NODE /usr/global/tools/dotkit # prepends LC / DEG .dk files to default set of .dk files
            reuse -q lcinit
        fi
    fi
fi
I can directly load . $DK_ROOT/ksh/.dk_init (which doesn't make the installation work yet either), but I want to know how to pass all the other environment variables that might be required to load DotKit within the shell of the build. How can I do this?
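For illustration, here is a minimal sketch of why an empty HOME makes the snippet above do nothing. The function would_init is a hypothetical helper that mirrors only the outer guard of the init script; it does not touch Dotkit itself.

```shell
# Hypothetical helper: reproduces only the outer "[ -n "$HOME" ]" guard.
would_init() {
    if [ -n "$HOME" ]; then
        echo "init runs"
    else
        echo "init skipped"
    fi
}

# Each command substitution runs in a subshell, so the real HOME is untouched.
with_home=$(HOME=/tmp/fakehome; would_init)
without_home=$(unset HOME; would_init)

echo "with HOME:    $with_home"
echo "without HOME: $without_home"
```

If HOME is missing in the build shell, the whole block is skipped silently, which would match the "Dotkit has not been initialized" symptom.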
That's what the prologue is for, but the idea is to decouple from the user's environment as much as possible. As @johannesring said, you can source the .*rc files in the prologue, or you can drop into the debug shell, set the variables as needed to debug the build, and then add just the minimal fixes to the prologue section.
HOME does not appear in the debug shell, even when I define it in the PROLOGUE. It seems to me that the variables defined in PROLOGUE are exported inside build.sh, but I need to have HOME set before I run build.sh.
That's right. The user environment variables are unset in the build shell. You can set HOME yourself in the debugging shell.
Yes, I can do that, but then whenever I run /bin/bash _hashdist/build.sh inside the build shell, everything is cleared out and I get the same error:
This command is part of Dotkit, which you may access
after initializing via the following command:
For csh/tcsh shells:
source /usr/local/tools/dotkit/init.csh
For sh/ksh/bash shells:
. /usr/local/tools/dotkit/init.sh
For zsh shells:
. /usr/local/tools/dotkit/init.zsh
However, if I examine the content of _hashdist/build.sh and run each line in the build shell, I can compile and install the package successfully (after having defined HOME and sourced /usr/local/tools/dotkit/init.sh in the build shell). Is there any way to pass the build shell environment into /bin/bash _hashdist/build.sh?
In the debug shell you should be able to do HOME=xyz MYVAR=abc ... /bin/bash _hashdist/build.sh, or you can just edit _hashdist/build.sh directly. Once you have the set of export VAR= statements required to build properly, put those in the PROLOGUE step. Then exit 1 from the debug shell and try to build your stack again.
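A small sketch of the prefix-assignment form mentioned above, assuming /bin/bash is available. The temporary script is a hypothetical stand-in for _hashdist/build.sh, and env -i is used here only to imitate a shell whose environment has been cleared; it is not how hashdist itself scrubs the environment.

```shell
# Hypothetical stand-in for _hashdist/build.sh: it only reports what it sees.
script=$(mktemp)
cat > "$script" <<'EOF'
echo "HOME=${HOME:-unset} MYVAR=${MYVAR:-unset}"
EOF

# Imitate a cleared environment: MYVAR is not inherited by the child.
scrubbed=$(env -i /bin/bash "$script")

# Assignments placed before the command apply to that one child process only.
passed=$(env -i HOME=/tmp/fakehome MYVAR=abc /bin/bash "$script")

echo "scrubbed: $scrubbed"
echo "passed:   $passed"
rm -f "$script"
```

This is convenient for experimenting in the debug shell; once the required variables are known, the matching export lines belong in PROLOGUE so that every generated build script gets them.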
I think the best option is to edit _hashdist/build.sh directly because . /usr/local/tools/dotkit/init.sh is considerably long. I am going to look into prepending the lines
export HOME=...
. /usr/local/tools/dotkit/init.sh
at the beginning of each _hashdist/build.sh. Since this script is generated for each package (isn't it?), could you point me to where in hashdist it is generated? Thanks.
Anything you put in the prologue is going to get written to _hashdist/build.sh for every package.
Ok, I wasn't putting the commands in PROLOGUE in the right order. I had included . /usr/local/tools/dotkit/init.sh as the first command weeks ago and then removed it, but back then I didn't know I needed export HOME= before anything else. Now I know this, and it works if I pass the --debug flag, execute _hashdist/build.sh, and exit 0 for each package. Without this flag, though, I get the error:
[blas|ERROR] Command '[u'/bin/bash', '_hashdist/build.sh']' returned non-zero exit status 4
[blas|ERROR] command failed (code=4); raising
Can I install the packages in debug mode without any other consequences? Why do I get this error without debug? Thanks a lot for your help!
Can you post your profile yaml file? I'm not sure why your _hashdist/build.sh is failing outside of the debug shell.
# This profile file controls your <#> (HashDist) build environment.
# In the future, we'll provide better incorporation of
# automatic environment detection. For now, have a look
# at the YAML files in the top-level directory and choose
# the most *specific* file that matches your environment.
extends:
  - file: linux.yaml
parameters:
  PROLOGUE: |
    export HOME=/g/g92/miguel; . /usr/local/tools/dotkit/init.sh; use openmpi-intel-1.8.4; export CC=mpicc; export CXX=mpic++; export FC=gfortran; export F77=mpif77; export F90=mpif90; export CPP=cpp;
  HOST_MPICC: /usr/local/tools/openmpi-intel-1.8.4/bin/mpicc
  HOST_MPICXX: /usr/local/tools/openmpi-intel-1.8.4/bin/mpic++
  HOST_MPIF77: /usr/local/tools/openmpi-intel-1.8.4/bin/mpif77
  HOST_MPIF90: /usr/local/tools/openmpi-intel-1.8.4/bin/mpif90
  HOST_MPIEXEC: /usr/local/tools/openmpi-intel-1.8.4/bin/mpiexec
  HOST_CMAKE: /usr/local/bin/cmake
  HOST_PETSC_DIR: /g/g92/miguel/petsc-3.6.2/
  HOST_PETSC_ARCH: miguel-opt
  HOST_BOOST: /usr/local/tools/boost-mpi-1.55.0/lib
  LD_LIBRARY_PATH: /usr/local/tools/ic-14.0.174/lib/:/usr/local/tools/openmpi-intel-1.8.4/lib/openmpi:/usr/local/tools/openmpi-intel-1.8.4/lib:/usr/local/tools/boost-mpi-1.55.0/lib:/usr/local/tools/vtk-6.1.0/lib/python2.6/site-packages/vtk:/usr/local/tools/vtk-6.1.0/lib:/usr/local/tools/qt-4.8.3/lib:/usr/local/tools/boost-nompi-1.49.0/lib:/usr/local/tools/sqlcipher-2.0.3-0/lib:/usr/local/tools/boost-nompi-1.55.0/lib
  PATH: /g/g92/miguel/pythonpackages/bin/:/g/g92/miguel/shawncplus-Vim-toCterm-0f47db8/:/g/g92/miguel/pythonpackages/bin/:/g/g92/miguel/Xvfb/bin/:/g/g92/miguel/jdk1.7.0_79/bin/:/usr/local/tools/openmpi-intel-1.8.4/bin:/usr/local/tools/python-2.7.7/bin:/usr/global/tools/clang/chaos_5_x86_64_ib/clang-3.7.0/bin:/usr/local/tools/boost-mpi-1.55.0/bin:/usr/local/tools/vtk-6.1.0/bin:/usr/local/tools/qt-4.8.3/bin:/usr/local/tools/imgtrack-1.0/bin:/usr/local/tools/sqlcipher-2.0.3-0/bin:/usr/local/tools/boost-nompi-1.55.0/bin:/usr/local/tools/ld-auto-rpath/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/global/tools/totalview/m/ansel/default/bin:/collab/usr/global/tools/git/chaos_5_x86_64_ib/git-2.0.0/bin
packages:
  launcher:
  cmake:
    use: host-cmake
  mpi:
    use: host-mpi
  blas:
    use: openblas
  hdf5:
  petsc:
    use: host-petsc
    version: '3.6.2'
  h5py:
  pyvtk:
  matplotlib:
  swig:
  scipy:
  cbcblock:
package_dirs:
  - /g/g92/miguel/petsc-3.6.2
  - pkgs
  - base
Nothing is jumping out at me here. I suspect that when you run /bin/bash _hashdist/build.sh in the debug shell it is returning error code 4, so even though it "works" in the debug shell you may need to do some debugging to see why the return code isn't 0.
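One way to check this is to capture the script's exit status explicitly. The sketch below uses a hypothetical stand-in script, not the real _hashdist/build.sh, and assumes /bin/bash is available; it shows that a script running under set -e stops at the first failing command and exits with that command's status.

```shell
# Hypothetical stand-in for a build script that fails partway through.
script=$(mktemp)
printf 'set -e\n(exit 4)\necho "not reached"\n' > "$script"

# Capture the output and the exit status without aborting the calling shell.
status=0
out=$(/bin/bash "$script") || status=$?

echo "output='$out' status=$status"
rm -f "$script"
```

A non-zero status here is what hashdist reports as "returned non-zero exit status", even if every visible step appeared to succeed.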
You're correct. I typed echo $? after /bin/bash _hashdist/build.sh and I got 4. I found out that the problem is in one of the commands in /usr/local/tools/dotkit/init.sh. Before, I was running . /usr/local/tools/dotkit/init.sh outside of _hashdist/build.sh and had no warnings. Inside _hashdist/build.sh, the first line is set -e, and when /usr/local/tools/dotkit/init.sh is sourced, it aborts the build because of one of the commands within init.sh. Specifically, it is reuse -q lcinit, which is part of DotKit. Even running this command after set -e in my regular shell disconnects me from the cluster. The strange thing is that running reuse -q lcinit followed by echo $? returns 0. I will ask the admins what's going on. Is there any way to override the set -e in the PROLOGUE?
You may be able to run the dotkit/init.sh inside some logic that will trap the error. If you can get a response from the admins on why it's returning an error that would probably be easier and better.
I ended up adding set +e in my PROLOGUE before initializing dotkit and calling the commands that throw the error, and then set -e at the end. This seems to work. Thanks for the help.
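The effect of that workaround can be sketched as follows. The temporary file is a hypothetical stand-in for the Dotkit init script, with false playing the role of whichever command trips errexit; the variable DEMO_INIT_DONE is likewise made up for the demonstration.

```shell
# Hypothetical stand-in for an init script containing one failing command.
init=$(mktemp)
cat > "$init" <<'EOF'
false
DEMO_INIT_DONE=yes
EOF

# With errexit active, sourcing the file aborts the shell at the failing command.
strict=$(bash -c "set -e; . '$init'; echo reached" 2>/dev/null) || true

# Turning errexit off around the source, then back on, lets the script continue.
relaxed=$(bash -c "set -e; set +e; . '$init'; set -e; echo reached:\$DEMO_INIT_DONE")

echo "strict='$strict' relaxed='$relaxed'"
rm -f "$init"
```

The trade-off is that any genuine failure inside the relaxed region is also ignored, so it is worth keeping the set +e ... set -e window as narrow as possible.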
Hello, I want to install several packages with hashdist on a cluster that uses a package manager called dotkit, hence the error below that I get when hashdist tries to install the first package:
As you can see, hashdist wants to run the script build.sh, which is as follows:
set -e
export HDIST_IN_BUILD=yes
. /usr/local/tools/dotkit/init.sh; use openmpi-intel-1.8.4; export CC=mpicc; export CXX=mpic++; export FC=gfortran; export F77=mpif77; export F90=mpif90; export CPP=cpp;
(
export CPPFLAGS=""
export LDFLAGS=""
./configure --prefix="${ARTIFACT}"
)
make -j ${HASHDIST_CPU_COUNT}
make install
rm -f ${ARTIFACT}/lib/*.la
but it’s unable to run the first line because Dotkit has not been initialized. I tried to pass . /usr/local/tools/dotkit/init.sh in the PROLOGUE, but as you can see it has no effect. Any idea of something I could try? Thanks, Miguel