hashdist / hashstack

Collection of software profiles for HashDist
https://hashdist.github.io/

PETSc fails on cluster - Could not find a functional BLAS #594

Open johannesring opened 9 years ago

johannesring commented 9 years ago

When building petsc on our local cluster Abel (http://www.uio.no/english/services/it/research/hpc/abel/), it fails to find BLAS:

[petsc] *******************************************************************************
[petsc]          UNABLE to CONFIGURE with GIVEN OPTIONS    (see configure.log for details):
[petsc] -------------------------------------------------------------------------------
[petsc] Could not find a functional BLAS. Run with --with-blas-lib=<lib> to indicate the library containing BLAS.
[petsc]  Or --download-fblaslapack=1 to have one automatically downloaded and installed
[petsc] *******************************************************************************
[petsc] 
[petsc|ERROR] Command '[u'/bin/bash', '_hashdist/build.sh']' returned non-zero exit status 1
[petsc|ERROR] command failed (code=1); raising

See configure.log (https://gist.github.com/johannesring/f92dc82b79542bd78512) for details.

I am using the following profile:

extends:
- file: linux.yaml

parameters:
  HOST_MPICC: /cluster/software/VERSIONS/openmpi.gnu-1.8.4/bin/mpicc
  HOST_MPICXX: /cluster/software/VERSIONS/openmpi.gnu-1.8.4/bin/mpic++
  HOST_MPIF77: /cluster/software/VERSIONS/openmpi.gnu-1.8.4/bin/mpif77
  HOST_MPIF90: /cluster/software/VERSIONS/openmpi.gnu-1.8.4/bin/mpif90
  HOST_MPIEXEC: /cluster/software/VERSIONS/openmpi.gnu-1.8.4/bin/mpiexec
  PROLOGUE: |
    source /usr/share/Modules/init/bash; export MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles:/cluster/etc/modulefiles; echo "loading modules"; module load gcc/4.9.2; module load openmpi.gnu/1.8.4; echo "setting default compilers"; export CC=gcc; export CXX=g++; export FC=gfortran; export F77=gfortran; export F90=gfortran; export CPP=cpp;

packages:
  launcher:
  python:
    link: shared
  mpi:
    use: host-mpi
  blas:
    use: lapack
  lapack:
    use: lapack
  petsc:
    build_with: |
      parmetis, scotch, suitesparse
    download: |
      superlu, superlu_dist, hypre, scalapack, blacs, mumps, ml
    coptflags: -O3 -march=native -mtune=native
    link: shared
    debug: false

The petsc build script looks like this:

$ hit show test.petsc.yaml script petsc
set -e
export HDIST_IN_BUILD=yes
source /usr/share/Modules/init/bash; export MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles:/cluster/etc/modulefiles; echo "loading modules"; module load gcc/4.9.2; module load openmpi.gnu/1.8.4; echo "setting default compilers"; export CC=gcc; export CXX=g++; export FC=gfortran; export F77=gfortran; export F90=gfortran; export CPP=cpp;
mkdir ${PWD}/_tmp && TMPDIR=${PWD}/_tmp \
  ./configure --prefix="${ARTIFACT}" \
  COPTFLAGS=-O3 -march=native -mtune=native \
  --with-shared-libraries=1 \
  --with-debugging=0 \
  --with-blas-dir=$BLAS_DIR \
  --with-lapack-dir=$LAPACK_DIR \
  --with-metis-dir=$PARMETIS_DIR \
  --with-parmetis-dir=$PARMETIS_DIR \
  --with-scotch-dir=${SCOTCH_DIR} \
  --with-ptscotch-dir=${SCOTCH_DIR} \
  --with-suitesparse=1 \
  --with-suitesparse-include=${SUITESPARSE_DIR}/include/suitesparse \
  --with-suitesparse-lib=[${SUITESPARSE_DIR}/lib/libumfpack.a,libklu.a,libcholmod.a,libbtf.a,libccolamd.a,libcolamd.a,libcamd.a,libamd.a,libsuitesparseconfig.a] \
  --with-mpi-compilers \
  CC=$MPICC \
  CXX=$MPICXX \
  F77=$MPIF77 \
  F90=$MPIF90 \
  FC=$MPIF90 \
  --with-patchelf-dir=$PATCHELF_DIR \
  --with-python-dir=$PYTHON_DIR \
  --download-superlu=1 \
  --download-superlu_dist=1 \
  --download-hypre=1 \
  --download-scalapack=1 \
  --download-blacs=1 \
  --download-mumps=1 \
  --download-ml=1
make
make install

I noticed that petsc.yaml doesn't have a build dependency on lapack. I tried adding that but it didn't make any difference.

ahmadia commented 9 years ago

Where does your BLAS live? Can you build PETSc outside of hashstack?
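
(For what it's worth, a minimal sketch of such a standalone build, pointing configure straight at the HashDist-built libraries; petsc-src, /path/to/petsc-install and /path/to/lapack are placeholders, --with-mpi-dir is a standard PETSc configure option not taken from the thread, and the MPI prefix is the one from the profile above:)

# Configure against explicit BLAS/LAPACK shared libraries rather than letting
# PETSc search for them, then build and install as in the generated script.
cd petsc-src
./configure --prefix=/path/to/petsc-install \
  --with-blas-lib=/path/to/lapack/lib/libblas.so \
  --with-lapack-lib=/path/to/lapack/lib/liblapack.so \
  --with-mpi-dir=/cluster/software/VERSIONS/openmpi.gnu-1.8.4
make
make install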

johannesring commented 9 years ago

BLAS is built by HashDist using the lapack package (use: lapack).

ahmadia commented 9 years ago

oh.

  blas:
    use: lapack

Yeah, that shouldn't even work. LAPACK and BLAS are two different libraries: LAPACK relies on a BLAS installation, and the two are sometimes bundled together.

certik commented 9 years ago

Our lapack package installs both blas and lapack.

ahmadia commented 9 years ago

I don't see a BLAS installed by that package.

johannesring commented 9 years ago

Yes, the lapack package installs libblas.so. This patch fixed the problem for me:

diff --git a/pkgs/petsc/petsc.py b/pkgs/petsc/petsc.py
index 2c7ea50..9c1fda4 100644
--- a/pkgs/petsc/petsc.py
+++ b/pkgs/petsc/petsc.py
@@ -80,8 +80,8 @@ def configure(ctx, stage_args):
         # Special case, no meaningful BLAS/LAPACK directories when using Accelerate

         if ctx.parameters['platform'] != 'Darwin':
-            conf_lines.append('--with-blas-dir=$BLAS_DIR')
-            conf_lines.append('--with-lapack-dir=$LAPACK_DIR')
+            conf_lines.append('--with-blas-lib=$BLAS_DIR/lib/libblas.so')
+            conf_lines.append('--with-lapack-lib=$LAPACK_DIR/lib/liblapack.so')

     # Special case, ParMETIS also provides METIS 
     if 'PARMETIS' in ctx.dependency_dir_vars:
diff --git a/pkgs/petsc/petsc.yaml b/pkgs/petsc/petsc.yaml
index 5ebeb0f..f4a47a8 100644
--- a/pkgs/petsc/petsc.yaml
+++ b/pkgs/petsc/petsc.yaml
@@ -1,6 +1,6 @@
 extends: [autotools_package]
 dependencies:
-  build: [blas, mpi, python, {{build_with}}]
+  build: [blas, lapack, mpi, python, {{build_with}}]

 sources:
 - key: tar.gz:ygj3gebgevuuwg5evyrgenkhf3ry5b2d

ahmadia commented 9 years ago

Okay. Can we rename the lapack package to lapack-blas or something to indicate that it provides both libraries?

certik commented 9 years ago

Yes, it provides both:

certik@redhawk:/local/certik/bld/lapack/odlwmumv6p3p$ ll lib/
total 5920
dr-xr-xr-x 4 certik certik    4096 Dec 11 15:09 ./
dr-xr-xr-x 3 certik certik    4096 Dec 11 15:09 ../
dr-xr-xr-x 3 certik certik    4096 Dec 11 15:09 cmake/
-r-xr-xr-x 1 certik certik  369771 Dec 11 15:08 libblas.so*
-r-xr-xr-x 1 certik certik 5652675 Dec 11 15:09 liblapack.so*
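
(A hedged aside, not from the thread: whether that libblas.so genuinely exports BLAS entry points can be checked against its dynamic symbol table, using a representative routine such as dgemm_; the path is the artifact prefix from the listing above:)

# List dynamic symbols and look for the Fortran-mangled dgemm_ entry point;
# a match means the library really does provide (double-precision) BLAS routines.
nm -D /local/certik/bld/lapack/odlwmumv6p3p/lib/libblas.so | grep -i dgemm_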

Well, given that upstream just calls it LAPACK, and this is the reference implementation, I would keep it as it is.

ahmadia commented 9 years ago

I'm -10 on having a package called lapack that also provides a blas. There are a number of situations where this will confuse and anger our users. I'm currently confused and angry on their behalf :)

Also, "upstream" pretty clearly refers to BLAS and LAPACK as separate packages, if you read their documentation. I have no idea how you're building a separate BLAS library, but I suspect that it has something to do with the experimental CMake build that you're using. Perhaps it's downloading BLAS?

ahmadia commented 9 years ago

See: http://www.netlib.org/lapack/faq.html#_why_aren_8217_t_blas_routines_included_when_i_download_an_lapack_routine

For example, I'm 90% sure that the CMake installer is just pulling a BLAS down if it can't find one on the system. I'm not opposed to a package that installs both LAPACK and BLAS; I'm opposed to calling it LAPACK.

certik commented 9 years ago

Just download http://www.netlib.org/lapack/lapack-3.4.2.tgz and look in BLAS/SRC to find the sources. We can set USE_OPTIMIZED_BLAS to ON in the CMake build; then it will use our own optimized BLAS (which we would of course have to provide).
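
(As a hedged illustration of the two modes, assuming the reference LAPACK 3.4.2 tarball and its CMake build; $PREFIX and the external BLAS path are placeholders, and steering the second mode via BLAS_LIBRARIES relies on CMake's FindBLAS honoring a pre-set library list:)

# Default build: USE_OPTIMIZED_BLAS is OFF, so CMake compiles the bundled
# BLAS/SRC sources and installs libblas.so alongside liblapack.so.
tar xzf lapack-3.4.2.tgz
cd lapack-3.4.2 && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX="$PREFIX" \
      -DBUILD_SHARED_LIBS=ON \
      ..
make && make install

# Alternative: point the build at an existing optimized BLAS instead of the
# bundled one (that BLAS then has to be provided separately).
# cmake -DBUILD_SHARED_LIBS=ON \
#       -DUSE_OPTIMIZED_BLAS=ON \
#       -DBLAS_LIBRARIES=/path/to/libblas.so \
#       ..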

ahmadia commented 9 years ago

Okay, I'm coming around a bit on this. Can we at least provide a parameter that makes it clear that this lapack is providing its own BLAS? I think the profile would be much clearer to read if it looked like this:

  blas:
    use: lapack
  lapack:
    build_blas: true

certik commented 9 years ago

+1 to that.