hashdist / hashstack

Collection of software profiles for HashDist
https://hashdist.github.io/

Environment variables vs file system linking for host packages of high performance numerical linear algebra libraries #283

Open ahmadia opened 10 years ago

ahmadia commented 10 years ago

(cc @jedbrown / @certik / @poulson ) This discussion is also going to be relevant to the Julia package, since Julia relies on a high-performance BLAS as well.

This issue is related to https://github.com/hashdist/hashstack/issues/220 and https://github.com/hashdist/hashstack/pull/218

I took a look through installing Elemental using hashstack with @poulson last night. Elemental's chief dependencies are ScaLAPACK and its own dependencies (MPI, LAPACK, BLAS).

Since Elemental is a high-performance linear algebra library, it is important that we offer flexibility in the choice of underlying BLAS/LAPACK implementations. In particular, we need to pick up "host" LAPACK/BLAS installations and be able to work with them. This hearkens back to the issue I ran into with Jed while trying to work on PETSc installations within hashstack. For an experienced user like Jack or Jed, "host" might mean somewhere in their own home directory, not something installed by the system. Some of our host packages support this, some don't (they only provide simple link flags that cannot be customized by the user's profile). We need to further improve the host packages for BLAS and LAPACK to use file system links or environment variables to better connect the host packages with the rest of the profile.

This is deeply related to https://github.com/hashdist/hashstack/pull/218, where we considered linking in the host python package to the hashdist host package build directory instead of communicating its location using environment variables. In most situations environment variables are more flexible, but the file system links provide more consistency. In the situation of numerical linear algebra libraries, we may have no choice but to use environment variables, since on supercomputing environments and on OS X, the BLAS and LAPACK libraries may be specified only with compilation/link flags, and don't have a standard file system location that can be linked in.
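The two approaches can be sketched side by side. This is a sketch only: the ARTIFACT variable and the /usr/lib paths are illustrative assumptions, not hashstack's actual build interface.

```shell
# Where the profile's artifact tree should live (illustrative default)
ARTIFACT="${ARTIFACT:-/tmp/host-lapack-artifact}"

# 1. File-system link: possible only when the host library has a
#    concrete, linkable location on disk
mkdir -p "$ARTIFACT/lib"
ln -sf /usr/lib/liblapack.so "$ARTIFACT/lib/liblapack.so"

# 2. Environment variables: work even when the host BLAS/LAPACK is
#    reachable only through compile/link flags (OS X, supercomputers)
export LAPACK_LIBS="-llapack"
export LAPACK_LDFLAGS="-L/usr/lib"
```

The link-based variant gives dependent packages one consistent place to look; the variable-based variant covers hosts where no such place exists.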

In situations where these environment variables are needed, I propose we use the autoconf convention:

PACKAGE_CPPFLAGS
PACKAGE_LIBS
PACKAGE_LDFLAGS

So a host lapack package may specify something like:

PACKAGE_LIBS=-llapack
PACKAGE_LDFLAGS=-L/home/aron/my_lapack/lib
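A dependent package's build stage might then splice these variables into its configure line roughly like this. The LAPACK_ prefix follows the proposed convention, but the configure invocation itself is an illustrative assumption, not an existing hashstack stage:

```shell
# Values a host 'lapack' package might export into the profile
LAPACK_LIBS="-llapack"
LAPACK_LDFLAGS="-L/home/aron/my_lapack/lib"

# An autoconf-based dependent package can pass them straight through,
# since LIBS and LDFLAGS are standard autoconf variables
configure_cmd="./configure LIBS=\"$LAPACK_LIBS\" LDFLAGS=\"$LAPACK_LDFLAGS\""
echo "$configure_cmd"
```

Following the autoconf naming means most configure-based packages need no per-package glue at all.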

certik commented 10 years ago

I agree that <#> needs to be able to use either a user-provided or a system-wide high-performance LAPACK implementation. In addition, we provide reasonable open source implementations in <#> itself, like Lapack and OpenBlas, so <#> needs to be able to use those as well.

I think the usage of such environment variables is fine. However, we need an official document that describes which variables the packages in Hashstack should be using. It's a growing list: it starts with $PYTHON, then all these LAPACK variables, and there are probably more. We need all packages to use them consistently.

As to which LAPACK package the <#> packages should use by default (for people who just want to build the stack and don't care so much), it should be a package that we provide and that works and builds easily on all platforms. I personally have no doubt it should be the reference Fortran LAPACK implementation (the lapack <#> package): the other options are Atlas, which takes forever to build, and OpenBlas (I know @dagss recommended using it by default), which has build issues (e.g. #204, #243). I have never had any problems with lapack, and when it is compiled with all optimizations on (the default in <#>), the speed is not bad at all for development purposes. For production runs, the user can either switch to OpenBlas (if it builds) or specify a system-wide version from the cluster.

JaroslavHron commented 10 years ago

Hi, I am about to test hashdist/hashstack - so far I have no experience with it.

Concerning blas/lapack selection, maybe this run-time BLAS selection mechanism could be used? See http://www.mpi-magdeburg.mpg.de/projects/flexiblas/
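For reference, FlexiBLAS switches the BLAS backend at run time via an environment variable, so a single binary linked against libflexiblas could be retargeted without rebuilding. A sketch, with a hypothetical application `./my_solver` (the exact backend names depend on the local FlexiBLAS configuration):

```shell
# Hypothetical application linked against libflexiblas rather than a
# specific BLAS; the backend is chosen per-run via FLEXIBLAS
FLEXIBLAS=OPENBLAS ./my_solver   # high-performance backend
FLEXIBLAS=NETLIB   ./my_solver   # reference implementation
```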

certik commented 10 years ago

@j-h that should work, we just need to package it.