Update pychm_singuarity build to add MKL library

coleslaw481 commented 7 years ago

MKL has a Yum-compatible repo. I just tried it out and it works on my machine at home, where I have root access: https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-yum-repo

After getting the repo up and installing intel-mkl-64-bit-2017, I had to change the environment variables:

export LD_LIBRARY_PATH=/opt/intel/:$LD_LIBRARY_PATH

and create the file ~/.numpy-site.cfg with the contents:

[mkl]
library_dirs = /opt/intel/mkl/lib/intel64
include_dirs = /opt/intel/mkl/include
mkl_libs = mkl_rt
lapack_libs =

Then running

pip install --no-binary :all: numpy scipy

makes numpy and scipy use the MKL. The file ~/.numpy-site.cfg can then be deleted, but you must always set LD_LIBRARY_PATH or import numpy will fail.
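
For a different install prefix, the same file contents could be generated rather than typed by hand. This is only a minimal sketch using Python's standard configparser module; the helper name render_site_cfg is hypothetical, and /opt/intel/mkl is simply the prefix used above.

```python
import configparser
import io


def render_site_cfg(mklroot):
    """Return numpy-site.cfg text whose [mkl] section points at mklroot."""
    cfg = configparser.ConfigParser()
    cfg["mkl"] = {
        "library_dirs": mklroot + "/lib/intel64",
        "include_dirs": mklroot + "/include",
        "mkl_libs": "mkl_rt",
        "lapack_libs": "",  # left empty, as in the file contents above
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Assumed prefix; adjust to wherever MKL was actually installed.
print(render_site_cfg("/opt/intel/mkl"))
```

Writing the returned text to ~/.numpy-site.cfg before the pip install should be equivalent to creating the file by hand.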

coleslaw481 commented 7 years ago

Notes from another user installing scipy/numpy with MKL: http://www.elliottforney.com/blog/npspmkl/

coderforlife commented 7 years ago

So here are some better instructions which work without needing root or creating random files.

First you have to download the MKL library from Intel. I had to register at https://registrationcenter.intel.com/en/forms/?productid=2558&licensetype=2 to be able to download the MKL installer for Linux (interestingly enough the yum repo doesn't require registration to use but the installer does...). It's also a very large download (885 MB).

After downloading:

tar -xzf l_mkl_2017.2.174.tgz
cd l_mkl_2017.2.174
./install.sh --user-mode

install.sh normally runs interactively and requires user input, but it can be run silently using --silent. That needs a settings file to be generated first, either by hand starting from the included silent.cfg file or with --duplicate. See https://software.intel.com/en-us/articles/intel-composer-xe-2015-silent-installation-guide for more information (it is a guide for a related installer, but a lot of it should apply here as well). Other options that may be useful are --SHARED_INSTALL or --nonrpm-db-dir. In the interactive installer, the important changes I had to make were setting the installation directory and telling it not to install the 32-bit components (cutting the install size down significantly). It was 1.5 GB once installed.

Several environment variables will be needed:

export MKLROOT=/opt/intel/mkl # fill this in with the path install_dir/mkl

# Not sure if these variants are necessary
export MKL_ROOT=$MKLROOT
export MKLHOME=$MKLROOT

# Needed while running programs that use numpy/scipy and during compilation
export LD_LIBRARY_PATH=$MKLROOT/lib/intel64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Possibly needed during compilation
export LIBRARY_PATH=$MKLROOT/lib/intel64${LIBRARY_PATH:+:${LIBRARY_PATH}}
export FPATH=$MKLROOT/include${FPATH:+:${FPATH}}
export CPATH=$MKLROOT/include${CPATH:+:${CPATH}}
export INCLUDE=$MKLROOT/include${INCLUDE:+:${INCLUDE}}

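(The ${x:+:${x}} parts expand to a : followed by the existing value of x when x is already set and non-empty, and to nothing otherwise. So an existing value gets our new path and a : prepended, while an unset variable is just set to the new path. This is standard POSIX parameter expansion, so it works in bash and in other sh-compatible shells.)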
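
The prepend-if-set behaviour of those expansions can also be sketched in Python, which may be handy when building the environment for a subprocess instead of exporting variables in a shell. The function name prepend_path and the demo_env dictionary are hypothetical; /opt/intel/mkl is the assumed install location from the exports above.

```python
import os


def prepend_path(var, new_dir, env=None):
    """Return the value var should take so that new_dir comes first."""
    env = os.environ if env is None else env
    current = env.get(var, "")
    # Mirrors ${VAR:+:${VAR}}: append ":" plus the old value only when
    # VAR is set and non-empty, so no stray trailing ":" is produced.
    return new_dir + (":" + current if current else "")


mklroot = "/opt/intel/mkl"  # assumed install location
demo_env = {"LD_LIBRARY_PATH": "/usr/local/lib"}  # stand-in for os.environ

print(prepend_path("LD_LIBRARY_PATH", mklroot + "/lib/intel64", demo_env))
print(prepend_path("CPATH", mklroot + "/include", demo_env))
```

Avoiding the stray trailing : matters because an empty entry in LD_LIBRARY_PATH is treated as the current directory.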

Then numpy/scipy can be installed as follows:

pip install --no-binary :all: numpy scipy

(If they are already installed this will not re-install them; you may need to run pip uninstall numpy scipy first.)

To make sure that numpy and scipy are using MKL instead of OpenBLAS you can do the following:

python -c 'import numpy, scipy; numpy.__config__.show(); scipy.__config__.show()'

And the output should be something like:

lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/intel/mkl/include']
blas_opt_info:
    <same as above>
lapack_mkl_info:
    <same as above>
blas_mkl_info:
    <same as above>
lapack_opt_info:
    <same as above>
blas_opt_info:
    <same as above>
lapack_mkl_info:
    <same as above>
blas_mkl_info:
    <same as above>

If the lapack_mkl_info and blas_mkl_info are missing or say NOT AVAILABLE or if the other entries don't reference the mkl_rt library then it is likely not using MKL.
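
That check could be automated along the following lines. This is a rough heuristic sketch, not an official numpy API: it assumes the text layout shown above (newer numpy releases may print their configuration differently), and the names parse_sections, looks_like_mkl, numpy_config_text and SAMPLE are all hypothetical.

```python
import contextlib
import io

# Abbreviated stand-in for the output shown above.
SAMPLE = """\
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/mkl/lib/intel64']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
"""


def parse_sections(text):
    """Split config output into {section_name: [stripped body lines]}."""
    sections = {}
    name = None
    for line in text.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            name = line.rstrip()[:-1]
            sections[name] = []
        elif name is not None and line.strip():
            sections[name].append(line.strip())
    return sections


def looks_like_mkl(text):
    """Apply the rule of thumb above to captured config output."""
    sections = parse_sections(text)
    # The *_mkl_info sections must exist and must not say NOT AVAILABLE.
    for name in ("lapack_mkl_info", "blas_mkl_info"):
        body = sections.get(name)
        if not body or any("NOT AVAILABLE" in item for item in body):
            return False
    # The *_opt_info sections must reference the mkl_rt library.
    for name in ("lapack_opt_info", "blas_opt_info"):
        if not any("mkl_rt" in item for item in sections.get(name, [])):
            return False
    return True


def numpy_config_text():
    """Capture what numpy.__config__.show() prints (requires numpy)."""
    import numpy
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        numpy.__config__.show()
    return buf.getvalue()


print(looks_like_mkl(SAMPLE))
```

With numpy installed, looks_like_mkl(numpy_config_text()) would run the same check against the real build.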

coderforlife commented 7 years ago

@coleslaw481 Nice little guide, but I think it is a bit out of date. The above doesn't require creating some random file anymore and allows everything to be done with alternative paths (given pip is running in a virtualenv).

coderforlife commented 7 years ago

Ha - out of date - it was posted 7 days ago. But that silly .numpy-site.cfg file is not necessary and pip is definitely the way to go with installing instead of using setup.py manually.

One possible improvement in theirs is the following definitions before compiling numpy/scipy:

export CFLAGS='-fopenmp -O2 -march=core2 -ftree-vectorize'
export LDFLAGS='-lm -lpthread -lgomp'

However some of these seem like they would be unnecessary:

I don't think any part of numpy/scipy is parallelized except through BLAS/LAPACK so including OpenMP is not going to help
The m and pthread libraries are already included I believe, and gomp is just for OpenMP
Seems to compile with -O2 or -O3 by default (on Blackwidow used -O2 and on Comet -O3)
The -march and -ftree-vectorize arguments may help actually

coderforlife commented 7 years ago

So, in terms of SIMD extensions, -march=core2 only adds SSE3 and SSSE3 (the default x86-64 baseline already includes SSE and SSE2). The -ftree-vectorize option is included with -O3, so on Comet it is already enabled.

coderforlife commented 7 years ago

I have confirmed that setting the following before running the pip install does affect the compilation:

export CFLAGS='-march=native -mtune=native -O3 -fopenmp'
export LDFLAGS='-lm -pthread -lgomp -shared'

The -shared is necessary: scipy fails to build if you define a custom LDFLAGS without it. The -march=native and -mtune=native flags are highly aggressive optimizations that make the build work best on the current CPU type (mtune) and only on the current CPU type and later (march). This would work for a cluster where all the nodes have the same type of CPU. In a more mixed environment, however, this will be too aggressive: mtune should be dropped and march should be set to the oldest CPU you want to run on.

Also I included the OpenMP stuff since it doesn't hurt.
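
The choice between a uniform cluster and a mixed environment could be captured in a small helper. This is only a sketch: the names build_flags and pip_install_from_source are hypothetical, and core2 is used purely as an example of an "oldest CPU" target taken from earlier in this thread.

```python
import os
import subprocess


def build_flags(march="native", tune_native=True, openmp=True):
    """Return CFLAGS/LDFLAGS strings in the style of the exports above."""
    cflags = ["-march=" + march]
    if tune_native:
        cflags.append("-mtune=native")
    cflags.append("-O3")
    ldflags = ["-lm", "-pthread"]
    if openmp:
        cflags.append("-fopenmp")
        ldflags.append("-lgomp")
    # -shared is kept unconditionally; scipy failed to build without it.
    ldflags.append("-shared")
    return {"CFLAGS": " ".join(cflags), "LDFLAGS": " ".join(ldflags)}


def pip_install_from_source(flags):
    """Run the pip command from above with the given flags exported."""
    env = dict(os.environ, **flags)
    subprocess.run(
        ["pip", "install", "--no-binary", ":all:", "numpy", "scipy"],
        env=env,
        check=True,
    )


# Uniform cluster: tune for the build machine's CPU.
print(build_flags())
# Mixed environment: drop -mtune and target the oldest CPU in use.
print(build_flags(march="core2", tune_native=False))
```

Calling pip_install_from_source(build_flags()) would then correspond to exporting the two variables and running pip by hand.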

slash-segmentation / CHM

Update pychm_singuarity build to add MKL library #26