rocker-org / rocker-versioned

Run current & prior versions of R using docker
https://hub.docker.com/r/rocker/r-ver
GNU General Public License v2.0

openblas version performance (0.2.19 vs 0.3.5) #123

Closed rundel closed 5 years ago

rundel commented 5 years ago

I have been using rocker images as the basis for Singularity images that run on an HPC cluster. Based on my recent experience, the version of openblas in stretch (0.2.19) has serious performance regressions on certain CPU chipsets.

For example, running a simple model in spBayes took more than 600 seconds when using 4 cores on a node with a Xeon Scalable Gold chip, while other nodes with older hardware finished the same job in ~60 seconds. To make matters worse, running the same model on a 2-core laptop with openblas 0.3.5 finished in ~15 seconds. Switching to images built with the updated version gives the expected performance (~10-20 second runtimes) across heterogeneous nodes.

These regressions are bad enough that they make rocker unusable for me at the moment. Are you open to transitioning r-ver to an up-to-date compiled version instead of stretch's packaged version? I'm happy to take a stab at putting together a pull request if so.

cboettig commented 5 years ago

Hi Colin, thanks for the report, that's quite severe!

Have you looked at how performance compares if we just drop openblas libs and use R's defaults?

Presumably when buster comes out later this year that will be fixed; I'm guessing buster will be shipping with a much newer version but haven't checked.

Meanwhile, yes, it would make sense to do something for at least the current images. I'm reluctant to backport things in the older (3.5.1 back through 3.4.0) stretch-based images since the emphasis on those is really on stability. Rather than patching the r-ver directly, can we just upgrade libopenblas-dev in a downstream image, or do we need to rebuild R for this to work?

Thanks a lot for the note!

rundel commented 5 years ago

The base R blas / lapack also has abysmal performance unfortunately - just using single threaded openblas offers a massive improvement in performance over it.

It does look like it is a short term problem as buster is using openblas 0.3.5.

In my mind r-ver makes sense as the place to take care of this, since that is where openblas is installed for the versioned stack. I don't see any reason that it would need to be backported.

cboettig commented 5 years ago

Can you expand a bit on how you would propose to add this? (Feel free to open a PR if that makes it easier.) For example, you could introduce this into r-ver by apt-pinning debian:testing, but this can have potentially dramatic consequences down the stack if it pulls in newer versions of the compilers etc. Someone building on r-ver:3.5.2 or one of its derivative images today has the expectation that they should be able to rebuild today's environment a year from now using the same image tag, and that would clearly break that model.

Presumably one could compile openblas 0.3.5 itself from source, but that may be more involved.

Having this as an add-on would make it more explicitly an opt-in option, instead of a silent (albeit no doubt largely positive) change that impacts users downstream.
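To make the apt-pinning route mentioned above concrete, here is a minimal sketch. It is not a recommendation, and it does not remove the risk described above: any dependency that the newer openblas needs from testing would still be pulled in. The function name write_openblas_pin, the file names, and the package names libopenblas-base and libopenblas-dev are assumptions for illustration. The files are written to a scratch directory, not to the real /etc/apt.

```shell
# Write apt source and pinning files under a given directory.
# Hypothetical helper; the directory layout mirrors /etc/apt.
write_openblas_pin() {
    local apt_dir="$1"
    mkdir -p "$apt_dir/sources.list.d" "$apt_dir/preferences.d"

    # Make the testing suite visible to apt.
    echo "deb http://deb.debian.org/debian testing main" \
        > "$apt_dir/sources.list.d/testing.list"

    # Keep testing at low priority in general, but prefer it for openblas.
    cat > "$apt_dir/preferences.d/openblas" <<'EOF'
Package: *
Pin: release a=testing
Pin-Priority: 100

Package: libopenblas-base libopenblas-dev
Pin: release a=testing
Pin-Priority: 990
EOF
}

# Demonstrate against a scratch directory so nothing on the system changes.
scratch="$(mktemp -d)"
write_openblas_pin "$scratch"
cat "$scratch/preferences.d/openblas"
```

In a downstream image the same two files would go under /etc/apt, followed by an apt-get update and an install of the pinned packages.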

rundel commented 5 years ago

I just created a pull request (#124) with a potential solution that involves compiling openblas before R and then linking to libopenblas. Current config options preclude using update-alternatives to swap between blas installs (I believe).

Everything builds correctly for me and seems to be working. If you want to try out the built image it is available on docker hub as rundel/r-ver:test

eddelbuettel commented 5 years ago

Can we / should we do a little triage to see where/how any possible breakage occurred, and how/if Debian could be unaware? Lots of people should be using the same binaries so I am a little surprised by a regression in performance.

rundel commented 5 years ago

My original reprex was overly convoluted and I've been working my way through things again to try to track down the root cause of what I was seeing. It turns out that it is more complicated than I had originally thought and appears to be a weird interaction between openblas, slurm, and high core count nodes.

Paring things back to just use rocker/r-ver:3.5.2 and rundel/r-ver:test (plus spBayes) I was able to see the regression on our node. However, if I also included RhpcBLASctl and used it to restrict the number of blas threads to match the number of cores per task in slurm (4 in this case) then both images had equivalent performance.

It seems like the older openblas versions get into trouble when there are too many cores around and the number of openblas threads is not restricted, but some additional higher-level CPU constraint is imposed (e.g. slurm's cpus per task).

A little bit of a perfect storm in this case that derailed my week.

With all of that said, the compiled version of openblas doesn't seem like a reasonable fix and I've closed the PR. However, including something like RhpcBLASctl in r-ver seems useful to me, and I would be in favor of going even further and setting a default number of threads for blas (maybe something like min(parallel::detectCores(), 8) in Rprofile.site).
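As a rough illustration of the thread-limiting idea, here is a shell-level sketch that sets the limit before R starts. This is a swapped-in approach, not the RhpcBLASctl or Rprofile.site route discussed above. It assumes a Linux host with coreutils nproc, that Slurm exports SLURM_CPUS_PER_TASK (which it does only when cpus per task is requested), and that the BLAS in use honours OPENBLAS_NUM_THREADS. The function name pick_blas_threads and the default cap of 8 are illustrative.

```shell
# Choose a BLAS thread count.
# Inside a Slurm task, use the per-task CPU allocation as-is.
# Otherwise, use the number of available cores, capped at $1 (default 8).
pick_blas_threads() {
    local cap="${1:-8}"
    local n
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        n="$SLURM_CPUS_PER_TASK"
    else
        n="$(nproc)"
        if [ "$n" -gt "$cap" ]; then
            n="$cap"
        fi
    fi
    echo "$n"
}

# Export the limit so an R session started from this shell inherits it.
OPENBLAS_NUM_THREADS="$(pick_blas_threads 8)"
export OPENBLAS_NUM_THREADS
echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"
```

Doing this in a job script keeps the image itself unchanged, which fits the opt-in preference voiced earlier in the thread.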

prdm0 commented 5 years ago

Consider compiling R and OpenBLAS and linking both. I did a little tutorial. I did not notice large differences in computational performance using the different versions of OpenBLAS.

See: https://github.com/prdm0/compiling_r/blob/master/README.md

Instructions for compiling R, OpenBLAS and linking R with OpenBLAS (GNU/Linux)

DEPENDENCIES: make, cmake, gcc, gcc-fortran and tk.

Important: I assume throughout that the source archives have been downloaded into ~/Downloads, and that /opt is the installation directory for the OpenBLAS library. You can choose a different directory.

Compiling OpenBLAS

First, download the R and OpenBLAS (Open Optimized BLAS Library) source code. In the directory containing the OpenBLAS archive, perform the following steps.

tar -zxvf OpenBLAS*
cd OpenBLAS*/
make -j $(nproc)
sudo make install
export LD_LIBRARY_PATH=/opt/OpenBLAS/lib/

or

git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS*
git checkout v0.3.5
make -j $(nproc)
sudo make install
export LD_LIBRARY_PATH=/opt/OpenBLAS/lib/

Note: The -j $(nproc) option runs the build in parallel on all available CPU cores, which speeds up compilation. To see the number of cores, run nproc. The default installation directory is /opt/OpenBLAS.

Compiling Armadillo C++ with OpenBLAS

For those who use C++ code in R through the Rcpp library, configuring Armadillo with the OpenBLAS library can be worthwhile.

tar -xvf armadillo*
cd armadillo*/
./configure -DCMAKE_PREFIX_PATH=/opt/OpenBLAS/lib/
cmake . -DCMAKE_PREFIX_PATH=/opt/OpenBLAS/lib/
make -j $(nproc)
sudo make install

Note: Further details regarding the compilation of the library Armadillo can be found at https://gitlab.com/conradsnicta/armadillo-code.

Compiling R with OpenBLAS

After compiling OpenBLAS, download the R source code. It is not necessary to compile R to make use of OpenBLAS, but compiling it may bring some benefits, which may be insignificant depending on what is being done in R.

Note: In my operating system, Arch Linux, OpenBLAS was installed in the /opt directory. Search for the OpenBLAS installation directory in your GNU/Linux distribution.

In the directory where R was downloaded, do the following:

tar -zxvf R*
cd R-*/ && ./configure --enable-R-shlib --enable-threads=posix --with-blas="-lopenblas -L/opt/OpenBLAS/lib -I/opt/OpenBLAS/include -m64 -lpthread -lm"
make -j $(nproc)
sudo make install

The OpenBLAS library should now be linked to R. To check, run sessionInfo() in R. Output like the following should appear:

Matrix products: default
BLAS/LAPACK: /opt/OpenBLAS/lib/libopenblas_haswellp-r0.3.5.so

If linking did not occur, follow the steps below.

We need to link R with the libopenblas_* file created when compiling OpenBLAS. In my case, the file is libopenblas_haswellp-r0.3.5.so. Look for it in /opt/OpenBLAS/lib or in the directory where OpenBLAS was installed on your GNU/Linux system. Also locate the directory containing libRblas.so inside the R installation directory. On Arch, this directory is /usr/local/lib64/R/lib.

cd /usr/local/lib64/R/lib
mv libRblas.so libRblas.so.keep
ln -s /opt/OpenBLAS/lib/libopenblas_haswellp-r0.3.5.so libRblas.so

Start an R session and run sessionInfo(). You should see something like:

Matrix products: default
BLAS/LAPACK: /opt/OpenBLAS/lib/libopenblas_haswellp-r0.3.5.so
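The same check can be run non-interactively from the shell. This is a minimal sketch that assumes Rscript is on the PATH; the helper name blas_lines is hypothetical.

```shell
# Keep only the BLAS/LAPACK lines from sessionInfo() output read on stdin.
blas_lines() {
    grep -E '^(BLAS|LAPACK)'
}

# Query R only when Rscript is actually installed.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'sessionInfo()' | blas_lines || echo "no BLAS line found"
fi
```

If the printed path points at R's own libRblas.so and not at the OpenBLAS library, the link has not taken effect.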

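To control the number of threads OpenBLAS uses, export OPENBLAS_NUM_THREADS before starting an R session. The value 1 shown below restricts OpenBLAS to a single thread; set a larger value to make use of multithreaded processing.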

export OPENBLAS_NUM_THREADS=1
export GOTO_NUM_THREADS=1
export OMP_NUM_THREADS=1

NOTE: On Intel processors, running sudo cpupower frequency-set -g performance can boost performance. Read more at https://wiki.archlinux.org/index.php/CPU_frequency_scaling.

eddelbuettel commented 5 years ago

Well -- I probably spent well over a decade fighting the myth that one would need to "recompile" R to use a different BLAS/LAPACK. In short, one does not. Set up R for dynamic linking, then simply swap them in and out.
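One way to see whether an R build is set up for this kind of swapping is to look at which BLAS shared library libR.so resolves to at load time. The sketch below assumes a Linux system with ldd and an R built with a shared libR.so; the helper name blas_from_ldd is hypothetical. On Debian the generic libblas.so.3 is managed through the alternatives system (the exact alternative name varies by release), so following the symlink shows which implementation is currently swapped in.

```shell
# Print the resolved path of every BLAS library in `ldd` output read on stdin.
blas_from_ldd() {
    awk '/blas/ && $2 == "=>" { print $3 }'
}

# Guarded usage: inspect libR.so only when R and a shared libR.so exist.
if command -v R >/dev/null 2>&1; then
    libR="$(R RHOME)/lib/libR.so"
    if [ -f "$libR" ]; then
        ldd "$libR" | blas_from_ldd | while read -r lib; do
            # Follow symlinks (e.g. Debian alternatives) to the real file.
            readlink -f "$lib"
        done
    fi
fi
```

If the resolved file changes after switching the system BLAS, R picks up the new library without being rebuilt.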

Old(er) repo, CRAN package and vignette:

Newer script and writeup for MKL swap in/out

Of course, you can always compile something locally and use -march=native on it. But doing that is a little orthogonal to distributing binaries as they become non-portable and is why eg Debian or Ubuntu do not do it. If you believe strongly in a Gentoo-alike approach you are more than welcome to pursue it. We most likely won't.