rocker-org / rocker

R configurations for Docker
https://rocker-project.org
GNU General Public License v2.0

New R-base 4.0.2 is significantly slower than older versions #412

Closed Someone894 closed 3 years ago

Someone894 commented 4 years ago

We are currently seeing run-time problems with the new R images. At the moment we use four different Docker images containing an R environment, and on each of them we run a multi-core application with foreach and dopar.

  1. R-Base version 3.6.3
     1.1 This one compiles the R interpreter during container creation.
     1.2 Since I can't find the original Dockerfile in your repo history anymore, I show you the version I copied (and changed to ppc64le) from your repo some time ago.
     1.3 Run-time wise this is our gold standard, against which we compare everything. But we can't use this image any longer, since some dependencies (e.g. XML) are no longer supported.
     1.4 During execution this one uses nearly 100% of the CPU in user mode.

  2. R-Base version 4.0.2
     2.1 This one just downloads a precompiled R via apt-get install.
     2.2 Here is the Dockerfile.
     2.3 During execution this one uses about 75% of the CPU in user mode and the rest in system mode.
     2.4 Therefore the execution time is about 25% slower than with the image from point 1.

  3. R-ver version 4.0.2 from dockerhub
     3.1 This one compiles the R interpreter during container creation.
     3.2 Here is the Dockerfile and the install script.
     3.3 During execution this one uses nearly 100% of the CPU in user mode.
     3.4 But it is 6 times as slow as the image from point 1.

  4. R-ver version 4.0.2, self-made
     4.1 The same version and files as in point 3, but the image was created by me on the target machine (x86_64).
     4.2 Same behavior as point 3: 100% CPU in user mode but still extremely slow.

Do you have any idea how this performance decrease came about in the change from 3.6.3 to 4.0.2 and how we can avoid or at least mitigate it?

noamross commented 4 years ago

I was bitten similarly and suspect the following is the issue with R-ver:

For R-ver, we build with OpenBLAS, which has multi-threaded BLAS capability as of the version available on Ubuntu 20.04, which the 4.0+ images are built on. A result of this is that matrix algebra in single-threaded R sessions will use multiple cores. However, when parallelizing over multiple R sessions, each R session will use multiple cores for matrix algebra, and the resultant traffic will end up slowing the process overall. You'll see 100% CPU usage on multiple cores as all the processes attempt to use all of them at once.

Multithreading for OMP processes (most C-level parallelism in R packages), and for BLAS specifically, can be controlled by setting environment variables OMP_NUM_THREADS and OPENBLAS_NUM_THREADS, respectively. You can also use RhpcBLASctl::blas_set_num_threads(threads) and RhpcBLASctl::omp_set_num_threads(threads). The latter are helpful in RStudio which doesn't pick up environment variables set in the shell.

To diagnose this, you could compare performance and look at CPU usage while running your code with foreach/dopar parallelism set to a single core, and you can look up the low-level parallelism being used with the RhpcBLASctl lookup functions. Whether high- or low- level parallelism is better will depend on the nature of your particular problem.
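The interaction between the two levels of parallelism can be sketched as simple arithmetic. This is only an illustration, assuming each worker's BLAS defaults to one thread per core; the function names `total_threads` and `check_oversubscription` are hypothetical helpers, not part of R or OpenBLAS.

```shell
#!/bin/sh
# Hypothetical helper: total threads when WORKERS R processes each
# run BLAS/OpenMP with THREADS_PER_WORKER threads.
total_threads() {
  workers=$1
  threads_per_worker=$2
  echo $((workers * threads_per_worker))
}

# Hypothetical helper: compare that total against the available cores.
# Arguments: cores, workers, threads per worker.
check_oversubscription() {
  cores=$1
  total=$(total_threads "$2" "$3")
  if [ "$total" -gt "$cores" ]; then
    echo "oversubscribed: $total threads on $cores cores"
  else
    echo "ok: $total threads on $cores cores"
  fi
}

# 16 cores, 16 foreach workers, BLAS assumed to use 16 threads each.
check_oversubscription 16 16 16
# Same workers with BLAS capped to one thread each.
check_oversubscription 16 16 1
```

On a real machine the first argument could come from `nproc`, and the per-worker thread count from the RhpcBLASctl lookup functions mentioned above.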

We've discussed whether to leave this as the default: it gives a painless speedup in many cases, but I suspect others will run into this issue with existing parallel code. I will add a note to the README for now.

eddelbuettel commented 4 years ago

You could add a line to Rprofile.site or Renviron.site to set the env vars to 2, 4, ... or some function of cores.
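One way to derive such a setting as a function of cores is sketched below. It is a rough illustration only: the helper name `renviron_thread_lines` and the "cores divided by workers" rule are assumptions, not something the Rocker images ship.

```shell
#!/bin/sh
# Hypothetical helper: print Renviron-style NAME=value lines that give
# each worker an equal share of the cores (at least one thread).
# Arguments: cores, workers.
renviron_thread_lines() {
  cores=$1
  workers=$2
  threads=$((cores / workers))
  if [ "$threads" -lt 1 ]; then
    threads=1
  fi
  printf 'OMP_NUM_THREADS=%s\n' "$threads"
  printf 'OPENBLAS_NUM_THREADS=%s\n' "$threads"
}

# Example: 16 cores shared by 4 workers gives 4 threads per worker.
renviron_thread_lines 16 4
```

The printed lines could then be appended to Renviron.site, which usually lives in the `etc` directory under the path reported by `R RHOME`; the exact location depends on how R was installed.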

Someone894 commented 4 years ago

@noamross Thank you very much, I would never have guessed something like this on my own :-) I will run some tests with this new knowledge, but unfortunately I have some courses coming up next week, so it'll take some time before I can get back to you.

Thank you.

Someone894 commented 4 years ago

So, I tested the R-Base 4.0.2 image with an additional bash-script:

export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

and now the performance is the same as with the old 3.6.3 version. Thank you very much :-)
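A variant of the same idea, sketched here as an assumption rather than what was actually used in this thread, applies the two caps to a single command instead of exporting them for the whole shell. The wrapper name `run_single_threaded` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with both thread caps set,
# leaving the calling shell's environment untouched.
run_single_threaded() {
  env OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 "$@"
}

# Example: the child process sees the caps.
run_single_threaded sh -c 'echo "OMP=$OMP_NUM_THREADS OPENBLAS=$OPENBLAS_NUM_THREADS"'
```

In practice the command would be something like `run_single_threaded Rscript analysis.R`, where `analysis.R` stands in for the actual job script.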

Someone894 commented 3 years ago

As it turns out, the problem is not completely gone. With the new r-base 4.0.3 image I'm starting to have trouble again.

Since the R code I'm running can take up to two days of computation, I am not measuring the runtime directly (it would take too long). Instead I look at the CPU workload, and especially at how it splits into user and system workload.
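For readers without a monitoring tool, a user/system split like this can be approximated on Linux from two snapshots of the first line of `/proc/stat`. The sketch below is an assumption about how one might do it, not the tool used here; `cpu_split` is a hypothetical helper that takes the two snapshot lines as arguments.

```shell
#!/bin/sh
# Hypothetical helper: given two "cpu ..." lines from /proc/stat,
# print the user and system share of all CPU time spent in between.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal.
cpu_split() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i }
    NR == 2 {
      total = 0
      for (i = 2; i <= 9; i++) total += $i - first[i]
      if (total <= 0) { print "user=0.0% system=0.0%"; exit }
      printf "user=%.1f%% system=%.1f%%\n",
             100 * ($2 - first[2]) / total,
             100 * ($4 - first[4]) / total
    }'
}

# Example with made-up counters: 75 user, 25 system, 100 idle ticks elapsed.
cpu_split "cpu 100 0 50 850 0 0 0 0" "cpu 175 0 75 950 0 0 0 0"
```

On a live system the two arguments could be captured with `head -n 1 /proc/stat` before and after a short `sleep`.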

[Image: workload]

This image is from the monitoring tool of my server, taken directly after I added the OMP_NUM_THREADS and OPENBLAS_NUM_THREADS variables (10.08.2020). You can see the user workload was quite high in the time frame between 10:30 and 11:30. The run at 12:30 uses both variables and the user workload is around 1%, which is fine. So I thought the problem was fixed.

But with the new 4.0.3 image I see this:

[Image: workload2]

The user workload is at about 8 % to 10 %, which means the overall process slows down by roughly the same 8-10 %. That is quite significant for a multi-day job.

My next thought was that there may be more variables one can set, and I found this answer. I tested it, but without success. I'm going to try some other things in the next few days, and I hope you have some good ideas too. That would be lovely.

How can I get back to the roughly 1 % user workload of the older R images?

eddelbuettel commented 3 years ago

It is difficult to say much here. You have not posted reproducible examples and (something I missed earlier) your post mixes and matches a little too freely. Starting from Debian is different from starting from Ubuntu, and both are very different from throwing a different hardware platform in and building from source. Regressions can happen, and are studied, but if anything you may have an issue with base R and its BLAS here, so you are preparing these detailed notes for the wrong audience. I would recommend identifying one setup, keeping the OS, CPU, ... fixed, and then varying just either the R version or the BLAS/LAPACK setup in order to examine variability.
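A minimal sketch of that "fix everything, vary one thing" approach might look like the following. It only prints the commands rather than running them; the helper name `bench_commands`, the script name `bench.R` and the `/work` mount point are hypothetical, and it assumes that version tags such as `r-base:4.0.2` are available on the registry being used.

```shell
#!/bin/sh
# Hypothetical helper: print one docker command per r-base tag, all
# running the same test script, so that only the R version varies.
# Arguments: script name, then one or more image tags.
bench_commands() {
  script=$1
  shift
  for tag in "$@"; do
    # "$PWD" is left unexpanded so it resolves when the command is run.
    printf 'docker run --rm -v "$PWD":/work r-base:%s Rscript /work/%s\n' "$tag" "$script"
  done
}

# Example: same script against three R versions on the same host.
bench_commands bench.R 3.6.3 4.0.2 4.0.3
```

Running the printed commands on one machine, each prefixed with `time`, would give comparable numbers for the versions in question.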

You have full access to all history. For r-base (and other containers from Rocker) the Dockerfile is part of the repo. You will see that r-base (which is the subject of your issue) does no configuration of its own but simply installs the Debian binary, for which I happen to be the maintainer. That package also had no changes, and it has a full git repo history over here should you need it. I didn't always tag the builds, but you can get each version of r-base since 3.6.3 and compare those. Maybe finding a test script and seeing how/if these differ would be a start.