J-curve on parallelization speed test for RxODE on Linux cluster - Githubissues

nlmixrdevelopment / RxODE

RxODE is an R package that facilitates easy simulations in R

https://nlmixrdevelopment.github.io/RxODE/

GNU General Public License v3.0

54 stars 14 forks

J-curve on parallelization speed test for RxODE on Linux cluster #239

Closed jpryby closed 3 years ago

jpryby commented 4 years ago

I was running a parallelization test for RxODE (v. 0.9.1-4) on my local Windows machine and on a Linux-based cluster (RxODE v. 0.9.1-9 is installed there). When I run it on my Windows machine, it doesn't run in parallel no matter how many cores I specify, because I use the default settings. Predictably, the curve is flat:

[image: image] https://user-images.githubusercontent.com/67969797/87233348-7d1f4980-c394-11ea-982a-0c2786d3d0ee.png

However, I expected it to run in parallel on Linux without trouble, because I think earlier versions were designed to run in parallel only on Linux (or something along those lines, based on the mc.cores options that were Linux only). Instead, I get an initial improvement in time, but then a gradual increase after about 4 cores. This explains an issue I had where a run took excessively long when I deployed an environment with 30+ cores and let it use all 30. (another example)

[image: image] https://user-images.githubusercontent.com/67969797/87233428-6a594480-c395-11ea-9956-b54a702eea47.png

It's not the end of the world, and hopefully it's just a version issue, but I figured I'd share.

The test script was based on the microbenchmark up to rxCores() on the RxODE website. https://nlmixrdevelopment.github.io/RxODE/articles/RxODE-speed.html

mattfidler commented 4 years ago

It was likely not compiled with OpenMP support. I will look into this later.

jpryby commented 4 years ago

Great, thanks. No rush, I just thought the J-curve was weird.

mattfidler commented 4 years ago

The J-curve is actually a well-known phenomenon. If you look at the code of data.table, they try to pick the best number of cores to use for a given sort operation. Perhaps this should be done here, but I'm unsure of the heuristics for that.

jpryby commented 4 years ago

Thanks, Matt, I wasn't aware. I can only find SO/SE threads on data.table running without parallelization, but I will take your word for it since I haven't looked at the code (I use that package in every script, so good to know). That being said, the documentation I linked suggests it will run in general, and while OpenMP is mentioned elsewhere, it's not listed as a requirement on that page about speeding up RxODE. Maybe an error message should be displayed if it tries to run parallelization and doesn't recognize OpenMP being installed? Clearly it works for Linux, but not the same way as it should with OpenMP. Also, maybe there should be an OpenMP test in the test_install.R script for nlmixr?

mattfidler commented 4 years ago

Here is the code:

https://github.com/Rdatatable/data.table/blob/588e0725320eacc5d8fc296ee9da4967cee198af/src/openmp-utils.c#L60-L70

The throttle depends on the n value. It is tuned for sorting and other similar operations. Their analysis says 20 threads is nonsense, so that is their upper bound.
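A minimal sketch of this kind of throttle, assuming one extra thread is allowed per fixed block of work, up to a hard cap. The function name, the block size of 1024, and the way the cap is passed in are illustrative assumptions, not values taken from data.table or RxODE.

```python
def throttled_threads(n_items, max_threads, items_per_thread=1024):
    """Return how many threads to use for n_items units of work.

    Hypothetical helper: a second thread is only used once the first
    one has a full block of items_per_thread items to work on.
    """
    if n_items < 1:
        return 1
    # One thread for the first block, one more for each further block started.
    wanted = 1 + (n_items - 1) // items_per_thread
    # Never exceed the configured upper bound.
    return min(wanted, max_threads)


for n in (10, 1024, 1025, 5000, 1_000_000):
    print(n, throttled_threads(n, max_threads=20))
```

With a rule like this, small problems stay single-threaded and only large ones reach the upper bound.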

mattfidler commented 4 years ago

NONMEM says parallelization speedup is problem dependent (and it is), and there are cases where more cores cause slower run-times.
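One way to see why more cores can be slower is a toy cost model: a fixed serial part, a part that divides across cores, and an overhead that grows with each extra core. The model and every number in it are assumptions chosen for illustration, not measurements from RxODE or NONMEM.

```python
def run_time(cores, parallel_work=16.0, serial_work=0.5, overhead_per_core=1.0):
    """Toy run-time model (arbitrary units); all defaults are made up.

    serial_work       : part that never parallelizes
    parallel_work     : part that is split evenly across cores
    overhead_per_core : thread start-up and synchronization cost per core
    """
    return serial_work + parallel_work / cores + overhead_per_core * cores


times = {p: run_time(p) for p in range(1, 33)}
best = min(times, key=times.get)  # core count with the shortest modeled time
print("best core count:", best)
print("1 core:", times[1], " 30 cores:", times[30])
```

In this model the time falls at first, reaches a minimum, and then rises again, which is the J shape. How fast it rises depends entirely on the assumed overhead, which is why the best core count is problem dependent.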

mattfidler commented 4 years ago

If you have time and would like to try a variety of problems with a variety of observations per core, we could use the timings to make rules similar to data.table's and apply them to RxODE.

mattfidler commented 4 years ago

Another unimplemented speedup is to do the harder ODE solving first. By default I would sort this by the number of points to solve.
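A small sketch of why solving the harder subjects first can help, assuming each free thread simply takes the next subject in the list and that the number of points is a fair proxy for solving cost. Both are assumptions for illustration; the function name and the example workload are hypothetical.

```python
import heapq


def makespan(costs, n_threads):
    """Finish time when each free thread takes the next job in list order."""
    loads = [0.0] * n_threads
    heapq.heapify(loads)
    for cost in costs:
        # The thread that becomes free first picks up the next job.
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)


# Hypothetical workload: six light subjects and one heavy one.
points_per_subject = [1, 1, 1, 1, 1, 1, 6]

in_order = makespan(points_per_subject, n_threads=2)
hardest_first = makespan(sorted(points_per_subject, reverse=True), n_threads=2)
print(in_order, hardest_first)
```

When the heavy subject comes last, one thread ends up doing it alone while the other sits idle. Sorting in descending order lets the light subjects fill in around it.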

Changing back to the original topic, the only reason I know some of the internals of data.table is that the development version of RxODE just implemented a modified threaded radix sort from data.table.

jpryby commented 4 years ago

> If you have time and would like to try a variety of problems with a variety of observations per core, we could use the timings to make rules similar to data.table's and apply them to RxODE.

I could see about automating some tests, and I'll let you know 👍

mattfidler commented 4 years ago

Things that can be explored for a given problem are:

Number of observations
Number of dosing events
Number of individuals (this is what is run in parallel)
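The three factors above can be crossed with the core count to get a benchmark grid. The sketch below only shows the bookkeeping; `fake_solve` is a stand-in workload, not an RxODE call, and all names and levels are hypothetical.

```python
from itertools import product
from statistics import median
from time import perf_counter


def benchmark_grid(observations, doses, individuals, cores):
    """Every combination of the problem factors and the core count."""
    return [
        {"n_obs": o, "n_dose": d, "n_id": i, "cores": c}
        for o, d, i, c in product(observations, doses, individuals, cores)
    ]


def time_scenario(solve, scenario, repeats=5):
    """Median wall-clock time of solve(**scenario) over several repeats."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        solve(**scenario)
        timings.append(perf_counter() - start)
    return median(timings)


def fake_solve(n_obs, n_dose, n_id, cores):
    """Placeholder workload; replace with the real solver call."""
    return sum(range(n_obs * n_dose * n_id))


grid = benchmark_grid([10, 100], [1, 10], [50, 500], [1, 2, 4, 8])
results = [(s, time_scenario(fake_solve, s, repeats=3)) for s in grid]
print(len(results), "scenarios timed")
```

The resulting timings per scenario are what a throttle rule would then be fitted to.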

mattfidler commented 4 years ago

Updated Makevars to use the recommended options for OpenMP. This requires all OpenMP support to be in C, so any OpenMP in C++ cannot be used.

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support

mattfidler commented 4 years ago

CRAN forces all OpenMP code to be in C++ if any C++ is called in RxODE.

https://stackoverflow.com/questions/54056594/cran-acceptable-way-of-linking-to-openmp-some-c-code-called-from-rcpp

Hence all C code for RxODE using OpenMP was moved to C++.
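For reference, the general pattern that Writing R Extensions documents for a package whose OpenMP code is in C++ looks like the fragment below in src/Makevars. This is the generic documented form, shown as an illustration; it is not a copy of RxODE's actual Makevars.

```make
# Compile C++ sources with the OpenMP flags R was configured with
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
# Link with the same flags, since the C++ compiler does the linking
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS)
```

On platforms where R was built without OpenMP, these macros expand to nothing, so the package still builds but runs single-threaded.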

mattfidler commented 3 years ago

I think this is fixed.

jpryby commented 3 years ago

Seems to be, thanks. The latest version is working on my Windows installation, too.

mattfidler commented 3 years ago

Great; we are ramping up to release on CRAN. Hopefully they haven't closed yet :)