Closed: jpryby closed this issue 3 years ago
It was likely not compiled with OpenMP support. I will look into this later.
On Sat, Jul 11, 2020, 3:45 PM jpryby notifications@github.com wrote:
I was running a parallelization test for RxODE (v. 0.9.1-4) on my local Windows machine and on a Linux-based cluster (RxODE v. 0.9.1-9 is installed there). When I run it on my Windows machine it doesn't run in parallel no matter how many cores I specify because I use the default settings. Predictably the curve is flat: [image: image] https://user-images.githubusercontent.com/67969797/87233348-7d1f4980-c394-11ea-982a-0c2786d3d0ee.png
However, I expected it to run in parallel on Linux without trouble, because I think earlier versions were designed to run in parallel only on Linux (or something along those lines, based on the mc.cores options that were Linux only). Instead, I get an initial improvement in time, followed by a gradual increase after about 4 cores. This explains an issue I had where runs took excessively long when I deployed an environment with 30+ cores and let it use all 30. [image: image] https://user-images.githubusercontent.com/67969797/87233428-6a594480-c395-11ea-9956-b54a702eea47.png
It's not the end of the world, and hopefully it's just a version issue, but I figured I'd share.
The test script was based on the microbenchmark up to rxCores() on the RxODE website. https://nlmixrdevelopment.github.io/RxODE/articles/RxODE-speed.html
Great, thanks. No rush, I just thought the J-curve was weird.
The J-curve is actually a well-known phenomenon. If you look at the code of data.table, they try to pick the best number of cores to use for a given sort operation. Perhaps this should be done here, but I'm unsure of the heuristics for that.
Thanks, Matt, I wasn't aware. I can only find SO/SE threads on data.table running without parallelization, but I will take your word for it since I haven't looked at the code (I use that package in every script, so good to know). That being said, the documentation I linked suggests it will run in general, and while OpenMP is mentioned elsewhere, it's not listed as a requirement on that page about speeding up RxODE. Maybe an error message should be displayed if it tries to run parallelization and doesn't recognize OpenMP being installed? Clearly it works for Linux, but not the same way as it should with OpenMP. Also, maybe there should be an OpenMP test in the test_install.R script for nlmixr?
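As a rough illustration of the warning suggested above, here is a minimal C sketch of detecting OpenMP at compile time and falling back to one thread with a message. The `_OPENMP` macro and `omp_get_max_threads()` come from the OpenMP specification; the function names `has_openmp`, `max_threads_available`, and `effective_threads` are hypothetical and are not part of RxODE.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 when this file was compiled with OpenMP enabled
 * (for example with -fopenmp), 0 otherwise. Hypothetical helper. */
int has_openmp(void) {
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Number of threads a parallel region could use; 1 for serial builds. */
int max_threads_available(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Returns the thread count that will actually be used. Writes a warning
 * to `out` when more than one thread is requested on a serial build. */
int effective_threads(int requested, FILE *out) {
    assert(out != NULL);
    if (requested < 1) requested = 1;
    if (!has_openmp()) {
        if (requested > 1)
            fprintf(out, "warning: built without OpenMP; using 1 thread\n");
        return 1;
    }
    int avail = max_threads_available();
    return requested < avail ? requested : avail;
}
```

Calling `effective_threads(8, stderr)` on a build without OpenMP would print the warning and return 1, which is the kind of visible signal asked for above instead of a silently flat timing curve.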
Here is the code:
The throttle depends on the n value. It is timed to sorting and other similar operations. Their analysis says 20 threads is nonsense, so that is their upper bound.
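The idea of a throttle that depends on n can be sketched as follows. This is an illustrative C function, not data.table's actual code: the name `pick_threads` and the `items_per_thread` parameter are assumptions, and any concrete values passed in (such as a cap of 20) are for demonstration only.

```c
#include <assert.h>
#include <stdint.h>

/* Pick a thread count for a job of n items (hypothetical helper).
 *   items_per_thread: minimum amount of work that justifies one more thread
 *   max_threads:      hard upper bound on the number of threads
 * Small jobs get one thread; the count grows with n until it hits the cap. */
int pick_threads(int64_t n, int64_t items_per_thread, int max_threads) {
    assert(max_threads >= 1);
    if (n < 1 || items_per_thread < 1) return 1;
    /* Integer ceiling of n / items_per_thread. */
    int64_t wanted = 1 + (n - 1) / items_per_thread;
    return wanted >= max_threads ? max_threads : (int)wanted;
}
```

With `items_per_thread = 1024` and `max_threads = 20`, a job of 3000 items would get 3 threads and a job of a million items would be capped at 20. Rules for RxODE would presumably need their own constants derived from timings.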
NONMEM says parallelization speedup is problem dependent (and it is), and there are cases where more cores cause slower run-times.
If you have time and would like to try a variety of problems with a variety of observations per core, we could use the timings to make rules similar to data.table's and apply them to RxODE.
Another unimplemented speedup is to solve the harder ODE problems first. By default I would sort them by the number of points to solve.
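A minimal sketch of that ordering, assuming each subject's cost can be approximated by its number of points to solve. The `subject_t` struct and `order_hardest_first` function are hypothetical names for illustration and do not reflect RxODE's internal data structures.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int id;        /* subject index in the original data */
    int n_points;  /* number of points to solve for this subject */
} subject_t;

/* qsort comparator: more points first; ties broken by ascending id
 * so the resulting order is deterministic. */
static int by_points_desc(const void *a, const void *b) {
    const subject_t *x = a;
    const subject_t *y = b;
    if (x->n_points != y->n_points)
        return (y->n_points > x->n_points) - (y->n_points < x->n_points);
    return (x->id > y->id) - (x->id < y->id);
}

/* Reorder subjects in place so the most expensive solves come first. */
void order_hardest_first(subject_t *subjects, size_t n) {
    assert(subjects != NULL || n == 0);
    if (n > 1)
        qsort(subjects, n, sizeof *subjects, by_points_desc);
}
```

Iterating over the sorted array in an OpenMP loop with `schedule(dynamic)` would start the long solves first, so a single expensive subject is less likely to be left running alone at the end.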
Changing back to the original topic, the only reason why I know some of the internals of data.table is that the development version of RxODE just implemented a modified threaded radix sort from data.table.
If you have time and would like to try a variety of problems with a variety of observations per core, we could use the timings to make similar rules as data.table and apply it to RxODE
I could see about automating some tests, and I'll let you know 👍
Things that can be explored before a problem is:
Updated Makevars to use the recommended options for OpenMP. This requires all OpenMP support to be in C, hence any OpenMP in C++ cannot be used.
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support
CRAN forces all OpenMP code to use C++ if any C++ is called in RxODE. Hence all C code for RxODE using OpenMP was moved to C++.
I think this is fixed.
Seems to be, thanks. The latest version is also working on my Windows installation.
Great; we are ramping up to release on CRAN. Hopefully they haven't closed yet :)