If you need more than 48 threads you must set e.g.:
#define CPPAD_MAX_NUM_THREADS 100
right before
#include <TMB.hpp>
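For context, a minimal sketch of where the define has to sit (the model body is elided; the only point is that the define precedes the TMB header):

```cpp
#define CPPAD_MAX_NUM_THREADS 100  // must come before the TMB header
#include <TMB.hpp>

template<class Type>
Type objective_function<Type>::operator() ()
{
  // ... data, parameters and likelihood as usual ...
  return Type(0);
}
```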
If that doesn't solve the problem, you may try reducing memory by disabling parallel taping:
TMB:::config(tape.parallel=FALSE)
I didn't really answer the question:
is there an easy way to limit the number of cores used when running a parallel template?
From R you can set e.g.
TMB:::openmp(40)
Or you can use the environment variable OMP_NUM_THREADS.
Brilliant! My model is running. Many thanks for your help.
May I ask some further questions on the performance/peak-memory trade-off?
1. What is the likely effect on peak memory usage of using the option atomic=FALSE in MakeADFun()? I am asking because, for my model, most of the taping time and memory usage seems to be spent constructing the atomic D_lgamma (my likelihood is a multinomial sampling).
2. Is the peak memory usage when taping likely to increase linearly with the number of CPUs when using parallel taping? More generally, knowing the peak memory usage on 1 CPU, is there a rule of thumb to estimate it on 100 CPUs?
3. Would a serialization strategy be viable? Such as:
All the best
> 1. What is the likely effect on peak memory usage of using the option atomic=FALSE in MakeADFun()? I am asking because, for my model, most of the taping time and memory usage seems to be spent constructing the atomic D_lgamma (my likelihood is a multinomial sampling).
I have similar problems with a dataset of ~1 million, and constructing D_lgamma used up 40-50 GB. @tiboloic, did you find anything regarding this issue?
The atomic argument to MakeADFun doesn't have any effect and will be deprecated in the future.
@kklot Did you try to disable parallel taping as described above? If this sorts out the memory issues, you may want to consider using the PARALLEL_REGION macro instead of parallel_accumulator - see the example here. The downside is that you have to mark all accumulation and thread-local temporaries manually (in contrast to parallel_accumulator, which works automatically but is less efficient).
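For illustration, a minimal sketch of the PARALLEL_REGION style, assuming a simple i.i.d. normal likelihood (x, mu and sd are placeholder names, not taken from this model):

```cpp
#include <TMB.hpp>

template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_VECTOR(x);
  PARAMETER(mu);
  PARAMETER(sd);

  Type nll = 0;
  for (int i = 0; i < x.size(); i++) {
    // Every accumulation statement is marked by hand; any thread-local
    // temporaries used in the loop would need the same treatment.
    PARALLEL_REGION nll -= dnorm(x(i), mu, sd, true);
  }
  return nll;

  // The automatic (but less efficient) alternative declares
  //   parallel_accumulator<Type> nll(this);
  // instead of 'Type nll = 0;' - the loop body then needs no
  // PARALLEL_REGION mark.
}
```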
Yes - disabling parallel taping sorted out the memory issues. Thanks a lot for pointing me to PARALLEL_REGION, I did not know about it.
Hi,
I am running a big model that requires lots of memory on Google Cloud Platform. The model runs fine on 40-core nodes but crashes on 80-core nodes, using exactly the same software.
Is there an easy way to limit the number of cores used when running a parallel template?
Unfortunately, the amount of memory needed to run my model is only available on 80+ core nodes.
Thanks for the great work