kingaa / pomp

R package for statistical inference using partially observed Markov processes
https://kingaa.github.io/pomp
GNU General Public License v3.0

Parallelized pomp object #123

Closed munozav closed 3 years ago

munozav commented 3 years ago

Dear Professor King,

Thank you very much again for your valuable program. I am contacting you this time because I am having trouble with parallelisation on a Windows 10 computer. Following the previous issues, I was able to run the program in parallel by creating a pomp object with cfile and then parallelising the trajectory computation.

However, at the moment I am running a kind of SIR model where I need to run different MDA policies, which I pass through the "covar" argument of the pomp function. I tried calling the pomp function inside the parallel loop, with no success.

So I'm wondering whether there is a function I can use to override the covar argument of an existing pomp object. That way I could create the pomp object once and then parallelise the covar modification along with the trajectory function.

If you could point me in the right direction, I would really appreciate it.

Thanks again for all, and have a nice day!!

kingaa commented 3 years ago

Have you tried creating an array of pomp objects outside of the parallel loop, differing in terms of the covariates? Each one of the parallel jobs could select its own pomp object from the array.

Alternatively, if there are not too many covariates, you could pass a comprehensive covariate table and control which covariates are actually used by means of a parameter.
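A minimal sketch of the second suggestion, assuming a toy two-state model: the covariate names `mda1` and `mda2`, the selector parameter `policy`, and all numerical values are hypothetical and are not taken from the model discussed in this thread. The covariate table holds every policy schedule, and an ordinary parameter chooses which column the skeleton reads, so a single pomp object (and a single compilation) can serve all scenarios.

```r
library(pomp)

# Hypothetical: two policy schedules stored side by side in one table.
policies <- covariate_table(
  t    = 0:10,
  mda1 = rep(0, 11),                  # schedule 1: never active
  mda2 = rep(c(0, 0.2), c(5, 6)),     # schedule 2: active from t = 5
  times = "t"
)

# The parameter `policy` (1 or 2) selects which covariate column is used.
skel <- Csnippet("
  double mda = (policy < 1.5) ? mda1 : mda2;
  DS = -Beta*S*I - mda*S;
  DI =  Beta*S*I - Gamma*I;
")

sir <- pomp(
  data = data.frame(time = 1:10, dummy = NA),
  times = "time", t0 = 0,
  covar = policies,
  skeleton = vectorfield(skel),
  rinit = Csnippet("S = S_0; I = I_0;"),
  statenames = c("S", "I"),
  paramnames = c("Beta", "Gamma", "policy", "S_0", "I_0")
)

# Same object, different scenarios: only the parameter vector changes.
base  <- c(Beta = 1.5, Gamma = 0.5, S_0 = 0.99, I_0 = 0.01)
traj1 <- trajectory(sir, params = c(base, policy = 1), format = "data.frame")
traj2 <- trajectory(sir, params = c(base, policy = 2), format = "data.frame")
```

Because the scenarios differ only in `params`, the parallel loop would iterate over parameter vectors rather than over pomp objects. This only stays manageable while the number of policy columns is small, as noted above.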

I would be remiss were I not to point out that, once you are into the realm of parallel computing, it's almost surely time to start thinking about using an operating system better suited to high-performance scientific computing (i.e., linux).

munozav commented 3 years ago

Thanks for your prompt reply, professor, and I totally agree with you about the inefficiency of the Windows system, but unfortunately my only option is a Windows computer.

I tried to create a list with the pomp objects outside the parallel loop, but it doesn't work for all cases. Then I created a pomp object for each case k and gave each one a name specific to k. For example:

    pomp.det <- pomp(data = data.frame(dummy = NA, time = time.window),
                     times = "time",
                     t0 = start.time,
                     covar = politics,
                     skeleton = vectorfield(Csnippet(vl.code.det)),
                     rinit = Csnippet(vl.init),
                     statenames = statenames.det,
                     paramnames = names(param.values.det),
                     cdir = output.dir, cfile = paste0("pomp.det_", k))

Then, in the parallel loop, I assigned each pomp.det_k in the global environment and called the trajectory function on it. Because I have to generate a lot of scenarios, I also tried to parallelise the creation of the pomp.det objects. I noticed that the cases that worked saved two files, "pomp.det_k.c" and "pomp.det_k.dll"; when a case does not work, it is because the file "pomp.det_k.dll" is missing.

I really appreciate your support, since this is a Windows problem and not an issue with your package, but if you know of any way to force the creation of the .dll file in Windows, please let me know.

Thanks again for everything, your package is fantastic.

kingaa commented 3 years ago

If I understand you correctly @munozav, you had partial success when you created the pomp objects outside of the parallel loop. Can you be more specific about what worked and what didn't?

With respect to creating the pomp objects in parallel, you are recreating the conditions for the initial failure. You should be able to create the pomp objects for each case if you do it serially, i.e., use %do% instead of %dopar% (or registerDoSEQ instead of registerDoParallel or equivalent).

So you understand, the issue is that building a pomp object containing C snippets causes a file to be built, which is then compiled using R CMD SHLIB. For some reason, this does not work well on Windows machines if it is done using multiple parallel processes. I would love to understand why, but I don't have the ability to perform the necessary tests on a Windows machine (nor the time or the Windows expertise). So I recommend doing all the compilation of C snippets serially, in an R session, and then doing the parallel computations. FWIW, if you keep the files 'pomp_det_k.c' and 'pomp_det_k.dll' around, and save the pomp objects, you can re-use them in later sessions.
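The recommendation to build serially and compute in parallel might be sketched as follows. To keep the example runnable without a C compiler, `build_model()` and `run_model()` are hypothetical stand-ins: in the setting of this thread they would be the `pomp()` call (with its `cfile` argument) and the `trajectory()` call, respectively.

```r
library(foreach)
library(doParallel)

scenarios <- 1:4

# Hypothetical stand-in for the step that triggers compilation;
# here it would be the pomp() call with cfile = paste0("pomp.det_", k).
build_model <- function(k) list(k = k, rate = 0.1 * k)

# Hypothetical stand-in for the per-scenario computation (trajectory() here).
run_model <- function(model) exp(-model$rate * 0:5)

# Step 1: build every object serially with %do%, so that only one
# build (and hence only one compilation) runs at any time.
models <- foreach(k = scenarios) %do% build_model(k)

# Step 2: register a parallel backend only after all objects exist,
# then let each job pick up its own prebuilt object.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)
results <- foreach(m = models) %dopar% run_model(m)
parallel::stopCluster(cl)
registerDoSEQ()  # return to the sequential backend
```

With real pomp objects, the workers would presumably also need the package loaded, for instance through foreach's `.packages = "pomp"` argument.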

munozav commented 3 years ago

Dear Professor King, sorry for my bad English. What I tried to explain was that when I created the pomp objects in parallel, only some of the k cases produced a pomp object that could then be passed successfully to the trajectory function in parallel. I am not able to determine why certain cases were successful: I couldn't find a pattern in the execution, and it seems to be random.

In the end I took your advice to create the pomp objects serially and save them to a list, and then ran the trajectory function in parallel.

Once again, thank you very much for your kind support and have a nice evening !!

kingaa commented 3 years ago

Your English was fine: I understood you correctly. The random failure happens when Windows tries to compile code in parallel. I don't understand precisely why.

Did creating the pomp object in series, saving it to a list, and then running the trajectory function in parallel work for you?

munozav commented 3 years ago

Yes, it works, and it is all thanks to you! Thank you so much for everything, and have a nice day!