kingaa / pomp

R package for statistical inference using partially observed Markov processes
https://kingaa.github.io/pomp
GNU General Public License v3.0
114 stars 27 forks source link

Using R function instead of CSnippet #20

Closed alejob closed 8 years ago

alejob commented 8 years ago

Hi, I'm trying to replicate this tutorial http://kingaa.github.io/pomp/vignettes/getting_started.html, but instead of use CSnippet I want to use R function. I have not found any example of this. I want to do this to call an external binary from R (with system()) and to understand better the pomp's implementation.

So, I've replaced this function in the tutorial:

step.fun <- Csnippet("
      double dW = rnorm(0,sqrt(dt));
      N += r*N*(1-N/K)*dt+sigma*N*dW;
    ")

for this

test.fun <- function(x,t,params,delta.t,...){
    dW <- rnorm(0, sqrt(t))
     N <- N + r*N*(1-N/K)*dt+sigma*N*dW
}

and then like in the tutorial

parus <- pomp(data=parus.dat,time="year",t0=1959, 
        rprocess=euler.sim(step.fun=test.fun,delta.t=1/305),
        statenames="N",paramnames=c("r","K","sigma"))

and simulate simStates <- simulate(parus,nsim=10,params=c(r=0.2,K=200,sigma=0.5,N.0=200),states=TRUE)

but I have this error

Error in (function (x, t, params, delta.t, ...) : object 'N' not found Error: ‘simulate’ error

and of course, N is not declare in the R function, but for the CSnippet pomp set that variable.

If you have any example of suggestion of how to do this I'll be very grateful.

p.d: I've been using this page to try the implementation http://www.rdocumentation.org/packages/pomp/functions/plugins

kingaa commented 8 years ago

The reason you see Csnippet featured prominently in the documentation and very little on the R function option is that one can experience up to 100-fold speed-ups using compiled code. One might imagine that this advantage is balanced by the fact that it is, for most people, easier to write R code than C code. It is, however, difficult to anticipate all the variants one can come up with in R and for me to write fast code, I must do so. The result is that if one uses R functions to specify one's model, not only will these be slow, but it will be nontrivial to write them so that they conform to pomp's expectations. The upshot is that there is essentially no reason to use R functions under ordinary circumstances.

In your case, however, your goal is to use an external executable binary. I will not ask you why (though I am curious), but it is clear that these are not ordinary circumstances. In your case, you may find it useful to know that R makes it quite easy to link to compiled code (see ?".C") and pomp does too (see ?pomp: one has the option of supplying any of the model functions as the name of a compiled native routine in a library linked into the running R session).

So let me try to dissuade you from your proposed course of action and urge you instead to use the provided facilities. As Picasso said, "Learn the rules like a pro, so you can break them like an artist". Having made my attempt to dissuade you, I will now answer your question as posed.

The problem you run into is simple: the parameters r, K, sigma and the state variable N are not defined in the function you wrote. They are defined in params and x, respectively. So if you instead do

test.fun <- function (x, t, params, delta.t, ... ) {
  dW <- rnorm(n = 1, mean = 0, sd = sqrt(delta.t))
  N <- x["N"] + params["r"] * x["N"] * (1 - x["N"]/params["K"]) * delta.t + params["sigma"] * dW
  c(N=unname(N))
}

followed by

parus <- pomp(data = parus.dat, time = "year", t0 = 1959, 
        rprocess = euler.sim(step.fun = test.fun, delta.t = 1/365))

you should obtain what you desire.

Two things to note:

  1. The statenames and paramnames are only needed if one is using Csnippets, so they are not needed here.
  2. One has to be careful that test.fun does not change the names of x. Hence the unname() call.

I don't have time at the moment to test these lines. Please do so and respond to let me know if they work. I can correct any mistakes in this issue before closing it.

alejob commented 8 years ago

First of all, thanks for the quick answer and recommendations.

Well, I didn't want to bore you explaining what I'm trying to do in the first post, but if you are curious, I'll tell you. We are developing a program in my lab called PISKa(documentation), based on KaSim, this program mainly run simulations based on agents/rules. This program is an implementation of the formal language Kappa and it uses the Gillespie algorithm in the back-end, so, we are replacing a system based on differential equations for a system based on agents/rules. In our system based on agents/rules we have rates for each rule, this rates are equivalent to the parameters on a differential equation.

So, what I want to do with PISKa and pomp?, I'd like to connect PISKa with pomp (I'm figuring out how) to use IF2 or something similar to adjust my data to my simulation, ie, find the rates of my rules. My idea is to run PISKa (rprocess), send the result of a single simulation to pomp, and that pomp send me back the new rates for my rules (applying the IF2), and repeat this to maximize the likelihood. I'm modelling a spread disease of Ebola (SEIRD model) with the help of Jonathan Dushoff, so I'll adjust the spread disease data to my model.

For this reason, apparently, I can't use a CSnippet, because I want to call an external program, PISKa, written in OCaml (strange language, but we are rewriting the code in C++). I'd like to use the provided facilities as you say, so I saw your recommendations, ?".C", but apparently I just can use Fortran or C code in R. In pomp, if I want to link a compiled native routine, I understand, from documentation, that I can do that in the rmeasure, right?, but is that possible in the rprocess?, should it be a compiled native routine in C++?, or it could be in an other language?, any other suggestion to try this connection with PISKa and pomp?

Regarding the question, now it works, but the results are a little different between the simulation with the CSnippet and the R function. I run both simulations many times, and all the graphs are very similar to: CSnippet: graph , y axis from 0 to 600 R function: graph, y axis from ~198 to ~204

is there anything missing in the R function?

kingaa commented 8 years ago

Interesting. In principle, it seems a good plan. A lot will depend on the computational expense associated with your simulations. The effective methods implemented by pomp are all based on Monte Carlo integration or Monte Carlo sampling, and hence require large number of simulations. Just how many is "large" will of course depend, and the computational cost of each simulation will depend on your model and your code.

As for your specific questions: it is possible to provide names of native routines for the rmeasure and the dmeasure, and for the various plug-ins that can be used to specify rprocess. For an rprocess that doesn't conform to the assumptions of any of the plugins, the easiest thing to do is to write an R function that calls either .C, .Fortran, or an arbitrary executable via system. It sounds like this is your plan.

alejob commented 8 years ago

Thanks for the answers ;)

I'm closing this issue.