kingaa / pomp

R package for statistical inference using partially observed Markov processes
https://kingaa.github.io/pomp
GNU General Public License v3.0
114 stars 27 forks source link

How to shorten the simulation time on the basis of credible results #17

Closed Spatial-R closed 8 years ago

Spatial-R commented 8 years ago

I have run a simulation using the parallel computation (doMC package) on a cloud server (Intel Xeon CPU,Cores:16), however, it haven't be finished for nearly four days. The dataset in the simulation is the weekly case for two years. The Np and Nmif argument in mif function is 2000 and 100(I don't know whether it is enough to get the credible result), respectively. How could i make the simulation faster? You have developed the mpifarm package for parallel computation and how about the efficiency compared to the doMC package? Another question is that how do i set the initial value of every parameter? In my situation, there are nearly 8 parameters which i can't decide the potential range of parameters because no reference have reported them. Some papers have used the traj.match function to get the potential value for parameters, however, the results are also related to the original parameter value during the process to build the pomp model. What is the standard way to set the parameter value before the simulation?

kingaa commented 8 years ago

There are numerous reasons why your computations might be taking so long. Because you provide no code, I am left to guess as to the causes, which is a waste of my time and yours. Here are a few questions that pop into my head:

I do not recommend using mpifarm.

As for finding good guesses for parameters: there is no general procedure. mif2 often does a surprisingly good job of estimating them, however, even from quite bad starting guesses.

Spatial-R commented 8 years ago

Sorry! The codes and dataset are here. And i thanks for your kindly response.

kingaa commented 8 years ago

I slightly modified your codes (just to simplify them really, see below) and run the following in 2.2 min on my 40-core Xeon E5-2650 2.3GHz machine. I'm doing 100 mif iterations with 1000 particles think this is about 1/10 the time it would take were I to follow you in doing 200 mif iterations with 5000 particles each. [Computing time is roughly proportional to the product of mif iterations and particles.] If, as you say, your computations have not finished in >1 hr, I would say something must have gone wrong with the computations. This is likely to be something highly specific to your own situtation. What kind of computers are you running? What versions of R and pomp?

library(pomp)
library(magrittr)
load("dataset.RData")
rproc <- Csnippet("
                  double Beta, lambda, dW, airplt, logbeta1;
                  double rate[7], trans[7];

                  logbeta1 = log(R0*(1.0-exp(-(gamma+mu)*dt))/dt);    // caculate the log beta based on R0

                  //  seasonality for transmission
                  int nseas=3;
                  int k;
                  const double *seas = &seas1;
                  const double *logbeta = &logbeta1;
                  for (k = 0, Beta = 0; k < nseas; k++) Beta += logbeta[k] *seas[k];

                  //  air polutant on  transmission
                  airplt = co * pamco + no2 * pamno2+ o3 * pamo3+ pm10 * pampm10+ pm25 *pampm25 + so2 * pamso2 ;
                  Beta =exp(Beta * airplt);

                  dW = rgammawn(sigmaSE,dt);   // white noise (extrademographic stochasticity)

                  // expected force of infection
                  lambda = (Beta*pow(I+iota,alpha)/pop)*(dW/dt);   // mixing pattern and immigration 

                  double   birthnum    = rpois(birthrate*dt);            // births are Poisson
                  double   vaccinum    = rpois(vacci*dt);      // vaccination are Poisson

                  rate[0] = lambda * dt;  // stochastic force of infection
                  rate[1] = mu * dt;             // natural S death
                  rate[2] = sigma * dt;        // rate of ending of latent stage
                  rate[3] = mu * dt;             // natural E death
                  rate[4] = gamma * dt;        // recovery
                  rate[5] = mu * dt;             // natural I death
                  rate[6] = mu * dt;             // natural R death

                  // transitions between classes
                  reulermultinom(2,S,&rate[0],dt,&trans[0]);
                  reulermultinom(2,E,&rate[2],dt,&trans[2]);
                  reulermultinom(2,I,&rate[4],dt,&trans[4]);
                  reulermultinom(1,R,&rate[6],dt,&trans[6]);

                  S += birthnum - trans[0] - trans[1] - vaccinum ;
                  E += trans[0]  - trans[2] - trans[3];
                  I += trans[2]  - trans[4] - trans[5];
                  R += trans[4]  - trans[6] + vaccinum; //pop-S-E-I;
                  W += (dW- dt)/sigmaSE;  // standardized i.i.d. white noise
                  C += trans[4];           // true incidence
                  ")

Csnippet("
         double Beta, lambda, airplt, logbeta1;
         double rate[7], trans[7];

         logbeta1 = log(R0*(1.0-exp(-(gamma+mu))));    // caculate the log beta based on R0

         //  seasonality for transmission
         int nseas=3, k;
         const double *seas = &seas1;
         const double *logbeta = &logbeta1;
         for (k = 0, Beta = 0; k < nseas; k++) Beta += logbeta[k] *seas[k];

         //  air polutant on  transmission
         airplt = co * pamco + no2 * pamno2+ o3 * pamo3+ pm10 * pampm10+ pm25 *pampm25 + so2 * pamso2 ;
         Beta =exp(Beta * airplt);

         // expected force of infection
         lambda = Beta*pow(I+iota,alpha)/pop;   // mixing pattern and immigration 

         double  birthnum    = birthrate;            // births are Poisson
         double  vaccinum    = vacci;      // vaccination are Poisson

         rate[0] = lambda  ;  // stochastic force of infection
         rate[1] = mu ;             // natural S death
         rate[2] = sigma  ;        // rate of ending of latent stage
         rate[3] = mu  ;             // natural E death
         rate[4] = gamma  ;        // recovery
         rate[5] = mu ;             // natural I death
         rate[6] = mu ;             // natural R death

         // transitions between classes

         trans[0] = S * rate[0];
         trans[1] = S * rate[1];
         trans[2] = E * rate[2];
         trans[3] = E * rate[3];
         trans[4] = I * rate[4];
         trans[5] = I * rate[5];
         trans[6] = R * rate[6];

         DS = birthnum  - trans[0] - trans[1] - vaccinum ;
         DE = trans[0]  - trans[2] - trans[3];
         DI = trans[2]  - trans[4] - trans[5];
         DR = trans[4]  - trans[6] + vaccinum;
         DC = trans[4];           // true incidence
         ") -> skel

initlz <- Csnippet("
                   double m = pop/(S_0+E_0+I_0+R_0);
                   S = nearbyint(m*S_0);
                   E = nearbyint(m*E_0);
                   I = nearbyint(m*I_0);
                   R = nearbyint(m*R_0);
                   W = 0;
                   C = 0;
                   ")

dmeas <- Csnippet("
                  double m = rho*C;
                  double v = m*(1.0-rho+psi*psi*m);
                  double tol = 1.0e-18;
                  if (cases > 0.0) {
                  lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)-pnorm(cases-0.5,m,sqrt(v)+tol,1,0)+tol;
                  } else {
                  lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)+tol;
                  }
                  ")

rmeas <- Csnippet("
                  double m = rho*C;
                  double v = m*(1.0-rho+psi*psi*m);
                  double tol = 1.0e-18;
                  cases = rnorm(m,sqrt(v)+tol);
                  if (cases > 0.0) {
                  cases = nearbyint(cases);
                  } else {
                  cases = 0.0;
                  }
                  ")

toEst <- Csnippet("
                  Tmu = log(mu);
                  Tsigma = log(sigma);
                  Tgamma = log(gamma);
                  Talpha = log(alpha);
                  Tiota = log(iota);
                  Trho = logit(rho);
                  Tpsi = log(psi);
                  Tpamso2 = log(pamso2);
                  Tpamno2 = log(pamno2);
                  Tpampm10 = log(pampm10);
                  Tpampm25 = log(pampm25);
                  Tpamco = log(pamco);
                  Tpamo3 = log(pamo3);
                  TsigmaSE = log(sigmaSE);
                  TR0 = log(R0);
                  to_log_barycentric (&TS_0, &S_0, 4);
                  ")

fromEst <- Csnippet("
                    Tmu = exp(mu);
                    Tsigma = exp(sigma);
                    Tgamma = exp(gamma);
                    Talpha = exp(alpha);
                    Tpsi = exp(psi);
                    Tiota = exp(iota);
                    Trho = expit(rho);
                    Tpamso2 = exp(pamso2);
                    Tpamno2 = exp(pamno2);
                    Tpampm10 = exp(pampm10);
                    Tpampm25 = exp(pampm25);
                    Tpamco = exp(pamco);
                    Tpamo3 = exp(pamo3);
                    TsigmaSE = exp(sigmaSE);
                    TR0 = exp(R0);
                    from_log_barycentric (&TS_0, &S_0, 4);
                    ")

c(R0=5,gamma=1/20,sigma=1/15,mu=1.28e-04,sigmaSE=0.512,
  rho=0.8, alpha= 0.976, iota=2.9, psi=0.116,
  pampm10=0.01,pampm25=0.002,pamso2=0.0002,pamno2=0.0004,pamo3=0.00032,pamco=0.0056,
  S_0=0.0297,I_0=5.14e-05,E_0=5.17e-05
) -> theta
theta["R_0"]<-1-theta["I_0"]-theta["E_0"]-theta["E_0"]

varicella.hz %>%
  pomp(t0=varicella.hz$week[1],
       time="week",
       params=theta,
       rprocess=discrete.time.sim(step.fun=rproc,delta.t=1),
       skeleton=skel,skeleton.type="map", skelmap.delta.t=1,
       initializer=initlz,
       dmeasure=dmeas,
       rmeasure=rmeas,
       covar=covar,
       toEstimationScale=toEst,
       fromEstimationScale=fromEst ,
       tcovar="week",
       zeronames=c("C","W"),
       statenames=c("S","E","I","R","C","W"),
       paramnames=c("R0","mu","sigma","gamma","alpha","iota","psi",
                    "rho","sigmaSE","pampm10","pampm25","pamso2","pamno2","pamo3","pamco",
                    "S_0","E_0","I_0","R_0")
  ) -> origin.model

require(foreach);require(doMC); 
registerDoMC(14)
set.seed(998468235L,kind="L'Ecuyer")

gusses.parameter <- rbind( 
  R0=c(0,20),
  mu=c(0.005,0.005),
  sigma=c(0,20),
  gamma=c(0,20),
  alpha=c(0.976,0.976),
  iota=c(0,5),
  rho=c(0,1),
  sigmaSE=c(0,1),
  psi=c(0,1),
  pampm10=c(0,0.1),
  pampm25=c(0,0.1),
  pamso2=c(0,0.1),
  pamno2=c(0,0.1),
  pamco=c(0,0.1),
  pamo3=c(0,0.1),
  S_0=c(0,0.03),
  I_0=c(0,0.0001),
  E_0=c(0,0.0001)
)

tic <- Sys.time()

global.result <- foreach(i=1:28, .combine=rbind,
                         .packages=c("pomp","magrittr"),.errorhandling="pass",
                         .inorder=FALSE,
                         .options.multicore=list(preschedule=FALSE,set.seed=TRUE)
) %dopar%  {
  start.parameter <- apply(gusses.parameter,1,function(x)runif(1,x[1],x[2]))
  start.parameter[["R_0"]] <- 1-start.parameter[["S_0"]]-start.parameter[["E_0"]]-start.parameter[["I_0"]]
  mif2(origin.model, start=start.parameter, Np=1000,
       Nmif=100,
       cooling.type="geometric",cooling.fraction.50=0.1, transform=TRUE,
       rw.sd=rw.sd( 
         R0=0.02,mu=0.02,sigma=0.02,gamma=0.02,
         alpha=0.02,iota=0.02, rho=0.02,
         sigmaSE=0.02, psi=0.02, pampm10=0.02,
         pampm25=0.02,pamso2=0.02,pamno2=0.02,
         pamco=0.02,pamo3=0.02,
         I_0=ivp(0.02),
         S_0=ivp(0.02),
         E_0=ivp(0.02),
         R_0=ivp(0.02)
       )) -> mf   ##  %>%  mif2() 
  pf <- replicate(5, pfilter(mf, Np = 5000))
  ll <- sapply(pf,logLik)
  ll <- logmeanexp(ll, se = TRUE)
  nfail <- sapply(pf,getElement,"nfail")
  data.frame(as.list(coef(mf)),
             loglik = ll[1],
             loglik = ll[2],
             nfail.min = min(nfail),
             nfail.max = max(nfail))
}

toc <- Sys.time()
toc-tic
Spatial-R commented 8 years ago

Thanks! I have changed the structure of the pomp model based on the codes above and come across another question: Error : in ‘pfilter’: ‘dmeasure’ returns non-finite value
I guess there are negative or non-integer state variables in the simulations. Therefore, i want to make sure the result of item (pm10 * pam3pm10+ temp *pam3temp) in the rprocess is no-negative, where the pm10 and temp are covariates and pam3pm10 and pam3temp are the parameters.
How could i solve the problem? The codes and dataset was here

kingaa commented 8 years ago

Note that I did not change the structure of your model codes. Nor am I competent to do so: I don't know what you're doing or why. I suggest you contemplate the structure of your model and see if there are constraints on the parameters, beyond those that you have implemented already, that make sense.