kingaa / pomp

R package for statistical inference using partially observed Markov processes
https://kingaa.github.io/pomp
GNU General Public License v3.0
111 stars 27 forks source link

Multitype epidemic models - state/parameter vectors or matrices #11

Closed christelpei closed 4 years ago

christelpei commented 8 years ago

I am interested in multi-type epidemic models in which the states (statenames) will for example have to be subdivided into age groups. It would therefore be helpful to have vectors with the numbers of susceptible hosts in certain age classes S[0]…S[n](as well as infected etc.). Is there a way to formalize this within pomp or do I have to define states S0, S1,…,Sn manually?

Having this in mind, is it also possible to read data from file in the Csnippets, in particular for initialization or parameter choice – and how do I mask quotation marks in the file handles? Is it otherwise possible to read a parameter matrix (e.g. the next generation matrix) within in R and use it within the Csnippets (and how do I deal with the different offset in array indices starting from 0 or 1 in C and R, respectively)?

kingaa commented 8 years ago

With respect to the first question, this is exactly FAQ 3.1. Have a look there. Essentially, one has to give the name of the first element to statenames and to ensure (by way of the initializer) that your elements are contiguous.

With respect to the second question, it is neither necessary nor a good idea to do file IO within a Csnippet. If you have a matrix of parameters, you can read it into R in the usual way and pass it to the components of your pomp in one of two ways:

  1. as part of the params argument.
  2. by passing it as an extra (named) argument to pomp. This will result in its being stored in the userdata slot of the pomp object and being made available to every Csnippet.

To access an element of the userdata slot, you'll need to use one of the functions

const SEXP get_pomp_userdata (const char *name);
const int *get_pomp_userdata_int (const char *name);
const double *get_pomp_userdata_double (const char *name);

defined in the header file pomp.h. To look at the latter, do

  file.show(system.file("include/pomp.h",package="pomp"))

in an R session.

christelpei commented 8 years ago

Thanks a lot for the fast response and for pointing me to the helpful FAQs that I had overlooked. I think I have moved a step forward for the vector of state variables (though I will still have to test out how to deal best with its components) , but I am still struggeling with the parameters

I need to have access to a contact matrix to parameterize the model which I already have transformed to a vector: C = matrix(scan('contacts_ger_small.txt'),ncol=5,byrow=TRUE) Cvec = as.vector(C)

I have tried out various ways to "fill" the constructor:

pomp(data, time="day", t0=0, rprocess=euler.sim(sirs_step,delta.t=1/6),initializer=sirs_init, paramnames=c("Beta","gamma", "mu", "N","agegroups",sprintf("Cv%d",1:25)),statenames=stat)-> sirs -> this seems to work with: const double *Cvlocal = &Cv1; in the sirs_step but how do I assign Cvec to Cv?

pomp(data, time="day", t0=0, rprocess=euler.sim(sirs_step,delta.t=1/6),initializer=sirs_init, paramnames=c("Beta","gamma", "mu", "N","agegroups",sprintf("Cv%d",1:25)),params=c(Cv=Cvec),statenames=stat)-> sirs

-> this seems to work as well, but if I try to simulate it, it says "variable 'Cv1' not found among the paramnames"

pomp(data, time="day", t0=0, rprocess=euler.sim(sirs_step,delta.t=1/6),initializer=sirs_init, paramnames=c("Beta","gamma", "mu", "N","agegroups","Cv"),params=c(Cv=Cvec),statenames=stat)-> sirs

-> this seems to make things even worse...

pomp(bsflu, time="day", t0=0, rprocess=euler.sim(sirs_step,delta.t=1/6),initializer=sirs_init, paramnames=c("Beta","gamma", "mu", "N","agegroups","Cv"),statenames=stat, Cv=Cvec)-> sirs

-> is this how you fill a userdata slot? Apparently not...But I have not found much on how to do this.

kingaa commented 8 years ago

I think it may be easier than you are imagining. Can you provide your code in a form that can be run? I will edit it to show you a solution. (Please provide a full set of codes that I can run. If you want to replace any sensitive data with fake data, that's fine.)

christelpei commented 8 years ago

Thanks again for this helpful adivse - carefully preparing the code in a form to provide it to you made me notice a few inconsistencies that I had not seen before - and it worked! :-)

However, I encountered another rather peculiar problem with the following code (usage of your bsflu data does not make an awfully lot of sense - it is just to fill the constructor):

C = matrix(scan('contacts_ger_small.txt'),ncol=5,byrow=TRUE)
Cvec = as.vector(C)
sir_step <- Csnippet("
  double *X = &X1;

  int i,j;
  int n = agegroups; 
  //printf(\"%d\",1);

  //contact matrix
  const double *Cvlocal = &Cv1;
  double CM[n][n];
  for (i = 0; i< n; i++)
    {
      for(j=0; j<n; j++)
      {
        CM[i][j] = Cvlocal[i*n+j];
      }
    }
  double lambda[n]; //force of infection

  double dN_SI[n];
  double dN_IR[n];

  //epidemic process 
  for(i = 0; i < n; i++)
   {
     for(j = 0; j < n; j++)
      {
        lambda[i] += CM[i][j]*X[n+j];
      }

     dN_SI[i] = rbinom(X[i],1-exp(-Beta*lambda[i]/N*dt));
     dN_IR[i] = rbinom(X[n+i],1-exp(-gamma*dt));
     X[i] +=  - dN_SI[i];           //S[i]
     X[n+i] += dN_SI[i] - dN_IR[i];     //I[i]
     X[2*n+i] +=  dN_IR[i];     //R[i]
     X[3*n] +=   dN_IR[i]; 
   }
")

sir_init <- Csnippet("
  double *X = &X1;
  int n = agegroups;
  for (int i = 0; i < n; i++) 
   {
    X[i] = 499;
    X[n+i] = 1;
    X[2*n+i] = 0;
  }
  X[3*n]= 0;
")

dmeas <- Csnippet("lik = dbinom(B,X16,rho,give_log);")
rmeas <- Csnippet("B = rbinom(X16,rho);")

statX = c(sprintf("X%d",1:16))
pomp(bsflu,times="day",t0=0,
     rprocess=euler.sim(sir_step,delta.t=1/5),
     initializer=sir_init,rmeasure=rmeas,dmeasure=dmeas,
     zeronames="X16",statenames=statX,
     paramnames=c("Beta","gamma","rho","N","agegroups",sprintf("Cv%d",1:25))) -> sir 

sims <- simulate(sir,params=c(Beta=0.0002,gamma=0.01,rho=0.8,N=2500,agegroups=5,Cv=Cvec),nsim=5,
                 as.data.frame=TRUE,include.data=TRUE)  

This code will fill X1...X16 in the simulations with NaN - unless you de-comment the printf or directly assign int n = 5; .

C is just a 5x5 matrix of contacts as attached contacts_ger_small.txt

kingaa commented 8 years ago

Excellent! Now I can really see where we are. Check out the following

require(pomp)

C <- matrix(scan('contacts_ger_small.txt'),ncol=5,byrow=TRUE)

sir_step <- Csnippet("
  double *X = &X1;
  int i,j;
  int n = agegroups; 
  const double *Cvlocal = &Cv1;
  double *Slocal = &X1;
  double *Ilocal = Slocal+n;
  double *Rlocal = Ilocal+n;
#define CM(J,K) Cvlocal[(J)+n*(K)]
#define S(K) Slocal[(K)]
#define I(K) Ilocal[(K)]
#define R(K) Rlocal[(K)]
  double dN_SI[n];
  double dN_IR[n];
  double lambda;

  // epidemic process 
  for (i = 0; i < n; i++) {
    lambda = 0.0;
    for (j = 0; j < n; j++)
      lambda += CM(i,j)*I(j);
    dN_SI[i] = rbinom(S(i),1-exp(-Beta*lambda/N*dt));
    dN_IR[i] = rbinom(I(i),1-exp(-gamma*dt));
    S(i) -=   dN_SI[i];
    I(i) +=   dN_SI[i] - dN_IR[i];
    R(i) +=   dN_IR[i];
    X[3*n] += dN_IR[i]; 
   }
")

sir_init <- Csnippet("
  double *X = &X1;
  int n = agegroups;
  for (int i = 0; i < n; i++) 
   {
    X[i] = 499;
    X[n+i] = 1;
    X[2*n+i] = 0;
  }
  X[3*n]= 0;
")

dmeas <- Csnippet("lik = dbinom(B,X16,rho,give_log);")
rmeas <- Csnippet("B = rbinom(X16,rho);")

statX = c(sprintf("X%d",1:16))

base_url <- "http://kingaa.github.io/sbied/"
url <- paste0(base_url,"data/bsflu_data.txt")
bsflu <- read.table(url)

pomp(bsflu,times="day",t0=0,
     rprocess=euler.sim(sir_step,delta.t=1/5),
     initializer=sir_init,rmeasure=rmeas,dmeasure=dmeas,
     zeronames="X16",statenames=statX,
     paramnames=c("Beta","gamma","rho","N","agegroups",sprintf("Cv%d",1:25))
) -> sir 

sims <- simulate(sir,params=c(Beta=0.0002,gamma=0.01,rho=0.8,N=2500,
                              agegroups=5,Cv=C),nsim=5,
                 as.data.frame=TRUE,include.data=TRUE)  

require(ggplot2)
require(magrittr)
require(reshape2)

sims %>%
  melt(id=c("sim","time")) %>%
  ggplot(aes(x=time,group=sim,color=sim,y=value))+
  geom_line()+
  facet_wrap(~variable,scales="free_y")

The issue appeared to be that you were not resetting lambda[i] at each time step. The consequence was that the force of infection accumulated through time, eventually resulting in the arithmetic errors you were seeing.

The codes above fix this, and also show off a couple of nifty C features:

kingaa commented 8 years ago

To go back to an earlier question: how does one fill a userdata slot? Consider the following amendments to the codes just above.

sir_step <- Csnippet("
  double *X = &X1;
  int i,j;
  int n = agegroups; 
  const double *Cvlocal = get_pomp_userdata_double(\"contact_matrix\");
  double *Slocal = &X1;
  double *Ilocal = Slocal+n;
  double *Rlocal = Ilocal+n;
#define CM(J,K) Cvlocal[(J)+n*(K)]
#define S(K) Slocal[(K)]
#define I(K) Ilocal[(K)]
#define R(K) Rlocal[(K)]
  double dN_SI[n];
  double dN_IR[n];
  double lambda;

  // epidemic process 
  for (i = 0; i < n; i++) {
    lambda = 0.0;
    for (j = 0; j < n; j++)
      lambda += CM(i,j)*I(j);
    dN_SI[i] = rbinom(S(i),1-exp(-Beta*lambda/N*dt));
    dN_IR[i] = rbinom(I(i),1-exp(-gamma*dt));
    S(i) -=   dN_SI[i];
    I(i) +=   dN_SI[i] - dN_IR[i];
    R(i) +=   dN_IR[i];
    X[3*n] += dN_IR[i]; 
   }
")

pomp(bsflu,times="day",t0=0,
     rprocess=euler.sim(sir_step,delta.t=1/5),
     initializer=sir_init,rmeasure=rmeas,dmeasure=dmeas,
     zeronames="X16",statenames=statX,
     paramnames=c("Beta","gamma","rho","N","agegroups"),
     contact_matrix=C
     ) -> sir 

Note that contact_matrix is passed as a named argument to pomp. This results in the message: In ‘pomp’: the following unrecognized argument(s) will be stored for use by user-defined functions: ‘contact_matrix’ as the matrix is added to the userdata slot. Within the sir_step Csnippet, the call to get_pomp_userdata_double returns a double* pointer to the first element of the matrix.

christelpei commented 8 years ago

Thanks for your detailled feedback! I have indeed been sloppy in assuming lambda[i] might as a local variable by default be reset to zero after each time step. Still, I find it a bit surprising that lambda[i] accumulated if I include the printf or directly assign int n = 5. Otherwise, it becomes NaN.

Also thanks for the code that is now working fine and consistently and for explaining userdata slot in more detail!

Regarding parameter vectors: Is it only possible to have arrays of doubles as parameters?

If I would like to add a vector of integers in the above code, i.e.

sir_step <- Csnippet("
  double *X = &X1;
  int i,j;
  const int *popage = &age1;

 ....

pomp(bsflu,times="day",t0=0,
     rprocess=euler.sim(sir_step,delta.t=1/5),
     initializer=sir_init,rmeasure=rmeas,dmeasure=dmeas,
     zeronames="X16",statenames=statX,
     paramnames=c("Beta","gamma","rho","N",sprintf("age%d",1:5),"agegroups",sprintf("Cv%d",1:25))
) -> sir 

it says

warning: initialization from incompatible pointer type [enabled by default] const int *popage = &age1;

which is fixed as soon as replace it by

sir_step <- Csnippet("
  double *X = &X1;
  int i,j;
  const double *popage = &age1;

 ....

pomp(bsflu,times="day",t0=0,
     rprocess=euler.sim(sir_step,delta.t=1/5),
     initializer=sir_init,rmeasure=rmeas,dmeasure=dmeas,
     zeronames="X16",statenames=statX,
     paramnames=c("Beta","gamma","rho","N",sprintf("age%d",1:5),"agegroups",sprintf("Cv%d",1:25))
) -> sir 
kingaa commented 8 years ago

In pomp, the parameters are always double-precision floating-point numbers. If you want to pass integer-valued parameters, you need to recast them in your C snippets. You can also pass integers using the userdata slot, by simply giving pomp those arguments as integers, then accessing them via get_pomp_userdata_int. The prototype is:

const int *get_pomp_userdata_int (const char *name);

as defined in pomp.h.

Here's an example of the second approach for your code:

sir_step <- Csnippet("
  double *X = &X1;
  int i,j;
  const int *age = get_pomp_userdata_int(\"age\");
 ...
")

pomp(bsflu,times="day",t0=0,
     rprocess=euler.sim(sir_step,delta.t=1/5),
     initializer=sir_init,rmeasure=rmeas,dmeasure=dmeas,
     zeronames="X16",statenames=statX,
     paramnames=c("Beta","gamma","rho","N","agegroups",sprintf("Cv%d",1:25),
     age=as.integer(1:10)
) -> sir 

Note that I use as.integer to ensure that these numbers are internally represented as integers. This in turn ensures that get_pomp_userdata_int will not fail.

gokhale616 commented 6 years ago

@kingaa: I am trying to solve a very similar model with age-structure. But in my case, all I wish to do is generate deterministic trajectories

I think I have followed the instructions in defining the snippets as described in the query string above- mainly making sure of the contiguity of my state variables and macros definitions in the Csnippets.

However, 2 compilation errors are generated mainly in the definition of the system of differential equations within the loop over my age classes Eg. DS(i) = nu(1 - p)N + deltaV(i) - lambda1S(i) - lambda2S(i) - lcS(i) - mu*S(i); - (1) implicit declaration of function 'DS' is invalid, (2) expression is not assignable: Here S(i) is a macro defined in a similar fashion as described in above.

Does one define the derivative with respect to time in a different way when macros and loops over arrays of state variables are involved?

The following codes should reproduce these errors:

#loading packages####
packages = c("tidyverse", "pomp", "reshape2", "plyr", "ggthemes",
             "grid", "gridExtra", "fitR")

sapply(packages, library, character.only = TRUE)
rm(packages)

#Loading contact matrix and probabilities of infection
C <- matrix(c(100, 70, 30, 
               70, 90, 50,
               30, 50, 80), nrow = 3) 

q <- c(0.003, 0.003, 0.004)

#pomp model specifications####
#deterministic skeleton with age groups
#NOTE:
#Remember to set the lambda's to zero after calculation for every age group 
skel <-"
  const double *Cvlocal = &Cv1;
  const double *qlocal = &qv1;

  int i, j;
  int n = agegroups;

  double *Slocal = &S_1;
  double *Vlocal = Slocal + n;
  double *I1local = Vlocal + n;
  double *I2local = I1local + n;
  double *Rlocal = I2local + n;
  double *Inc1local = Rlocal + n;
  double *Inc2local = Inc1local + n;

#define CM(J,K) Cvlocal[(J)+n*(K)]
#define Q(K) qlocal[(K)]
#define S(K) Slocal[(K)]
#define V(K) Vlocal[(K)]
#define I1(K) I1local[(K)]
#define I2(K) I2local[(K)]
#define R(K) Rlocal[(K)]
#define INC1(K) Inc1local[(K)]
#define INC2(K) Inc2local[(K)]

  /*basic parameters*/
  double lambda1;
  double lambda2;
  double gamma = 1/D_inf;
  double delta = 1/D_wan;

  /*Defining an age structure with contact matrix and probability of infection*/

  /*Defining the system of differential equations within a double for loop*/  
  for(i = 0; i < n; i++) {
     lambda1 = 0.0;
     lambda2 = 0.0;
        if(i == 0) {
          for(j  = 0; j < n; j++) {
            /*calcuating the age specific force of infection*/ 
            lambda1 = CM(i, j)*Q(j)*I1(i)/N;
            lambda2 = CM(i, j)*Q(j)*I2(i)/N;

            DS(i) = nu*(1 - p)*N + delta*V(i) - lambda1*S(i) - lambda2*S(i) - lc*S(i) - mu*S(i);
            DV(i) = nu*p*N - delta*V(i) - epsilon1*lambda1*V(i) - epsilon2*lambda2*V(i) - lc*V(i) - mu*V(i);
            DI1(i) = lambda1*S(i) + epsilon1*lambda1*V(i) - gamma*I1(i) - lc*I1(i) - mu*I1(i);
            DI2(i) = lambda2*S(i) + epsilon2*lambda2*V(i) - gamma*I2(i) - lc*I2(i) - mu*I2(i);
            DR(i) = (I1(i) + I2(i))*gamma - lc*R(i) - mu*R(i);

            /*True cases*/
            DInc1(i) = I1(i)*gamma;
            DInc2(i) = I2(i)*gamma;
          }
        } else {
            for(j  = 0; j < n; j++) {
              /*calcuating the age specific force of infection*/ 
              lambda1 = CM(i, j)*Q(j)*I1(i)/N;
              lambda2 = CM(i, j)*Q(j)*I2(i)/N;

              DS(i) = lc*S(i-1) + delta*V(i) - lambda1*S(i) - lambda2*S(i) - mu*S(i);
              DV(i) = lc*V(i-1) - delta*V(i) - epsilon1*lambda1*V(i) - epsilon2*lambda2*V(i) - mu*V(i);
              DI1(i) = lc*I1(i-1) + lambda1*S(i) + epsilon1*lambda1*V(i) - gamma*I1(i) - mu*I1(i);
              DI2(i) = lc*I2(i-1) + lambda2*S(i) + epsilon2*lambda2*V(i) - gamma*I2(i) - mu*I2(i);
              DR(i) = lc*R(i-1) + (I1(i) + I2(i))*gamma - mu*R(i);

              /*True cases*/
              DInc1(i) = I1(i)*gamma;
              DInc2(i) = I2(i)*gamma;
        }
     }
  }
"

#initilializer with age groups
init <- "
  double *X = &S_1;
  int i; 
  int n = agegroups;
  for(i = 0; i < n; i++) {
    X[i] = 2000;
    X[i + n] = 2000;
    X[i + 2*n] = 100;
    X[i + 3*n] = 100;
    X[i + 4*n] = 5800;
    X[i + 5*n] = 0;
    X[i + 6*n] = 0;
  }
" 

#set true incidence states to zero  at each time step
make_zero = c(c(sprintf("Inc1_%d", 1:3)), c(sprintf("Inc2_%d", 1:3)))

#definng statenames with age groups 
state_names <- c(c(sprintf("S_%d", 1:3)), c(sprintf("V_%d", 1:3)), c(sprintf("I1_%d", 1:3)), 
                c(sprintf("I2_%d", 1:3)), c(sprintf("R_%d", 1:3)),
                c(sprintf("Inc1_%d", 1:3)), c(sprintf("Inc2_%d", 1:3)))    

#parameter names
rp_names <- c("N", "p", "nu", "mu", "agegroups", "epsilon1", "epsilon2", "D_inf", "D_wan", "lc",
              c(sprintf("Cv%d", 1:9)), c(sprintf("qv%d", 1:3)))

ip_names <- c(c(sprintf("S_%d_0", 1:3)), c(sprintf("V_%d_0", 1:3)), c(sprintf("I1_%d_0", 1:3)), 
              c(sprintf("I2_%d_0", 1:3)), c(sprintf("R_%d_0", 1:3)),
              c(sprintf("Inc1_%d_0", 1:3)), c(sprintf("Inc2_%d_0", 1:3)))   

parameter_names <- c(rp_names, ip_names)

#NOTE:
#No need to use as.vector (i.e. no need to convert matrix into a vector):  
#a matrix in R is a vector (with a "dim" attribute that is ignored here). 
#'Remember' that matrices are always stored in column-major order.

#parameter values
rp_values <- c(N = 30000, p = 0.5, nu = 1/50, mu = 1/50, agegroups = 3, lc = 1.5,
               epsilon1 = 1, epsilon2 = 1, D_inf = 5/365, D_wan = 20, qv = q, Cv = C)

ip_values <- c(rep(2000, 6), rep(100, 6), rep(5800, 3), rep(0, 6))
names(ip_values) <- ip_names

parameter_values <- c(rp_values, ip_values)

#pomp object:

data.frame(Year = seq(1, 2, by = 1/52), 
           Inc1 = NA,
           Inc2 = NA) %>% 
  pomp(t0 = 1,
       times = "Year",
       initializer = Csnippet(init),
       skeleton = vectorfield(Csnippet(skel)),
       statenames = state_names,
       paramnames = rp_names,
       zeronames = make_zero,
       params = rp_values) -> po

init.state(po)

Further, is there a way of suppressing one of state variable to 0 for a certain amount of integration time and then introduce some individuals and watch the population within that compartment change from there on, within the same model? or will I have to transfer the steady state population of one model as my initial conditions to another model with an extra compartment? Eg. I would like the compartment I2 to remain at zero for the first 10 years and then have an introduction of 100 cases within the system in I2.

kingaa commented 6 years ago

@gokhale616:

The issue is that you have not defined the D* variables. Examine the codes below, which resolve the issue. The chief difference is that I define the D variables. Note also the use of const.

To understand what is happening, it is useful to examine the source code generated by pomp from your C snippets. This is accomplished via the call to spy in the last line.

To answer your other questions: Nonautonomous, i.e, explicitly time-dependent, processes are no problem for pomp. You can always make the codes depend on time, which appears as variable t in the context of most C snippets. [The exceptions are dprior and rprior and the parameter transformation snippets: these elements are not allowed to be time dependent.]

#loading packages####
packages = c("tidyverse", "pomp", "reshape2", "plyr", "ggthemes",
  "grid", "gridExtra")

sapply(packages, library, character.only = TRUE)
rm(packages)

#Loading contact matrix and probabilities of infection
C <- matrix(c(100, 70, 30,
  70, 90, 50,
  30, 50, 80), nrow = 3)

q <- c(0.003, 0.003, 0.004)

#pomp model specifications####
#deterministic skeleton with age groups
#NOTE:
#Remember to set the lambda's to zero after calculation for every age group
skel <-"
const double *Cvlocal = &Cv1;
const double *qlocal = &qv1;

int i, j;
int n = agegroups;

const double *Slocal = &S_1;
const double *Vlocal = &V_1;
const double *I1local = &I1_1;
const double *I2local = &I2_1;
const double *Rlocal = &R_1;
const double *Inc1local = &Inc1_1;
const double *Inc2local = &Inc2_1;

double *DSlocal = &DS_1;
double *DVlocal = &DV_1;
double *DI1local = &DI1_1;
double *DI2local = &DI2_1;
double *DRlocal = &DR_1;
double *DInc1local = &DInc1_1;
double *DInc2local = &DInc2_1;

#define CM(J,K) Cvlocal[(J)+n*(K)]
#define Q(K) qlocal[(K)]
#define S(K) Slocal[(K)]
#define V(K) Vlocal[(K)]
#define I1(K) I1local[(K)]
#define I2(K) I2local[(K)]
#define R(K) Rlocal[(K)]
#define INC1(K) Inc1local[(K)]
#define INC2(K) Inc2local[(K)]

#define DS(K) DSlocal[(K)]
#define DV(K) DVlocal[(K)]
#define DI1(K) DI1local[(K)]
#define DI2(K) DI2local[(K)]
#define DR(K) DRlocal[(K)]
#define DINC1(K) DInc1local[(K)]
#define DINC2(K) DInc2local[(K)]

/*basic parameters*/
double lambda1;
double lambda2;
double gamma = 1/D_inf;
double delta = 1/D_wan;

/*Defining an age structure with contact matrix and probability of infection*/

/*Defining the system of differential equations within a double for loop*/
for (i = 0; i < n; i++) {
  lambda1 = 0.0;
  lambda2 = 0.0;
  if (i == 0) {
    for (j  = 0; j < n; j++) {
      /*calcuating the age specific force of infection*/
      lambda1 = CM(i, j)*Q(j)*I1(i)/N;
      lambda2 = CM(i, j)*Q(j)*I2(i)/N;

      DS(i) = nu*(1 - p)*N + delta*V(i) - lambda1*S(i) - lambda2*S(i) - lc*S(i) - mu*S(i);
      DV(i) = nu*p*N - delta*V(i) - epsilon1*lambda1*V(i) - epsilon2*lambda2*V(i) - lc*V(i) - mu*V(i);
      DI1(i) = lambda1*S(i) + epsilon1*lambda1*V(i) - gamma*I1(i) - lc*I1(i) - mu*I1(i);
      DI2(i) = lambda2*S(i) + epsilon2*lambda2*V(i) - gamma*I2(i) - lc*I2(i) - mu*I2(i);
      DR(i) = (I1(i) + I2(i))*gamma - lc*R(i) - mu*R(i);

      /*True cases*/
      DINC1(i) = I1(i)*gamma;
      DINC2(i) = I2(i)*gamma;
    }
  } else {
    for(j  = 0; j < n; j++) {
      /*calcuating the age specific force of infection*/
      lambda1 = CM(i, j)*Q(j)*I1(i)/N;
      lambda2 = CM(i, j)*Q(j)*I2(i)/N;

      DS(i) = lc*S(i-1) + delta*V(i) - lambda1*S(i) - lambda2*S(i) - mu*S(i);
      DV(i) = lc*V(i-1) - delta*V(i) - epsilon1*lambda1*V(i) - epsilon2*lambda2*V(i) - mu*V(i);
      DI1(i) = lc*I1(i-1) + lambda1*S(i) + epsilon1*lambda1*V(i) - gamma*I1(i) - mu*I1(i);
      DI2(i) = lc*I2(i-1) + lambda2*S(i) + epsilon2*lambda2*V(i) - gamma*I2(i) - mu*I2(i);
      DR(i) = lc*R(i-1) + (I1(i) + I2(i))*gamma - mu*R(i);

      /*True cases*/
      DINC1(i) = I1(i)*gamma;
      DINC2(i) = I2(i)*gamma;
    }
  }
}
"

#initilializer with age groups
init <- "
double *X = &S_1;
int i;
int n = agegroups;
for(i = 0; i < n; i++) {
X[i] = 2000;
X[i + n] = 2000;
X[i + 2*n] = 100;
X[i + 3*n] = 100;
X[i + 4*n] = 5800;
X[i + 5*n] = 0;
X[i + 6*n] = 0;
}
"

#set true incidence states to zero  at each time step
make_zero = c(c(sprintf("Inc1_%d", 1:3)), c(sprintf("Inc2_%d", 1:3)))

#definng statenames with age groups
state_names <- c(c(sprintf("S_%d", 1:3)), c(sprintf("V_%d", 1:3)), c(sprintf("I1_%d", 1:3)),
  c(sprintf("I2_%d", 1:3)), c(sprintf("R_%d", 1:3)),
  c(sprintf("Inc1_%d", 1:3)), c(sprintf("Inc2_%d", 1:3)))

#parameter names
rp_names <- c("N", "p", "nu", "mu", "agegroups", "epsilon1", "epsilon2", "D_inf", "D_wan", "lc",
  c(sprintf("Cv%d", 1:9)), c(sprintf("qv%d", 1:3)))

ip_names <- c(c(sprintf("S_%d_0", 1:3)), c(sprintf("V_%d_0", 1:3)), c(sprintf("I1_%d_0", 1:3)),
  c(sprintf("I2_%d_0", 1:3)), c(sprintf("R_%d_0", 1:3)),
  c(sprintf("Inc1_%d_0", 1:3)), c(sprintf("Inc2_%d_0", 1:3)))

parameter_names <- c(rp_names, ip_names)

#NOTE:
#No need to use as.vector (i.e. no need to convert matrix into a vector):
#a matrix in R is a vector (with a "dim" attribute that is ignored here).
#'Remember' that matrices are always stored in column-major order.

#parameter values
rp_values <- c(N = 30000, p = 0.5, nu = 1/50, mu = 1/50, agegroups = 3, lc = 1.5,
  epsilon1 = 1, epsilon2 = 1, D_inf = 5/365, D_wan = 20, qv = q, Cv = C)

ip_values <- c(rep(2000, 6), rep(100, 6), rep(5800, 3), rep(0, 6))
names(ip_values) <- ip_names

parameter_values <- c(rp_values, ip_values)

#pomp object:

data.frame(Year = seq(1, 2, by = 1/52),
  Inc1 = NA,
  Inc2 = NA) %>%
  pomp(t0 = 1,
    times = "Year",
    initializer = Csnippet(init),
    skeleton = vectorfield(Csnippet(skel)),
    statenames = state_names,
    paramnames = rp_names,
    zeronames = make_zero,
    params = rp_values) -> po

init.state(po)
spy(po)
gokhale616 commented 6 years ago

@kingaa Thank you very much for such a quick response. The use of the spy function clarifies a lot of things. Especially with the assignment and storage of variables within pomp.

With respect to the use of 'const' in the definition, the code does not use the qualifier during parameter declaration Eg. 'delta': is this because of its definition within the 'paramnames' argument of pomp, since the compiler does not prompt a warning for that variable?

kingaa commented 6 years ago

Read up on what the const declaration does in C. In pomp, it prevents you from violating the mathematical definitions and assumptions of the POMP methods, which insist, for example, that parameters can't change in time and that rmeasure can't change the state variables, etc.

In the case of your delta, you are actually copying the value of this parameter into a new memory location. Thus, even if you were to change the value of delta (for some strange reason), you would not affect the value of the parameter for computations outside of the scope of your C snippet. This is why the compiler throws no warning.

If, however, you were to try to do something like D_wan = 3;, you would get a compiler error.

I'm going to close this issue now. Feel free to re-open if necessary.

bogaotory commented 4 years ago

In regards to the solution given in FAQ 3.1, I wonder what the recommended solution / workaround is if the rprocess function is defined in R rather than in a C snippet?

Say we want to keep track of two state variables, A and B. A is a single numeric value, so it is fine. But B is a vector of length 10.

If I were to "copy & paste" the solution from FAQ 3.1, I would just jam A and B together as c(A, B), and let B become B1, ..., B10. Then reconstruct B at the beginning of rprocess to minimise changes made to existing code. Is this the correct/best approach?


On a slightly different note. What if although I need the simulation to track B, but fitting is only done on A. (For when you have a model that's more detailed than the data available on hand.)

In such cases, is there/should there be a mechanism in pomp to allow some none-scalar state variables such as B to "sneak through" components such as rinit. By "sneak through" I mean allow the use of list() instead of 'c()' as return types. This is currently prohibited such as here. Don't know if this is feasible and would it cause a big impact on the package?

kingaa commented 4 years ago

The basic principle is that pomp expects every real-valued parameter to have a name. So you would name your B parameters B1 through B10. You would then have to manually reconstruct B. I think this is what you mean. It is not very elegant.

I suppose you could also rely on the fact that all the state variables are passed to each of the basic model components, whether they are needed or not, via the ... argument in the latter case. Thus you could for example do something like

rmeasure = function (A, ...) {
B <- c(...)
<etc.>
}

and thus avoid having to manually do B <- c(B1,B2,...B10). I'm not sure if this is more or less elegant. If it is, it's not by much.

The main role of the R function interface is pedagogical. It provides a stepping stone for those with no C programming experience. Not only do C snippets lead to faster code, they are more flexible. I would definitely advise not putting a lot of effort into the R function interface: just bite the bullet and learn to use the C snippets.

Now, on to your second question. The scenario you describe is the rule, not the exception. This is why partially observed Markov process are partially observed. In fact, it is almost invariably the case in applications that the latent state space has higher order than the number of state variables.

Certainly, one could imagine making the interface to the low-level computations more flexible: to accommodate lists, integer-type variables, arbitrary user-defined types, etc. However, the overriding concern is speed. Simulation-based inference is so expensive (many simulations are typically required) that there is a premium on doing it all quickly. Hence the restrictions on the variables that are passed to and fro. Speed is not the only consideration, however: it seems fairly clear that the code complexity would rise dramatically were pomp to need to handle arbitrary data types. Indeed, is this not much of what makes interpreted R code slow?

bogaotory commented 4 years ago

Thanks very much for the reply and insight, (and also for authoring this package in the first place and the many documentations / tutorials that come with it.)

You touched on many good points such as code elegance, efficiency, safety in your comment, and indeed the issue on hand as it seems is not about whether a solution exist, but about what kind of solution one wish to have. Code elegance is quite often where the compromise is found in the end.

Just to throw another one in the hat, the solution I end up using for now is to have a matrix in the global environment each row of which hosts a simulation instance's vector of "hidden states". There might be problems down the line when doParallel is involved, but that's another bridge to burn once I get there.

On your answer to my second question, my intuition is that the partial/hidden-ness of these models is reflected by the user's implementation of the rmeasure function. For instance, you may have two state variables A1 and A2 in rprocess, but only A1+A2 is observable, therefore your rmeasure function only return one observed value (based on A1+A2) instead of two observed values. Is my understanding correct or does pomp strictly require that rmeasure return the exact same number of variables as rprocess does?

kingaa commented 4 years ago

Just to throw another one in the hat, the solution I end up using for now is to have a matrix in the global environment each row of which hosts a simulation instance's vector of "hidden states". There might be problems down the line when doParallel is involved, but that's another bridge to burn once I get there.

Yes, I think this is a sensible solution. I suppose one could even take it to extremes: using the pomp state vector (which as you know is constrained to be a named double-precision vector) merely to hold pointers to arbitrary memory defined elsewhere in the code. The chief inelegance is using a double-precision number to hold a pointer, but one can get over this, I suppose.

On your answer to my second question, my intuition is that the partial/hidden-ness of these models is reflected by the user's implementation of the rmeasure function. For instance, you may have two state variables A1 and A2 in rprocess, but only A1+A2 is observable, therefore your rmeasure function only return one observed value (based on A1+A2) instead of two observed values. Is my understanding correct or does pomp strictly require that rmeasure return the exact same number of variables as rprocess does?

Your understanding is absolutely correct. The rmeasure and dmeasure basic model components provide a link between the latent state process and the observable process; that link can be arbitrary so long as it obeys the conditional independence assumptions of the POMP class of models (viz., _p(Y_t | {X(s)}) = p(Yt | x(t))). Indeed, the functionality would be very greatly reduced were we to have to make the assumption of some kind of 1-1 correspondence between state variables and observables! One can almost never observe everything one wants to!

So important is this issue, that I'm actually worried now that the documentation may be giving a false impression. Can you do me (and all pomp users) a favor: think hard and tell me if you can, how did you formulate this notion that there might be such a restriction? If you got it from the documentation anywhere, I very much need to correct that problem so as to avoid misunderstandings!

bogaotory commented 4 years ago

(I wouldn't worry about it too much because like I said my instinct is that the reduction of the state space take place in the rmeasure function. And as a user, I was always going to try to break the package before accepting any assumption that does not work in my favour as a fact.) But here goes what I think would probably help:

The first documentation I read about pomp is Getting started with pomp. Like I said I didn't get any idea of that restriction at all, but now that I'm reading it again, I think maybe in the second illustration, you can make the boxes containing Y-s slightly shorter in height than those containing X-es, or make the boxes around X-es taller. So as to indicate a difference in the size of state space after the measurement model is applied?

Next thing I read was the example scripts attached to Simulation-based Inference for Epidemiological Dynamics, but it is quite obvious in the examples you provided that the two processes don't share the same return signature. Now I feel a bit silly to have implied such one-to-one relation existed.

kingaa commented 4 years ago

I think maybe in the second illustration, you can make the boxes containing Y-s slightly shorter in height than those containing X-es, or make the boxes around X-es taller. So as to indicate a difference in the size of state space after the measurement model is applied?

Good idea!

Now I feel a bit silly to have implied such one-to-one relation existed.

Not at all. Generically, I find pessimism to be a good default setting. Then, when you are surprised, it tends to be pleasantly so.

Thanks for your input @bogaotory!

bogaotory commented 4 years ago

No problem. Glad to have helped!