kingaa / pomp

R package for statistical inference using partially observed Markov processes
https://kingaa.github.io/pomp
GNU General Public License v3.0
110 stars 26 forks source link

Can I fit surveillance data using zero-inflated Poisson or negative binomial distributions? #137

Closed Spatial-R closed 3 years ago

Spatial-R commented 3 years ago

Recently, i try to use the pomp package to fit the surveillance data. The problem is that there is nearly 50% points (77/150) with zero value due to the spare sampling intensity. Therefore, i guess that the zero inflated Poisson or negative binomial distribution is more suitable for my data to calculate the likelihood. I have searched the Rcpp document, however, there is no function to realize this. I have tested the overdispersed binomial distribution (codes in the measles example) on my data, however, the model have fitted the zero points very well but failed to fit the non-zero value. Could you give some suggestions for this?

kingaa commented 3 years ago

There may not be a C interface to the specific distribution you want. You can either (1) find a package with a C interface that implements this distribution or (2) implement the distribution functions you need in C yourself. In the case of zero-inflated distributions, the second choice is probably the easiest. A zero-inflated distribution is just a mixture distribution. Thus, for example, suppose y is supposed to follow a zero-inflated Poisson distribution (1-p) Poisson(mu) + p delta_0, where the last term denotes a point mass at 0. A corresponding dmeasure C snippet would include something like

if (y < 0.5) {
  lik = p + (1-p)*dpois(0,mu);
} else {
  lik = (1-p)*dpois(y,mu);
}

and the rmeasure component would look something like

if (runif(0,1) < p) {
  y = 0;
} else {
  y = rpois(mu);
}
Spatial-R commented 3 years ago

Thanks for your kindly help. I followed your guidance and writed the dmeas and rmeas part using the following codes:

dmeas <- Csnippet("
                  double lik_A, lik_B;
                  double mA = rho_A*K_A;

                  if (total_a < 0.5) {
                    lik_A = (1-phi_A)*dpois(0, mA,1) + phi_A;
                  } else {
                    lik_A = (1-phi_A)*dpois(total_a, mA,1);                    
                  }

                  double mB = rho_B*K_B;

                  if (total_b < 0.5) {
                    lik_B = (1-phi_B)*dpois(0,mB,1) + phi_B;
                  } else {
                    lik_B = (1-phi_B)*dpois(total_b,mB,1);
                  }
                  lik = lik_A + lik_B;
                  ")

rmeas <- Csnippet("
                  double mA = rho_A*K_A;

                  if (runif(0,1) < phi_A) {
                    total_a = 0.0;
                  } else {
                    total_a = rpois(mA);
                  }

                  double mB = rho_B*K_B;

                  if (runif(0,1) < phi_B) {
                    total_b = 0.0;
                  } else {
                    total_b = rpois(mB);
                  }
                  ")

I assumed that both the total_a and total_b are supposed to follow a zero-inflated Poisson distribution. However, when running the model, there is an error: Error: in ‘mif2’: ‘dmeasure’ returns illegal value. Since there is no error when using the overdispersed binomial distribution to calculate the likelihood, I guess there is some wrong in the measure part here. Could you give me some help on this? My codes for the model are here, and the dataset is here:

library(pomp)
library(dplyr)

load("Zero_point.RData")

rproc <- Csnippet("

                  double betaA, betaB, foiA, foiB, seasA, seasB; //dw foiB
                  double rate[13], trans[13];

                  // white noise (extra-demographic stochasticity, 
                  // currently asssumed the same for both types(?))
                  //dw = rgammawn(sigmaSE,dt);

                  // sinusoidal seasonality (same for both serotypes currently!)
                  seasA = (RT_A*h3_env+2*R0_A);
                  seasB = (RT_B*h1_env+2*R0_B);

                  betaA = seasA*gamma_A;
                  betaB = seasB*gamma_B; //currently sharing seasonality

                  if(S < 1){S = 1;}
                  if(I_A < 1){I_A = 1;}
                  if(I_B < 1){I_B = 1;}
                  if(R_A < 1){R_A = 1;}
                  if(R_B < 1){R_B = 1;}
                  if(I_BA < 1){I_BA = 1;}
                  if(I_AB < 1){I_AB = 1;}
                  if(R_AB < 1){R_AB = 1;}
                  if(BH < 1){BH = 1;}

                  double qmcA,qmcB;

                  if(t > (2014+0.6*tm1_A) && t < (2015+0.6*tm2_A)){
                  qmcA = 1/(qm1_A*0.95+0.05)*exp(-1/dk1_A*(t - (2014+0.6*tm1_A)));
                  } else if (t > (2015+0.6*tm2_A)) {
                  qmcA = 1/(qm2_A*0.95+0.05)*exp(-1/dk2_A*(t - (2015+0.6*tm2_A)));
                  } else {
                  qmcA = 1;
                  }

                  if(t > (2013.5+0.6*tm1_B) && t < (2014.5)){
                  qmcB = 1/(qm1_B*0.95+0.05)*exp(-1/dk1_B*(t - (2013.5+0.6*tm1_B)));
                  } else if (t > (2015.5+0.5*tm2_B) && (t < 2016.0)){
                  qmcB = 1/(qm2_B*0.95+0.05)*exp(-1/dk2_B*(t - (2015.5+0.6*tm2_B)));
                  } else {
                  qmcB = 1;
                  }

                  // forces of infection
                  // N = S + I_A + I_B + R_A + R_B + I_AB + I_BA + R_AB + RBH;
                  foiA = qmcA*betaA*(I_A + I_AB + 100*eta_A)/pop; 
                  foiB = qmcB*betaB*(I_B + I_BA + 100*eta_B)/pop;

                  double SBH = nearbyint((BHI)/(rhoBH)*dt); // AH1(t): the case with H1N1 infection

                  // Out of S
                  rate[0] = foiA; // *dw/dt;  //infection of S with A
                  rate[1] = foiB; // *dw/dt;  //infection of S with B
                  reulermultinom(2,S,&rate[0],dt,&trans[0]);

                  // Out of I_A
                  rate[2] = gamma_A;  //Recovery of I_A
                  reulermultinom(1,I_A,&rate[2],dt,&trans[2]);

                  // Out of I_B
                  rate[3] = gamma_B;  //Recovery of I_B
                  reulermultinom(1,I_B,&rate[3],dt,&trans[3]);

                  // Out of R_A
                  rate[4] = (1-chi_BA)*foiB; //*dw/dt;  //infection of R_A with B
                  rate[5] = w_A;  //Loss of immunity of R_A 
                  reulermultinom(2,R_A,&rate[4],dt,&trans[4]);

                  // Out of R_B
                  rate[6] = (1-chi_AB)*foiA; //*dw/dt;  //infection of R_B with A
                  rate[7] = w_B;  //Loss of immunity of R_B
                  reulermultinom(2,R_B,&rate[6],dt,&trans[6]);

                  // Out of I_BA
                  rate[8] = gamma_B;  //Recovery of I_BA (recovery from B infection)
                  reulermultinom(1,I_BA,&rate[8],dt,&trans[8]);

                  // Out of I_AB
                  rate[9] = gamma_A;  //Recovery of I_AB (recovery from A infection)
                  reulermultinom(1,I_AB, &rate[9],dt,&trans[9]);

                  // Out of R_AB
                  rate[10] = w_A;  //Loss of immunity of R_AB to A
                  rate[11] = w_B;  //Loss of immunity of R_AB to B
                  reulermultinom(2,R_AB,&rate[10],dt,&trans[10]);

                  // Out of BH
                  rate[12] = sigmaB;  //Loss of immunity of R_AB to A
                  reulermultinom(1,BH,&rate[12],dt,&trans[12]);                                  

                  S += trans[5] + trans[7] - trans[0] - trans[1] - SBH + trans[12];
                  I_A += trans[0] - trans[2];
                  I_B += trans[1] - trans[3];
                  R_A += trans[2] + trans[11] - trans[4] - trans[5];
                  R_B += trans[3] + trans[10] - trans[6] - trans[7];
                  I_BA += trans[4] - trans[8];
                  I_AB += trans[6] - trans[9];
                  R_AB += trans[8] + trans[9] - trans[10] - trans[11];
                  BH += SBH - trans[12];

                  K_A += trans[2] + trans[9];           // true incidence of A
                  K_B += trans[3] + trans[8];           // true incidence of B

                  ")

rinit <- Csnippet("
                  // how best to initialise population to ensure population size is correct
                  // what is this baryometric thing
                  double m = pop/(S_0+I_A_0+I_B_0+R_A_0+R_B_0+I_AB_0+I_BA_0+R_AB_0+BH_0);
                  S = nearbyint(m*S_0);
                  I_A = nearbyint(m*I_A_0);
                  I_B = nearbyint(m*I_B_0);
                  R_A = nearbyint(m*R_A_0);
                  R_B = nearbyint(m*R_B_0);
                  I_AB = nearbyint(m*I_AB_0);
                  I_BA = nearbyint(m*I_BA_0);
                  R_AB = nearbyint(m*R_AB_0);
                  BH = nearbyint(m*BH_0);

                  W = 0;
                  K_A = 0;
                  K_B = 0;

                  ")

model_params = c("R0_A","R0_B","RT_A","RT_B", "gamma_A", "gamma_B",
                 "w_A", "w_B","eta_A", "eta_B", "phi_A", "phi_B",
                 "chi_BA", "chi_AB", "rho_A", "rho_B", "sigmaB", "rhoBH",
                 "qm1_A","qm2_A","tm1_A","tm2_A","dk1_A","dk2_A",
                 "qm1_B","qm2_B","tm1_B","tm2_B","dk1_B","dk2_B")

model_variables = c("S", "I_A", "I_B", "R_A", "R_B", "I_BA", "I_AB", "R_AB","BH")

model_ic_params = paste(model_variables, "_0", sep="")

paratransform <- parameter_trans(log=c(paste0("w_",c("A","B")),"gamma_A","gamma_B","sigmaB","dk1_A","dk2_A","dk1_B","dk2_B"),
                                 logit=c(paste0("R0_",c("A","B")),paste0("RT_",c("A","B")),
                                         paste0("phi_",c("A","B")),paste0("rho_",c("A","B")),
                                         "qm1_A","qm2_A","tm1_A","tm2_A","rhoBH",
                                         "qm1_B","qm2_B","tm1_B","tm2_B",
                                         paste0("chi_",c("AB","BA")),paste0("eta_",c("A","B")),
                                         model_ic_params))

parameter_seir <- c(model_params, model_ic_params)

t0 <- inc$week[1]-1/12

inc %>% 
  arrange(week) %>%
  pomp(
    times ='week',
    t0=t0,
    rinit = rinit,
    rprocess=euler(step.fun = rproc,delta.t=2/365),
    rmeasure=rmeas,
    dmeasure=dmeas,
    partrans = paratransform,
    covar=covariate_table(cov, times = 'week'),
    accumvars =c ("K_A", "K_B", "W"),
    statenames = c(model_variables, c("K_A", "K_B", "W")),
    paramnames=parameter_seir
  ) -> origin.model

sobolDesign(
  lower = c(R0_A = 0.8,R0_B =0.8,RT_A =0.2,RT_B = 0.2,gamma_A = 365/(3),gamma_B = 365/(3),
            w_A =365/(10*365),w_B =365/(10*365),eta_A =0.2,eta_B = 0.2,phi_A = 0.8,phi_B =0.8,chi_BA =0.2,chi_AB = 0.2,
            rho_A = 0.8,rho_B =0.8,sigmaB = 365/(10*365),rhoBH = 0.8,
            qm1_A = 0.8,qm2_A =0.8,tm1_A = 0.8,tm2_A =0.8,dk1_A = 0.8,dk2_A =0.8,
            qm1_B = 0.8,qm2_B =0.8,tm1_B = 0.8,tm2_B =0.8,dk1_B = 0.8,dk2_B =0.8,
            S_0 = 0.2,I_A_0 = 0.2e-5,I_B_0 = 0.2e-5,R_A_0 = 0.2,R_B_0 = 0.2,C_A_0 = 2e-6,C_B_0 = 2e-6,
            I_BA_0 = 2e-6,I_AB_0 = 2e-6,R_AB_0 = 2e-6,BH_0 = 0.8),
  upper = c(R0_A = 0.8,R0_B =0.8,RT_A =0.8,RT_B = 0.8,gamma_A = 365/(3),gamma_B = 365/(3),
            w_A =365/(0.5*365),w_B =365/(0.5*365),eta_A =0.8,eta_B = 0.8,phi_A = 0.8,phi_B =0.8,chi_BA =0.8,chi_AB = 0.8,
            rho_A = 0.8,rho_B =0.8,sigmaB = 365/(0.8*365),rhoBH = 0.8,
            qm1_A = 0.8,qm2_A =0.8,tm1_A = 0.8,tm2_A =0.8,dk1_A = 0.8,dk2_A =0.8,
            qm1_B = 0.8,qm2_B =0.8,tm1_B = 0.8,tm2_B =0.8,dk1_B = 0.8,dk2_B =0.8,
            S_0 = 0.8,I_A_0 = 2e-3,I_B_0 = 2e-3,R_A_0 = 0.2,R_B_0 = 0.8,C_A_0 = 2e-3,C_B_0 = 2e-3,
            I_BA_0 = 2e-3,I_AB_0 = 2e-3,R_AB_0 = 2e-3,BH_0 = 0.4),
  nseq = 3000
) -> guess

rw.sd_rp <- 0.02
rw.sd_ivp <- 0.02

ivp_pos <- match("S_0",parameter_seir) - 1 
fix_col <- c(1:ivp_pos)[!parameter_seir[c(1:ivp_pos)] %in% c("alpha","gamma_B","gamma_A")]
rw_text <- paste(paste(parameter_seir[fix_col],"=",rw.sd_rp,sep = ""),collapse = ",")
rw_text_ivp <- paste(paste(parameter_seir[-(1:ivp_pos)],"=",rw.sd_rp,"",sep = ""),collapse = ",")
rw_text_fin <- paste("rw.sd(",rw_text,",",rw_text_ivp,")",sep="")
parameter.rw.sd <- eval(parse(text = rw_text_fin))

mif2(origin.model, params = guess[10,], Np = 1000,Nmif = 30,
     cooling.type = "geometric",cooling.fraction.50 = 0.5,
     rw.sd = parameter.rw.sd) %>%  mif2() -> mf
kingaa commented 3 years ago

The dmeasure snippet needs to fill lik with either the likelihood or the log likelihood, according to the needs of the calling algorithm (which in principle you cannot control). See the documentation. The code you give doesn't meet this requirement. Moreover, you are asking dpois to give you the log likelihood, then you are multiplying this by (1-phi_A). This is incoherent.

Compare the codes below.

dmeas <- Csnippet("
                  double lik_A, lik_B;
                  double mA = rho_A*K_A;

                  if (total_a < 0.5) {
                    lik_A = (1-phi_A)*dpois(0,mA,0) + phi_A;
                  } else {
                    lik_A = (1-phi_A)*dpois(total_a,mA,0);                    
                  }

                  double mB = rho_B*K_B;

                  if (total_b < 0.5) {
                    lik_B = (1-phi_B)*dpois(0,mB,0) + phi_B;
                  } else {
                    lik_B = (1-phi_B)*dpois(total_b,mB,0);
                  }
                  lik = (give_log) ? log(lik_A) + log(lik_B) : lik_A*lik_B;
                  ")
Spatial-R commented 3 years ago

Thanks! I can run the pomp model now. It seems that the result from the global searching is not well good. I will check the data and model structure again, eg, other distributions in the measure part.

kingaa commented 3 years ago

I will close this issue now. Feel free to reopen if more discussion is warranted.