kingaa / pomp

R package for statistical inference using partially observed Markov processes
https://kingaa.github.io/pomp
GNU General Public License v3.0

Error: in ‘subplex’: in ‘trajectory’: in ‘flow’: error in ODE integrator: illegal input detected before taking any integration steps #116

Closed fomotis closed 3 years ago

fomotis commented 3 years ago

I am trying to fit a chained binomial model for SEI(Ip, Ia, Im)ICR to COVID-19 data from Nigeria using the pomp approach. I divided the I compartment into 3 (Ip = presymptomatic, Im = mild infection, and Ia = asymptomatic) and also included an IC compartment for isolation centres. I successfully get simulations from this model. PROBLEM: whenever I try to estimate parameters using the subplex() function, I get the error in the title. From my understanding, the problem appears to be with the deterministic skeleton (the skel Csnippet), but I have checked and rechecked it and cannot see why the integrator runs into a problem at the given initial values. I would appreciate any help and enlightenment on this. The pomp codes are given below.

Thanks in advance.

CODES

#process model simulator
rSim <- Csnippet("
  double lambda;

 // transmission rate
  double beta1 = R0 * g1; //transmission rate for asymptomatic
  double beta2 = R0 * p2 * g2; //transmission rate for mild

  // expected force of infection (iota is the imported infection)
  //lambda = (beta1 * pow(Ia + Ip + iota, alpha) + beta2 * Im) / N;
  lambda = ((beta1 * (Ia + Ip)) + (beta2 * Im))/ N;

  //S
  double dSE = rbinom(S, 1 - exp(-lambda*dt));       //movement from S to E

  //E
  double dEIp = rbinom(E, 1 - exp(-k*dt));          //movement from E to Ip

  //Ip
  double dIpIm = rbinom(Ip, 1 - exp(-(1-p1)*tau*dt));  //rate of developing mild infection
  double dIpIa = rbinom(Ip, 1 - exp(-p1*tau*dt));  //rate of asymptomatic infection

  //Ia
  double dIaR = rbinom(Ia, 1 - exp(-g1*dt)); //rate of recovery from asymptomatic infection

  //Im
  double dImR = rbinom(Im, 1 - exp(-p2*g2*dt)); //rate of recovery from mild infection
  double dImIC = rbinom(Im, 1 - exp(-(1-p2)*delta*dt)); //rate of developing severe symptoms

  //IC
  double dICR = rbinom(IC, 1 - exp(-p3*g3*dt));      //rate of recovery at isolation centers
  double dICD = rbinom(IC, 1 - exp(-(1-p3)*phi*dt)); //rate of deaths at isolation centers

  // Balance the equations
  S +=  -dSE;
  E += dSE - dEIp;
  Ip += dEIp - dIpIa - dIpIm;
  Ia += dIpIa - dIaR;
  Im += dIpIm - dImIC - dImR;
  IC += dImIC - dICR - dICD;
  R += dIaR + dImR + dICR;
  D += dICD;

  H += dIaR + dImR + dICR + dICD; //true incidence
  I += dEIp + dIpIa + dIpIm;      //total infected

  N_SE += dSE;

  N_EIp += dEIp;     //number transition from E to Ip

  N_IpIa += dIpIa;   //number transition from Ip to Ia
  N_IpIm += dIpIm;  //number transition from Ip to Im

  N_IaR += dIaR;    //number transition from Ia to R

  N_ImIC += dImIC;  // number of transition from Im to IC
  N_ImR += dImR;   //number of transitions from Im to R

  N_ICR += dICR;   // number of transitions from IC to R
  N_ICD += dICD; //number of transitions from IC to D

")

#initial values simulator
rInit <- Csnippet("
  double m = N /(S_0 + E_0 + Ip_0 + Ia_0 + Im_0 + IC_0 + R_0 + D_0);
  S = nearbyint(m*S_0);
  E = nearbyint(m*E_0);
  Ip = nearbyint(m*Ip_0);
  Ia = nearbyint(m*Ia_0);
  Im = nearbyint(m*Im_0);
  IC = nearbyint(m*IC_0);
  R = nearbyint(m*R_0);
  D = nearbyint(m*D_0);

  H = 0;
  I = 0;
  N_SE = 0;
  N_EIp = 0;
  N_IpIa = 0;
  N_IpIm = 0;
  N_IaR = 0;
  N_ImIC = 0;
  N_ImR = 0;
  N_ICR = 0;
  N_ICD = 0;

")

## density of the measurement process
dmeas <- Csnippet("

double f;
if (tetha > 0.0) {
  f = dnbinom_mu(nearbyint(Cases), 1.0/tetha, H, give_log);
} else {
  f = dpois(nearbyint(Cases), H, give_log);
}

  lik = f;
  (give_log) ? lik : exp(lik);
")

#for I
dmeas2 <- Csnippet("

double f;
if (tetha > 0.0) {
  f = dnbinom_mu(nearbyint(Cases), 1.0/tetha, I, give_log);
} else {
  f = dpois(nearbyint(Cases), I, give_log);
}

  lik = f;
  (give_log) ? lik : exp(lik);
")

#random generator of the measurement process
rmeas <- Csnippet("
if (tetha > 0.0) {
    Cases = rnbinom_mu(1.0/tetha, H);
} else {
    Cases = rpois(H);
}
"
)
# for I
rmeas2 <- Csnippet("
if (tetha > 0.0) {
    Cases = rnbinom_mu(1.0/tetha, I);
} else {
    Cases = rpois(I);
}
"
)

#Deterministic skeleton
skel <- Csnippet("

  double lambda;

  // transmission rate
  double beta1 = R0 * g1; //transmission rate for asymptomatic
  double beta2 = R0 * p2 * g2; //transmission rate for mild

  // expected force of infection (iota is the imported infection)
  //lambda = (beta1 * pow(Ia + Ip + iota, alpha) + beta2 * Im) / N;
  lambda = ((beta1 * (Ia + Ip)) + (beta2 * Im))/ 20000000;

     //susceptible class
    DS = -lambda * S;

    //exposed class
    DE = (lambda * S)  - (k  * E);

    //presymptomatic infectious class
    DIp = (k * E) - (tau * Ip);

    //asymptomatic infection class
    DIa = (p1 * tau * Ip) - (g1 * Ia);

    //mild infection class
    DIm = ((1-p1) * tau * Ip) - ((p2*g2*Im) + (1 - p2)*delta*Im);

    //isolation center
    DIC = ((1 - p2)*delta*Im) - ((p3*g3*IC) + (1 - p3)*phi*IC);

    //Recovery
    DR = (g1*Ia) + (p2*g2*Im) + (p3*g3*IC);

    //deaths
    DD = (1 - p3)*phi*IC;

    //true incidence
    DH = g1*Ia + p2*g2*Im + p3*g3*IC + ((1 - p3)*phi*IC);
    DI = (k * E) + (p1 * tau * Ip) + ((1-p1) * tau * Ip);
    DN_SE = lambda * S;
    DN_EIp = k * E;
    DN_IpIa = p1 * tau * Ip;
    DN_IpIm = (1-p1) * tau * Ip;
    DN_IaR = g1 * Ia;
    DN_ImIC = ((1 - p2)*delta*Im);
    DN_ImR = p2*g2*Im;
    DN_ICR = p3*g3*IC;

 ")

N <- 20000000
covid_model <- function(state = unique(naija_covid_data2$State)[1:37], 
                        covid_data = naija_covid_data2, 
                        timestep = 1/24, pop = N, type = c("I", "H")) {

  state <- match.arg(state)
  type = match.arg(type)
  dt <- timestep
  globs <- paste0("static int N = ", pop, ";")

  state_covid_data <- covid_data %>% 
    filter(State == state) %>%
    mutate(Cases = as.numeric(Cases), 
           Deaths = as.numeric(Deaths), 
           Recovered = as.numeric(Recovered), 
           Date = as.Date(Date, format = "%d/%m/%Y")) %>%
    filter(!is.na(Cases)) %>% 
    mutate(dt = c(0, diff(Date)), 
           Day = cumsum(c(1, diff(Date)))
    )

  state_covid_data %>% 
    dplyr::select(Day, Cases) %>%
    pomp(
      times = "Day",
      t0 = 0,
      obsnames = c("Cases"),
      globals = globs,
      rprocess = euler(rSim, delta.t = dt),
      rinit = rInit,
      rmeasure = if(type == "H") rmeas else rmeas2,
      dmeasure = if(type == "H") dmeas else dmeas2,
      skeleton = vectorfield(skel),
      accumvars = c("H", "I","N_SE", "N_EIp", "N_IpIa", "N_IpIm", 
                    "N_IaR", "N_ImIC", "N_ImR", "N_ICR", "N_ICD"),
      statenames = c("S", "E", "Ip", "Ia", "Im", "IC", "R", "D", 
                     "H", "I","N_SE", "N_EIp", "N_IpIa", "N_IpIm", 
                     "N_IaR", "N_ImIC", "N_ImR", "N_ICR", "N_ICD"),
      paramnames = c("R0", "p1", "p2", "p3", "k", "tau", "delta", 
                     "g1", "g2", "g3", "phi", "tetha",
                     "S_0", "E_0", "Ip_0", "Ia_0", "Im_0", "IC_0", "R_0", 
                     "D_0"),
      partrans = parameter_trans(
        log = c("R0", "k", "g1", "g2", "g3", "tau", "delta", "phi", "tetha"), 
        logit = c("p1", "p2", "p3"), 
        barycentric = c("S_0", "E_0", "Ip_0", "Ia_0", "Im_0", "IC_0", "R_0", 
                        "D_0")
      )
    ) -> dis_model

}

cfr <- 0.028
T_death <- 7 #time to death in ic
T_ic_rec <- 14 #time to recover in ic
T_as_rec <- 4.5 #time to recover asymptomatic
T_md_rec <- 9 #time to recover mild
T_mic <- 2 #time before developing severe infection

#death rate function
drates <- function(X_death, Time_death, mu) {

  (1 - mu*Time_death) * (X_death / Time_death)

}

# # recovery rate function
recrates <- function(X_death, Time_rec, mu) {

  (1 - mu*Time_rec) * (1 - X_death) / Time_rec

}
g1 <- recrates(cfr, T_as_rec, 0)
g2 <- recrates(cfr, T_md_rec, 0)
g3 <- recrates(cfr, T_ic_rec, 0)
phi <- drates(cfr, T_death, 0)

params <- c(R0 = 1.95,      #basic reproduction number (to estimate)
            p1 = 0.42,     #proportion of asymptomatic infection (fix)
            p2 = 0.75,      #proportion of mild that recovers (to estimate)
            p3 = 0.75,      #proportion of recovery in IC (fix)
            k = 1/12,       #latency period (to estimate)
            tau = 1/3,    #rate of developing asymptomatic/mild infection (to estimate)
            delta = 1/T_mic,     #rate of developing severe infection (estimate)
            g1 = g1,      #rate of recovery for asymptomatic infections (4.5 days) (to estimate)
            g2 = g2,      #recovery rate for mild infections (to estimate)
            g3 = g3,      #recovery rate in isolation centers (fix)
            phi = phi,   #death rate in isolation centers (fix)
            tetha = 0.35,    #overdispersion parameter (to estimate)
            S_0 = N - 12 - (1.0e-18 * 4), #initial number of susceptibles (fix)
            E_0 = 10, #initial number of exposed persons (fix)
            Ip_0 = 1, #initial number of presymptomatic infections (fix)
            Ia_0 = 1,  #initial infected persons (fix)
            Im_0 = 1.0e-18, #initial mild infected (fix)
            IC_0 = 1.0e-18, #initial number in isolation centres (fix)
            R_0 = 1.0e-18, #initial number recovered (fix)
            D_0 = 1.0e-18 #initial number dead (fix)
)

Lagos_covid_model <- covid_model(state = "Lagos", 
                                 covid_data = naija_covid_data2, 
                                 timestep = 1/24,
                                 type = "H"
)

Lagos_covid_model %>% simulate(nsim = 50, params = params,
                               format = 'data.frame', 
                               include.data = T, 
                               verbose = T
) %>% 
  mutate(is.data = ifelse(.id == "data", "yes", "no")) %>%
  ggplot(aes(x = Day, y = Cases, group = .id, color = is.data, alpha = is.data)) + 
  geom_line() + 
  guides(color = F, alpha = F) +
  scale_color_manual(values = c(no = gray(0.8), yes = "red3")) + 
  scale_alpha_manual(values = c(no = 0.5, yes = 1)) + 
  theme_bw()

#parameter estimation
Lagos_covid_model %>% 
  traj_objfun(est = c("R0","tetha", "k"),   #, "g1", "g2", "delta"
              params = params,
              ode_control = c(maxsteps = 1000000)
  ) -> lag_func

fit <- subplex(c(2, 0.35, 1/14), fn = lag_func)
kingaa commented 3 years ago

@fomotis, let me see if I can help you. Thank you for providing so much detail. I tried to run the codes but got the following error:

object 'naija_covid_data2' not found

which is just because you did not provide the data. Could you please do so? If you don't want to share the actual data, perhaps you could provide files with the same shape, size, and broad characteristics as the data, just so that I can run the code and see the errors. If the data are sensitive, you could also adulterate them by, for example, adding random noise. This page describes how you can attach a file to the Issue.
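For example, something along these lines would do (just a sketch: it assumes the data frame has a numeric Cases column, and the size of the perturbation is entirely up to you):

library(dplyr)

## perturb the case counts by a random factor before sharing
naija_covid_data2 %>%
  mutate(Cases = pmax(0, round(as.numeric(Cases) * runif(n(), 0.8, 1.2)))) -> shareable_data

write.csv(shareable_data, "nigeria_covid-19_subnational_adulterated.csv", row.names = FALSE)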

fomotis commented 3 years ago

I have attached the dataset below. I used the following codes to pre-process the dataset

nigeria_covid-19_subnational.zip

library(tidyverse)

#some important dates

lck_date <- "30-03-2020"
airport_closure <- "21-03-2020"
travel_ban <- "18-03-2020"
school_closure <- "19-03-2020"

##loading in the data set

naija_covid_data1 <- read.csv("Data/nigeria_covid-19_subnational.csv", sep =";")
naija_covid_data2 <- naija_covid_data1[, c(1, 5, 7:9)]
names(naija_covid_data2) <- c("Date", "State", "Cases", "Deaths", "Recovered")

#filter out data without NAs
Lagos_covid_data <- naija_covid_data2 %>% 
  dplyr::filter(State == "Lagos") %>%
  dplyr::mutate(Cases = as.numeric(Cases), 
         Deaths = as.numeric(Deaths), 
         Recovered = as.numeric(Recovered), 
         Date = as.Date(Date, format = "%d/%m/%Y")) %>%
  dplyr::filter(!is.na(Cases)) %>% 
  dplyr::mutate(dt = c(0, diff(Date)), 
         Day = cumsum(c(1, diff(Date)))
        )
kingaa commented 3 years ago

Thanks, @fomotis. I found three errors in your code. First, you had an improper specification of the dmeasure components. The following are appropriate. If you do not understand why the codes you sent are fixed by these, let me know.

## density of the measurement process
dmeas <- Csnippet("

double f;
if (tetha > 0.0) {
  f = dnbinom_mu(nearbyint(Cases), 1.0/tetha, H, give_log);
} else {
  f = dpois(nearbyint(Cases), H, give_log);
}

  lik = (give_log) ? f : exp(f);
")

##for I
dmeas2 <- Csnippet("

double f;
if (tetha > 0.0) {
  f = dnbinom_mu(nearbyint(Cases), 1.0/tetha, I, give_log);
} else {
  f = dpois(nearbyint(Cases), I, give_log);
}

  lik = (give_log) ? f : exp(f);
")

Second, you were inserting untransformed values of the three parameters you want to estimate into subplex. Since you have specified parameter transformations, the optimization will be performed on the transformed scale. In this case, each of the three parameters is log-transformed. So the appropriate call to subplex would be

library(subplex)
fit <- subplex(log(c(2, 0.35, 1/14)), fn = lag_func)
lag_func(fit$par)
coef(lag_func)

Note that I apply the function one more time, at the parameters returned by subplex, to ensure that these are stored within lag_func.

Third, although you have 19 state variables (initialized in the rinit component), you have only 18 equations in the skeleton. Thus, one of the 19 derivatives takes whatever random value happens to be in that uninitialized memory location. This leads to failure of the integrator, which depends on the assumption that the vectorfield is smooth. In the following, I have set the 19th derivative to zero.

## Deterministic skeleton
skel <- Csnippet("

  double lambda;

  // transmission rate
  double beta1 = R0 * g1; //transmission rate for asymptomatic
  double beta2 = R0 * p2 * g2; //transmission rate for mild

  // expected force of infection (iota is the imported infection)
  //lambda = (beta1 * pow(Ia + Ip + iota, alpha) + beta2 * Im) / N;
  lambda = ((beta1 * (Ia + Ip)) + (beta2 * Im))/ 20000000;

     //susceptible class
    DS = -lambda * S;

    //exposed class
    DE = (lambda * S)  - (k  * E);

    //presymptomatic infectious class
    DIp = (k * E) - (tau * Ip);

    //asymptomatic infection class
    DIa = (p1 * tau * Ip) - (g1 * Ia);

    //mild infection class
    DIm = ((1-p1) * tau * Ip) - ((p2*g2*Im) + (1 - p2)*delta*Im);

    //isolation center
    DIC = ((1 - p2)*delta*Im) - ((p3*g3*IC) + (1 - p3)*phi*IC);

    //Recovery
    DR = (g1*Ia) + (p2*g2*Im) + (p3*g3*IC);

    //deaths
    DD = (1 - p3)*phi*IC;

    //true incidence
    DH = g1*Ia + p2*g2*Im + p3*g3*IC + ((1 - p3)*phi*IC);
    DI = (k * E) + (p1 * tau * Ip) + ((1-p1) * tau * Ip);
    DN_SE = lambda * S;
    DN_EIp = k * E;
    DN_IpIa = p1 * tau * Ip;
    DN_IpIm = (1-p1) * tau * Ip;
    DN_IaR = g1 * Ia;
    DN_ImIC = ((1 - p2)*delta*Im);
    DN_ImR = p2*g2*Im;
    DN_ICR = p3*g3*IC;
    DN_ICD = 0;

 ")

Now the estimation succeeds:

> Lagos_covid_model %>%
+     traj_objfun(est = c("R0","tetha", "k"),   #, "g1", "g2", "delta"
+                 params = params,
+                 ode_control = c(maxsteps = 1000)
+     ) -> lag_func
> 
> library(subplex)
> fit <- subplex(log(c(2, 0.35, 1/14)), fn = lag_func)
> lag_func(fit$par);
[1] 374.6529
> coef(lag_func)
          R0           p1           p2           p3            k 
1.289795e+00 4.200000e-01 7.500000e-01 7.500000e-01 2.750269e-01 
         tau        delta           g1           g2           g3 
3.333333e-01 5.000000e-01 2.160000e-01 1.080000e-01 6.942857e-02 
         phi        tetha          S_0          E_0         Ip_0 
4.000000e-03 4.037025e-01 9.999994e-01 5.000000e-07 5.000000e-08 
        Ia_0         Im_0         IC_0          R_0          D_0 
5.000000e-08 5.000000e-26 5.000000e-26 5.000000e-26 5.000000e-26 
> logLik(lag_func)
[1] -374.6529
fomotis commented 3 years ago

Thank you so much for your prompt help. I would never have seen these. I must have forgotten about DN_ICD and the lik part. I certainly don't want to set DN_ICD to 0 :). I have always thought the parameter transformations happen internally. I must have misunderstood the process then because I always supplied untransformed starting values.

kingaa commented 3 years ago

Yes, they do happen internally, but the optimizer (e.g. subplex) doesn't know anything about them.

Glad to have helped!

kingaa commented 3 years ago

@fomotis, you actually did me a favor with this report. I will think about whether flow should be changed to set the vectorfield to zero before the user's C snippet is evaluated. On the other hand, while this would have eliminated this error, it might also have made it harder for you to discover that you had forgotten to give an equation for the 19th variable. Perhaps an entry in the FAQ regarding integrator errors...? Thoughts?

fomotis commented 3 years ago

I think setting the vector field to zero a priori isn't a bad idea, but it could lead to more errors and questions. I think it would be better to set up a variable that counts the state variables in both the deterministic skeleton and rmeasure. You could print a warning on compilation notifying the user of the imbalance in the number of state variables and that the unspecified ones are set to zero. I think this is a win-win situation for you and the user.

By the way, I think you should force people to always supply the deterministic skeleton except for when they wish to simulate from the model.

kingaa commented 3 years ago

I think setting the vector field to zero a priori isn't a bad idea, but it could lead to more errors and questions. I think it would be better to set up a variable that counts the state variables in both the deterministic skeleton and rmeasure. You could print a warning on compilation notifying the user of the imbalance in the number of state variables and that the unspecified ones are set to zero. I think this is a win-win situation for you and the user.

Interesting idea. I don't quite see how to implement it, though. The user's C snippet is monolithic, so nothing can be done inside it. I could set the variables, before executing the C snippet, to NA, for example, and then check how many are still NA afterward. This would incur a run-time cost and make the code more complex, so I think we should contemplate it a bit further before deciding it's a good idea. I'm inclined, instead, to write a new FAQ on the subject. That incurs no runtime cost, but it does impose a cost on some (most) users, who will have to solve this problem once....

By the way, I think you should force people to always supply the deterministic skeleton except for when they wish to simulate from the model.

Not sure what you mean. As it stands, you're only forced to supply basic model components that are necessary for the operations you want to perform. The skeleton, in particular, is only needed for deterministic trajectories. A user performing a purely stochastic analysis would have no use for it, and it might even be difficult to work out a reasonable skeleton for an arbitrary stochastic process....

kingaa commented 3 years ago

What we really need are tools that can analyze the C snippets. For example, if we could parse the C snippets, their structure could be validated prior to compilation, including checks like the one you propose. One could go farther, actually cross-checking rmeasure and dmeasure and rprocess and skeleton components, for example. It would also potentially eliminate the need for statenames and paramnames. I know next to nothing about parsing C code, though. One would think there was a portable code analyzer readily available, and perhaps there is. I have done some searching, and have come up empty handed; perhaps I'm searching with the wrong terms....

fomotis commented 3 years ago

Ah! I see. An FAQ sounds appropriate, then, to avoid the additional computational cost. As for the deterministic skeleton, I am of the school of thought that even if interest lies in the stochastic version of a model, efforts should probably start from the deterministic version of that model: possibly an old-school way of thinking. For example, in my case, interest is actually in the stochastic version, but I'd like to fit the deterministic model first so that I can create a box for the global likelihood search.

A side question: would it be appropriate to do a local search on each parameter at the optimised parameter values from the deterministic skeleton?

kingaa commented 3 years ago

I agree that there is a great deal of interest in the deterministic aspects of a model, even if it is the stochastic model that is advanced as an explanation.

I think it does make sense to fit a deterministic model as a way of narrowing down the space to be searched. It can be misleading, of course, but if it is misleading, it's likely to be so for interesting reasons!
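For what it's worth, one crude way to build such a box is simply to bracket the deterministic estimates (a sketch only; the factor of 2 is arbitrary, and it assumes the fitted lag_func from above):

library(pomp)

theta <- coef(lag_func)          # parameters stored in lag_func after the deterministic fit
est <- c("R0", "tetha", "k")     # the parameters that were estimated

## a box spanning a factor of 2 on either side of each estimate
box <- rbind(lower = theta[est]/2, upper = theta[est]*2)

## random starting points for a later global search
guesses <- runif_design(lower = box["lower", ], upper = box["upper", ], nseq = 100)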

fomotis commented 3 years ago

Hello @kingaa, thank you once again for your help with the deterministic model, but it appears I still need help with my model. I am trying to do particle filtering on the model below. After going through your work on measles, I made some changes to use the Euler-multinomial distribution instead of the chained binomial model. The conditional likelihood goes to 0 very fast, and I am unable to get the standard error for the Monte Carlo estimate of the log-likelihood. I get the value NA with the warning message "In max(x) : no non-missing arguments to max; returning -Inf". I initially thought it had to do with my rSim function, but I still arrive at the same problem after changing it to the new model below.

rSim <- Csnippet("
  double lambda;
  double rate[6], trans[6];   // six elements: indices 0 through 5 are used below

 // transmission rate
  double beta1 = R0 * g1; //transmission rate for asymptomatic
  double beta2 = R0 * g2; //transmission rate for mild

  // expected force of infection (iota is the imported infection)
  //lambda = (beta1 * pow(Ia + Ip + iota, alpha) + beta2 * Im) / N;
  lambda = ((beta1*(Ia + Ip)) + beta2*Im)/ N;

  //S
  double dN_SE = rbinom(S, 1 - exp(-lambda*dt));         //movement from S to E

  //E
  double dN_EIp = rbinom(E, 1 - exp(-k*dt));          //movement from E to Ip

  //Ip
  rate[0] = tau1;
  rate[1] = tau2;
  reulermultinom(2, Ip, &rate[0], dt, &trans[0]);
  double dN_IpIm = trans[0];  //rate of developing mild infection
  double dN_IpIa = trans[1];  //rate of asymptomatic infection

  //Ia
  double dN_IaR = rbinom(Ia, 1 - exp(-g1*dt)); //rate of recovery from asymptomatic infection

  //Im
  rate[2] = g2;
  rate[3] = delta;
  reulermultinom(2, Im, &rate[2], dt, &trans[2]);
  double dN_ImR = trans[2]; //rate of recovery from mild infection
  double dN_ImIC = trans[3]; //rate of developing severe symptoms

  //IC
  rate[4] = g3;
  rate[5] = phi;
  reulermultinom(2, IC, &rate[4], dt, &trans[4]);
  double dN_ICR = trans[4];  //rate of recovery at isolation centers
  double dN_ICD = trans[5]; //rate of deaths at isolation centers

  // Balance the equations
  S -=  dN_SE;
  E += dN_SE - dN_EIp;
  Ip += dN_EIp - dN_IpIa - dN_IpIm;
  Ia += dN_IpIa - dN_IaR;
  Im += dN_IpIm - dN_ImIC - dN_ImR;
  IC += dN_ImIC - dN_ICR - dN_ICD;
  R += dN_IaR + dN_ImR + dN_ICR;
  D += dN_ICD;

  //Rprintf(\"%lg %lg %lg %lg %lg \\n\",lambda, dN_SE, S, E, Ip);

  H += dN_IaR + dN_ImR + dN_ICR + dN_ICD; //true incidence
  I += dN_EIp + dN_IpIa + dN_IpIm + dN_ImIC;      //total infected

  N_SE += dN_SE;

  N_EIp += dN_EIp;     //number transition from E to Ip

  N_IpIa += dN_IpIa;   //number transition from Ip to Ia
  N_IpIm += dN_IpIm;  //number transition from Ip to Im

  N_IaR += dN_IaR;    //number transition from Ia to R

  N_ImIC += dN_ImIC;  // number of transition from Im to IC
  N_ImR += dN_ImR;   //number of transitions from Im to R

  N_ICR += dN_ICR;   // number of transitions from IC to R
  N_ICD += dN_ICD; //number of transitions from IC to D

")

#initial values simulator
rInit <- Csnippet("

  double m = N /(S_0 + E_0 + Ip_0 + Ia_0 + Im_0 + IC_0 + R_0 + D_0);
  S = nearbyint(m*S_0);
  E = nearbyint(m*E_0);
  Ip = nearbyint(m*Ip_0);
  Ia = nearbyint(m*Ia_0);
  Im = nearbyint(m*Im_0);
  IC = nearbyint(m*IC_0);
  R = nearbyint(m*R_0);
  D = nearbyint(m*D_0);

  H = 0;
  I = 0;
  N_SE = 0;
  N_EIp = 0;
  N_IpIa = 0;
  N_IpIm = 0;
  N_IaR = 0;
  N_ImIC = 0;
  N_ImR = 0;
  N_ICR = 0;
  N_ICD = 0;

")

#random generator of the measurement process
rmeas <- Csnippet("
if (tetha > 0.0) {
    Cases = rnbinom_mu(1.0/tetha, H);
} else {
    Cases = rpois(H);
}
"
)

rmeas2 <- Csnippet("
if (tetha > 0.0) {
    Cases = rnbinom_mu(1.0/tetha, I);
} else {
    Cases = rpois(I);
}
"
)

## density of the measurement process
dmeas <- Csnippet("
double f;
if(ISNA(Cases)) {

    lik = (give_log) ? 0 : 1;

} else {

  if (tetha > 0.0) {
    f = dnbinom_mu(Cases, 1.0/tetha, H, give_log);
  } else {
    f = dpois(Cases, H, give_log);
  }
}

  lik = (give_log) ? f : exp(f);
  Rprintf(\"%lg %lg %lg \\n\",f, H, Cases);
  lik;
")

#for the deterministic skeleton
dmeas2 <- Csnippet("

double f;
if(ISNA(Cases)) {

    lik = (give_log) ? 0 : 1;

} else {

  if (tetha > 0.0) {
    f = dnbinom_mu(nearbyint(Cases), 1.0/tetha, I, give_log);
  } else {
    f = dpois(nearbyint(Cases), I, give_log);
  }
}

  lik = (give_log) ? f : exp(f);
  Rprintf(\"%lg %lg \\n\",f, I);
  lik;
")

#Deterministic skeleton
skel <- Csnippet("

  double lambda;

  // transmission rate
  double beta1 = R0 * g1; //transmission rate for asymptomatic
  double beta2 = R0 * g2; //transmission rate for mild

  // expected force of infection (iota is the imported infection)
  //lambda = (beta1 * pow(Ia + Ip + iota, alpha) + beta2 * Im) / N;
  lambda = ((beta1*(Ia + Ip)) + beta2*Im)/ 20000000;

     //susceptible class
    DS = -lambda * S;

    //exposed class
    DE = (lambda * S)  - (k  * E);

    //presymptomatic infectious class
    DIp = (k * E) - ((tau1 + tau2) * Ip);

    //asymptomatic infection class
    DIa = (tau1 * Ip) - (g1 * Ia);

    //mild infection class
    DIm = (tau2 * Ip) - (g2*Im + delta*Im);

    //isolation center
    DIC = (delta*Im) - (g3*IC + phi*IC);

    //Recovery
    DR = (g1*Ia) + (g2*Im) + (g3*IC);

    //deaths
    DD = phi*IC;

    //true incidence
    DH = g1*Ia + g2*Im + g3*IC + (phi*IC);
    DI = (k * E) + (tau1*Ip) + (tau2 * Ip);
    DN_SE = lambda * S;
    DN_EIp = k * E;
    DN_IpIa = tau1 * Ip;
    DN_IpIm = tau2 * Ip;
    DN_IaR = g1 * Ia;
    DN_ImIC = delta*Im;
    DN_ImR = g2*Im;
    DN_ICR = g3*IC;
    DN_ICD = phi*IC;

 ")

### pomp formulation
# Population size
N <- 20000000
covid_model <- function(state = unique(naija_covid_data2$State)[1:37], 
                        covid_data = naija_covid_data2, 
                        timestep = 1/24, pop = N, type = c("I", "H")) {

  state <- match.arg(state)
  type = match.arg(type)
  dt <- timestep
  globs <- paste0("static int N = ", pop, ";")

  state_covid_data <- covid_data %>% 
    filter(State == state) %>%
    mutate(Cases = as.numeric(Cases), 
           Deaths = as.numeric(Deaths), 
           Recovered = as.numeric(Recovered), 
           Date = as.Date(Date, format = "%d/%m/%Y")) %>%
    #filter(!is.na(Cases)) %>% 
    mutate(dt = c(0, diff(Date)), 
           Day = cumsum(c(1, diff(Date)))
    )

  state_covid_data %>% 
    dplyr::select(Day, Cases) %>%
    pomp(
      times = "Day",
      t0 = 0,
      obsnames = c("Cases"),
      globals = globs,
      rprocess = euler(rSim, delta.t = dt),
      rinit = rInit,
      rmeasure = if(type == "H") rmeas else rmeas2,
      dmeasure = if(type == "H") dmeas else dmeas2,
      skeleton = vectorfield(skel),
      accumvars = c("H", "I","N_SE", "N_EIp", "N_IpIa", "N_IpIm", 
                    "N_IaR", "N_ImIC", "N_ImR", "N_ICR", "N_ICD"),
      statenames = c("S", "E", "Ip", "Ia", "Im", "IC", "R", "D", 
                     "H", "I","N_SE", "N_EIp", "N_IpIa", "N_IpIm", 
                     "N_IaR", "N_ImIC", "N_ImR", "N_ICR", "N_ICD"),
      paramnames = c("R0", "k", "tau1", "tau2", "delta", 
                     "g1", "g2", "g3", "phi", "tetha", "S_0", "E_0",
                     "Ip_0", "Im_0", "Ia_0", "IC_0", "R_0", "D_0"
                     ),
      partrans = parameter_trans(
        log = c("R0", "k", "g1", "g2", "g3", "tau1", "tau2", "delta", "phi", "tetha"), 
        barycentric = c("S_0", "E_0", "Ip_0", "Im_0", 
                       "Ia_0", "IC_0", "R_0", "D_0")
      )
    ) -> dis_model

}

# parameter estimation
#death rate function
drates <- function(X_death, Time_death, mu) {

  (1 - mu*Time_death) * (X_death / Time_death)

}

# # recovery rate function
recrates <- function(X_death, Time_rec, mu) {

  (1 - mu*Time_rec) * (1 - X_death) / Time_rec

}

cfr <- 0.028
T_death <- 7 #time to death in ic
T_ic_rec <- 14 #time to recover in ic
T_as_rec <- 3.5 #time to recover asymptomatic
T_md_rec <- 3.5 #time to recover mild
T_mic <- 1.5 #time before developing severe infection

g1 <- recrates(cfr, T_as_rec, 0)
#g2 <- recrates(cfr, T_md_rec, 0)
g2 <- recrates(cfr, T_as_rec, 0)
g3 <- recrates(cfr, T_ic_rec, 0)
#sigma <- recrates(cfr, T_q, mu_Nig)
phi <- drates(cfr, T_death, 0)

params <- c(R0 = 1.90,     #basic reproduction number (to estimate)
            #p1 = 0.42,     #proportion of aymptomatic infection (fix)
            #p2 = 0.75,      #proportion of mild that recovers (to estimate)
            #p3 = 0.80,      #proportion of recovery in IC (fix)
            k = 1/12,       #latency period (to estimate)
            tau1 = 1/4.0,   #rate of developing asymptomatic infection (to estimate)
            tau2 = 1/5.0,      #rate of developing mild infection (to estimate)
            delta = 1/T_mic,     #rate of developing severe infection (estimate)
            g1 = g1,      #rate of recovery for asymptomatic infections (to estimate)
            g2 = g2,      #recovery rate for mild infections (to estimate)
            g3 = g3,      #recovery rate in isolation centers (fix)
            phi = phi,   #death rate in isolation centers (fix)
            tetha = 0.45,    #overdispersion parameter (to estimate)
            E_0 = 10,
            Ip_0 = 0,
            Ia_0 = 0,
            Im_0 = 0,
            IC_0 = 0,
            R_0 = 0,
            D_0 = 0,
            S_0 = N - 10

)

Lagos_covid_model <- covid_model(state = "Lagos", 
                                 covid_data = naija_covid_data2, 
                                 timestep = 1/24,
                                 type = "H"
)

Lagos_covid_model %>% simulate(nsim = 50, params = params,
                               format = 'data.frame', 
                               include.data = T, 
                               verbose = T
) %>% 
  mutate(is.data = ifelse(.id == "data", "yes", "no")) %>%
  ggplot(aes(x = Day, y = Cases, group = .id, color = is.data, alpha = is.data)) + 
  geom_line() + 
  guides(color = F, alpha = F) +
  scale_color_manual(values = c(no = gray(0.8), yes = "red3")) + 
  scale_alpha_manual(values = c(no = 0.5, yes = 1)) + 
  theme_bw()

#test particle filter
Lagos_covid_model %>% pfilter(Np = 20, params = params) -> pf_lagos
pf_lagos %>% logLik() %>% logmeanexp(se = TRUE)
plot(pf_lagos)

I will appreciate your insights into this.

kingaa commented 3 years ago

The error

In max(x) : no non-missing arguments to max; returning -Inf

signifies that all particles have zero likelihood. This may be due to the improper specification of the dmeasure in both your snippets above. When give_log = 0, the snippet must fill lik with the likelihood, and when give_log = 1, it must fill it with the log likelihood. You are passing give_log to the low-level dnbinom_mu function, which already returns its result on the requested scale, and then transforming that result a second time (or not). At best the second transformation is redundant; at worst it corrupts the value stored in lik (the log of a negative number, for example, is NaN).
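Concretely, the pattern I mean is along these lines (a sketch only, using H and the parameter names from your first snippet; the value returned by dnbinom_mu or dpois is already on the scale requested by give_log, so it can be assigned to lik directly):

dmeas <- Csnippet("
  if (ISNA(Cases)) {
    lik = (give_log) ? 0 : 1;   // a missing observation contributes likelihood 1
  } else if (tetha > 0.0) {
    lik = dnbinom_mu(nearbyint(Cases), 1.0/tetha, H, give_log);
  } else {
    lik = dpois(nearbyint(Cases), H, give_log);
  }
")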

fomotis commented 3 years ago

Thank you for your earlier response. I changed my dmeasure to the following code, but I still get the same error for the standard error. I noticed that the likelihood is close to 0 for the 100 particles I ran. However, the conditional likelihood plot isn't 0 as before. I believe I am still missing something here.

dmeas <- Csnippet("

if(ISNA(Cases)) {

    lik = (give_log == 0) ? 0 : 1;

} else {

  if (tetha > 0.0) {
    lik = dnbinom_mu(Cases, 1.0/tetha, rho*H, give_log);
  } else {
    lik = dpois(Cases, rho*H, give_log);
  }
}

  Rprintf(\"%lg %lg %lg \\n\", lik, rho*H, Cases);
")
kingaa commented 3 years ago

I don't see anything wrong with the code. You might try including the parameters as well in your Rprintf statement, to verify that, e.g., neither tetha nor rho is taking invalid or nonsensical values. I have glanced over the simulator code as well and see nothing that strikes me as an obvious mistake. You may need to dig deeper into the parameters and states that are actually being explored, for example with more Rprintf statements in the various snippets.
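For example, you might extend the Rprintf line in your dmeas to something like the following (a sketch, using the same escaping convention as your snippet; add whatever other quantities you suspect):

  Rprintf(\"tetha=%lg rho=%lg H=%lg Cases=%lg lik=%lg \\n\", tetha, rho, H, Cases, lik);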

fomotis commented 3 years ago

Hello @kingaa, the following code gave me the standard error. After studying the logmeanexp source code, I realised that there must be replications.

replicate(n = 5, logLik(pfilter(Lagos_covid_model, Np = 2000, params = params))) %>% logmeanexp(se = TRUE)

kingaa commented 3 years ago

Where is the error coming from? replicate, logmeanexp, logLik, or pfilter?

fomotis commented 3 years ago

The error was coming from logmeanexp. The line max(x) is responsible for the warning (In max(x) : no non-missing arguments to max; returning -Inf) I was getting for the standard error, so I figured x had to be a vector, which is why the other code worked. This reasoning might be wrong, though.

kingaa commented 3 years ago

OK, that is good. Yes, evidently logmeanexp is getting no input. What is the output of replicate alone?

fomotis commented 3 years ago

replicate(n = 5, logLik(pfilter(Lagos_covid_model, Np = 2000, params = params))) gave me 5 estimates of the loglikelihood.

kingaa commented 3 years ago

And all were finite? What happens when you run logmeanexp on the output?

fomotis commented 3 years ago

Yes, all were finite and I did get the estimate of the loglikelihood and standard error. However, I am currently running mif2() and some of the likelihood values printed to screen are infinite.

kingaa commented 3 years ago

This is normal in the early stages of a search: some regions of parameter space may have likelihoods that are so small that they are effectively zero on a finite-precision machine. On the other hand, if this were happening for a model/data/parameter combination that I was thinking of publishing on, it would give me deep concern.