Closed fomotis closed 3 years ago
@fomotis, let me see if I can help you. Thank you for providing so much detail. I tried to run the codes but got the following error:
object 'naija_covid_data2' not found
which is just because you did not provide the data. Could you please do so? If you don't want to share the actual data, perhaps you could provide files of the same shape and size and broad characteristics of the data, just so that I can run the code and see the errors. If the data are sensitive, you could also adulterate them, by, for example, adding random noise to them. This page describes how you can attach a file to the Issue.
I have attached the dataset below. I used the following codes to pre-process the dataset
nigeria_covid-19_subnational.zip
library(tidyverse)
#some important dates
lck_date <- "30-03-2020"
airport_closure <- "21-03-2020"
travel_ban <- "18-03-2020"
school_closure <- "19-03-2020"
##loading in the data set
naija_covid_data1 <- read.csv("Data/nigeria_covid-19_subnational.csv", sep =";")
naija_covid_data2 <- naija_covid_data1[, c(1, 5, 7:9)]
names(naija_covid_data2) <- c("Date", "State", "Cases", "Deaths", "Recovered")
#filter out data without NAs
Lagos_covid_data <- naija_covid_data2 %>%
dplyr::filter(State == "Lagos") %>%
dplyr::mutate(Cases = as.numeric(Cases),
Deaths = as.numeric(Deaths),
Recovered = as.numeric(Recovered),
Date = as.Date(Date, format = "%d/%m/%Y")) %>%
dplyr::filter(!is.na(Cases)) %>%
dplyr::mutate(dt = c(0, diff(Date)),
Day = cumsum(c(1, diff(Date)))
)
Thanks, @fomotis. I found three errors in your code. First, you had improper specification for the dmeasure components. The following are appropriate. If you do not understand why the codes you sent are fixed by these, let me know.
## density of the measurement process
dmeas <- Csnippet("
double f;
if (tetha > 0.0) {
f = dnbinom_mu(nearbyint(Cases), 1.0/tetha, H, give_log);
} else {
f = dpois(nearbyint(Cases), H, give_log);
}
lik = (give_log) ? f : exp(f);
")
##for I
dmeas2 <- Csnippet("
double f;
if (tetha > 0.0) {
f = dnbinom_mu(nearbyint(Cases), 1.0/tetha, I, give_log);
} else {
f = dpois(nearbyint(Cases), I, give_log);
}
lik = (give_log) ? f : exp(f);
")
Second, you were inserting untransformed values of the three parameters you want to estimate into subplex
. Since you have specified parameter transformations, the optimization will be performed on the transformed scale. In this case, each of the three parameters is log-transformed. So the appropriate call to subplex
would be
library(subplex)
fit <- subplex(log(c(2, 0.35, 1/14)), fn = lag_func)
lag_func(fit$par)
coef(lag_func)
Note that I apply the function one more time, at the parameters returned by subplex
, to ensure that these are stored within lag_func
.
Third, although you have 19 state variables (initialized in the rinit component), you have only 18 equations in the skeleton. Thus, one of the 19 derivatives takes whatever random value happens to be in that uninitialized memory location. This leads to failure of the integrator, which depends on the assumption that the vectorfield is smooth. In the following, I have set the 19th derivative to zero.
## Deterministic skeleton
skel <- Csnippet("
double lambda;
// transmission rate
double beta1 = R0 * g1; //transmission rate for asymptomatic
double beta2 = R0 * p2 * g2; //transmission rate for mild
// expected force of infection (iota is the imported infection)
//lambda = (beta1 * pow(Ia + Ip + iota, alpha) + beta2 * Im) / N;
lambda = ((beta1 * (Ia + Ip)) + (beta2 * Im))/ 20000000;
//suceptible class
DS = -lambda * S;
//exposed class
DE = (lambda * S) - (k * E);
//presymptomatic infectious class
DIp = (k * E) - (tau * Ip);
//asymptomatic infection class
DIa = (p1 * tau * Ip) - (g1 * Ia);
//mild infection class
DIm = ((1-p1) * tau * Ip) - ((p2*g2*Im) + (1 - p2)*delta*Im);
//isolation center
DIC = ((1 - p2)*delta*Im) - ((p3*g3*IC) + (1 - p3)*phi*IC);
//Recovery
DR = (g1*Ia) + (p2*g2*Im) + (p3*g3*IC);
//deaths
DD = (1 - p3)*phi*IC;
//true incidence
DH = g1*Ia + p2*g2*Im + p3*g3*IC + ((1 - p3)*phi*IC);
DI = (k * E) + (p1 * tau * Ip) + ((1-p1) * tau * Ip);
DN_SE = lambda * S;
DN_EIp = k * E;
DN_IpIa = p1 * tau * Ip;
DN_IpIm = (1-p1) * tau * Ip;
DN_IaR = g1 * Ia;
DN_ImIC = ((1 - p2)*delta*Im);
DN_ImR = p2*g2*Im;
DN_ICR = p3*g3*IC;
DN_ICD = 0;
")
Now the estimation succeeds:
> Lagos_covid_model %>%
+ traj_objfun(est = c("R0","tetha", "k"), #, "g1", "g2", "delta"
+ params = params,
+ ode_control = c(maxsteps = 1000)
+ ) -> lag_func
>
> library(subplex)
> fit <- subplex(log(c(2, 0.35, 1/14)), fn = lag_func)
> lag_func(fit$par);
[1] 374.6529
> coef(lag_func)
R0 p1 p2 p3 k
1.289795e+00 4.200000e-01 7.500000e-01 7.500000e-01 2.750269e-01
tau delta g1 g2 g3
3.333333e-01 5.000000e-01 2.160000e-01 1.080000e-01 6.942857e-02
phi tetha S_0 E_0 Ip_0
4.000000e-03 4.037025e-01 9.999994e-01 5.000000e-07 5.000000e-08
Ia_0 Im_0 IC_0 R_0 D_0
5.000000e-08 5.000000e-26 5.000000e-26 5.000000e-26 5.000000e-26
> logLik(lag_func)
[1] -374.6529
Thank you so much for your prompt help. I would never have seen these. I must have forgotten about DN_ICD and the lik part. I certainly don't want to set DN_ICD to 0 :). I have always thought the parameter transformations happen internally. I must have misunderstood the process then because I always supplied untransformed starting values.
Yes, they do happen internally, but the optimizer (e.g. subplex
) doesn't know anything about them.
Glad to have helped!
@fomotis, you actually did me a good favor with this report. I will think about whether flow
should be changed to set the vectorfield to zero prior to the user's C snippet being evaluated. On the other hand, while this would eliminate this error, it might have made it more difficult for you to find out that you had forgotten to give an equation for the 19th variable. Perhaps an entry in the FAQ regarding integrator errors...? Thoughts?
I think setting the vector field to zero apriori isn't a bad idea, but it could lead to more errors and questions. I think it is better to set up a variable that counts the state variables in both the deterministic skeleton and rmeasure. You could print a warning on compilation notifying the user of the imbalance in the number of state variables and that the unspecified ones are set to zero. I think this is a win-win situation for you and the user.
By the way, I think you should force people to always supply the deterministic skeleton except for when they wish to simulate from the model.
I think setting the vector field to zero apriori isn't a bad idea, but it could lead to more errors and questions. I think it is better to set up a variable that counts the state variables in both the deterministic skeleton and rmeasure. You could print a warning on compilation notifying the user of the imbalance in the number of state variables and that the unspecified ones are set to zero. I think this is a win-win situation for you and the user.
Interesting idea. I don't quite see how to implement it, though. The user's C snippet is monolithic, so nothing can be done inside that. I could set the variables. before executing the C snippet, to NA, for example, and then check how many are still NA afterward. This would incur a run-time cost and make the code more complex, so I think we should contemplate it a bit further before deciding it's a good idea. I'm inclined, instead, to write a new FAQ on the subject. That incurs no runtime error, but does incur a cost on some (most) users who will have to solve this problem once....
By the way, I think you should force people to always supply the deterministic skeleton except for when they wish to simulate from the model.
Not sure what you mean. As it stands, you're only forced to supply basic model components that are necessary for the operations you want to perform. The skeleton, in particular, is only needed for deterministic trajectories. A user performing a purely stochastic analysis would have no use for it, and it might even be difficult to work out a reasonable skeleton for an arbitrary stochastic process....
What we really need are tools that can analyze the C snippets. For example, if we could parse the C snippets, their structure could be validated prior to compilation, including checks like the one you propose. One could go farther, actually cross-checking rmeasure and dmeasure and rprocess and skeleton components, for example. It would also potentially eliminate the need for statenames
and paramnames
. I know next to nothing about parsing C code, though. One would think there was a portable code analyzer readily available, and perhaps there is. I have done some searching, and have come up empty handed; perhaps I'm searching with the wrong terms....
Ah! I see. An FAQ sounds appropriate then to avoid the additional computational cost. In terms of the deterministic skeleton, I am of the school of thought that even if interest lies in the stochastic version of a model, probably efforts should start from the deterministic version of that model. Possibly, an old school way of thinking. For example in my case, interest is actually in the stochastic version but I'd like to fit the deterministic model first to enable me to create a box for the global likelihood search.
A side question, would it be appropriate to do a local search on each parameter at the optimised parameter values from the deterministic skeleton?
I agree that there is a great deal of interest in the deterministic aspects of a model, even if it is the stochastic model that is advanced as an explanation.
I think it does make sense to fit a deterministic model as a way of narrowing down the space to be searched. It can be misleading, of course, but if it is misleading, it's likely to be so for interesting reasons!
Hello @kingaa, Thank you for your help on the deterministic model once again, but it appears I still need help concerning my model. I am trying to do particle filtering on the model below. I made some changes to include Euler multinomial distribution instead of the chained Binomial model after going through your work on Measles. The conditional likelihood goes to 0 very fast, and I am unable to get the standard error for the Monte-Carlo estimate for the loglikelihood. I get the value NA with a warning message "In max(x) : no non-missing arguments to max; returning -Inf". I initially thought it has to do with my rSim function, but I still arrive at the same problem after changing it to the new model below.
rSim <- Csnippet("
double lambda;
double rate[5], trans[5];
// transmission rate
double beta1 = R0 * g1; //transmission rate for asymptomatic
double beta2 = R0 * g2; //transmission rate for mild
// expected force of infection (iota is the imported infection)
//lambda = (beta1 * pow(Ia + Ip + iota, alpha) + beta2 * Im) / N;
lambda = ((beta1*(Ia + Ip)) + beta2*Im)/ N;
//S
double dN_SE = rbinom(S, 1 - exp(-lambda*dt)); //movement form S to E
//E
double dN_EIp = rbinom(E, 1 - exp(-k*dt)); //movement from E to Ip
//Ip
rate[0] = tau1;
rate[1] = tau2;
reulermultinom(2, Ip, &rate[0], dt, &trans[0]);
double dN_IpIm = trans[0]; //rate of developing mild infection
double dN_IpIa = trans[1]; //rate of aymptomatic infection
//Ia
double dN_IaR = rbinom(Ia, 1 - exp(-g1*dt)); //rate of recovery form aymptomatic infection
//Im
rate[2] = g2;
rate[3] = delta;
reulermultinom(2, Im, &rate[2], dt, &trans[2]);
double dN_ImR = trans[2]; //rate of recovery form mild infection
double dN_ImIC = trans[3]; //rate of developing severe symptoms
//IC
rate[4] = g3;
rate[5] = phi;
reulermultinom(2, IC, &rate[4], dt, &trans[4]);
double dN_ICR = trans[4]; //rate of recovery at isolation centers
double dN_ICD = trans[5]; //rate of deaths at isolation centers
// Balance the equations
S -= dN_SE;
E += dN_SE - dN_EIp;
Ip += dN_EIp - dN_IpIa - dN_IpIm;
Ia += dN_IpIa - dN_IaR;
Im += dN_IpIm - dN_ImIC - dN_ImR;
IC += dN_ImIC - dN_ICR - dN_ICD;
R += dN_IaR + dN_ImR + dN_ICR;
D += dN_ICD;
//Rprintf(\"%lg %lg %lg %lg %lg \\n\",lambda, dN_SE, S, E, Ip);
H += dN_IaR + dN_ImR + dN_ICR + dN_ICD; //true incidence
I += dN_EIp + dN_IpIa + dN_IpIm + dN_ImIC; //total infected
N_SE += dN_SE;
N_EIp += dN_EIp; //number transition from E to Ip
N_IpIa += dN_IpIa; //number transition from Ip to Ia
N_IpIm += dN_IpIm; //number transition from Ip to Im
N_IaR += dN_IaR; //number transition from Ia to R
N_ImIC += dN_ImIC; // number of transition from Im to IC
N_ImR += dN_ImR; //number of transitions from I to R
N_ICR += dN_ICR; // number of transitions from IC to R
N_ICD += dN_ICD; //number of transitions from IC to D
")
#initial values simulator
rInit <- Csnippet("
double m = N /(S_0 + E_0 + Ip_0 + Ia_0 + Im_0 + IC_0 + R_0 + D_0);
S = nearbyint(m*S_0);
E = nearbyint(m*E_0);
Ip = nearbyint(m*Ip_0);
Ia = nearbyint(m*Ia_0);
Im = nearbyint(m*Im_0);
IC = nearbyint(m*IC_0);
R = nearbyint(m*R_0);
D = nearbyint(m*D_0);
H = 0;
I = 0;
N_SE = 0;
N_EIp = 0;
N_IpIa = 0;
N_IpIm = 0;
N_IaR = 0;
N_ImIC = 0; //italian citizen
N_ImR = 0;
N_ICR = 0;
N_ICD = 0;
")
#random generator of the measurement process
rmeas <- Csnippet("
if (tetha > 0.0) {
Cases = rnbinom_mu(1.0/tetha, H);
} else {
Cases = rpois(H);
}
"
)
rmeas2 <- Csnippet("
if (tetha > 0.0) {
Cases = rnbinom_mu(1.0/tetha, I);
} else {
Cases = rpois(I);
}
"
)
## density of the measurement process
dmeas <- Csnippet("
double f;
if(ISNA(Cases)) {
lik = (give_log) ? 0 : 1;
} else {
if (tetha > 0.0) {
f = dnbinom_mu(Cases, 1.0/tetha, H, give_log);
} else {
f = dpois(Cases, H, give_log);
}
}
lik = (give_log) ? f : exp(f);
Rprintf(\"%lg %lg %lg \\n\",f, H, Cases);
lik;
")
#for the deterministic skeleton
dmeas2 <- Csnippet("
double f;
if(ISNA(Cases)) {
lik = (give_log) ? 0 : 1;
} else {
if (tetha > 0.0) {
f = dnbinom_mu(nearbyint(Cases), 1.0/tetha, I, give_log);
} else {
f = dpois(nearbyint(Cases), I, give_log);
}
}
lik = (give_log) ? f : exp(f);
Rprintf(\"%lg %lg \\n\",f, I);
lik;
")
#Deterministic skeleton
skel <- Csnippet("
double lambda;
// transmission rate
double beta1 = R0 * g1; //transmission rate for asymptomatic
double beta2 = R0 * g2; //transmission rate for mild
// expected force of infection (iota is the imported infection)
//lambda = (beta1 * pow(Ia + Ip + iota, alpha) + beta2 * Im) / N;
lambda = ((beta1*(Ia + Ip)) + beta2*Im)/ 20000000;
//suceptible class
DS = -lambda * S;
//exposed class
DE = (lambda * S) - (k * E);
//presymptomatic infectious class
DIp = (k * E) - (tau1 * Ip + tau2 * Im);
//asymptomatic infection class
DIa = (tau1 * Ip) - (g1 * Ia);
//mild infection class
DIm = (tau2 * Im) - (g2*Im + delta*Im);
//isolation center
DIC = (delta*Im) - (g3*IC + phi*IC);
//Recovery
DR = (g1*Ia) + (g2*Im) + (g3*IC);
//deaths
DD = phi*IC;
//true incidence
DH = g1*Ia + g2*Im + g3*IC + (phi*IC);
DI = (k * E) + (tau1*Ip) + (tau2 * Ip);
DN_SE = lambda * S;
DN_EIp = k * E;
DN_IpIa = tau1 * Ip;
DN_IpIm = tau2 * Ip;
DN_IaR = g1 * Ia;
DN_ImIC = delta*Im;
DN_ImR = g2*Im;
DN_ICR = g3*IC;
DN_ICD = phi*IC;
")
### pomp formulation
# Population size
N <- 20000000
covid_model <- function(state = unique(naija_covid_data2$State)[1:37],
covid_data = naija_covid_data2,
timestep = 1/24, pop = N, type = c("I", "H")) {
state <- match.arg(state)
type = match.arg(type)
dt <- timestep
globs <- paste0("static int N = ", pop, ";")
state_covid_data <- covid_data %>%
filter(State == state) %>%
mutate(Cases = as.numeric(Cases),
Deaths = as.numeric(Deaths),
Recovered = as.numeric(Recovered),
Date = as.Date(Date, format = "%d/%m/%Y")) %>%
#filter(!is.na(Cases)) %>%
mutate(dt = c(0, diff(Date)),
Day = cumsum(c(1, diff(Date)))
)
state_covid_data %>%
dplyr::select(Day, Cases) %>%
pomp(
times = "Day",
t0 = 0,
obsnames = c("Cases"),
globals = globs,
rprocess = euler(rSim, delta.t = dt),
rinit = rInit,
rmeasure = if(type == "H") rmeas else rmeas2,
dmeasure = if(type == "H") dmeas else dmeas2,
skeleton = vectorfield(skel),
accumvars = c("H", "I","N_SE", "N_EIp", "N_IpIa", "N_IpIm",
"N_IaR", "N_ImIC", "N_ImR", "N_ICR", "N_ICD"),
statenames = c("S", "E", "Ip", "Ia", "Im", "IC", "R", "D",
"H", "I","N_SE", "N_EIp", "N_IpIa", "N_IpIm",
"N_IaR", "N_ImIC", "N_ImR", "N_ICR", "N_ICD"),
paramnames = c("R0", "k", "tau1", "tau2", "delta",
"g1", "g2", "g3", "phi", "tetha", "S_0", "E_0",
"Ip_0", "Im_0", "Ia_0", "IC_0", "R_0", "D_0"
),
partrans = parameter_trans(
log = c("R0", "k", "g1", "g2", "g3", "tau1", "tau2", "delta", "phi", "tetha"),
baycentric = c("S_0", "E_0", "Ip_0", "Im_0",
"Ia_0", "IC_0", "R_0", "D_0")
)
) -> dis_model
}
# parameter estimation
#death rate function
drates <- function(X_death, Time_death, mu) {
(1 - mu*Time_death) * (X_death / Time_death)
}
# # recovery rate function
recrates <- function(X_death, Time_rec, mu) {
(1 - mu*Time_rec) * (1 - X_death) / Time_rec
}
cfr <- 0.028
T_death <- 7 #time to death in ic
T_ic_rec <- 14 #time to recover in ic
T_as_rec <- 3.5 #time to recover aymptotic
T_md_rec <- 3.5 #time to recover mild
T_mic <- 1.5 #time before develop developing severe infection
g1 <- recrates(cfr, T_as_rec, 0)
#g2 <- recrates(cfr, T_md_rec, 0)
g2 <- recrates(cfr, T_as_rec, 0)
g3 <- recrates(cfr, T_ic_rec, 0)
#sigma <- recrates(cfr, T_q, mu_Nig)
phi <- drates(cfr, T_death, 0)
params <- c(R0 = 1.90, #basic reproduction number (to estimate)
#p1 = 0.42, #proportion of aymptomatic infection (fix)
#p2 = 0.75, #proportion of mild that recovers (to estimate)
#p3 = 0.80, #proportion of recovery in IC (fix)
k = 1/12, #latency period (to estimate)
tau1 = 1/4.0, #rate of developing aymptomatic infection (to estimate)
tau2 = 1/5.0, #rate of developing aymptomatic infection (to estimate)
delta = 1/T_mic, #rate of developing severe infection (estimate)
g1 = g1, #rate of recovery for asymptomatic infections (4.5 days) (to estimate)
g2 = g2, #recovery rate in isolation centers (fix)
g3 = g3,
phi = phi, #death rate in isolation centers (fix)
tetha = 0.45, #overdispersion parameter (to estimate)
E_0 = 10,
Ip_0 = 0,
Ia_0 = 0,
Im_0 = 0,
IC_0 = 0,
R_0 = 0,
D_0 = 0,
S_0 = N - 10
)
Lagos_covid_model <- covid_model(state = "Lagos",
covid_data = naija_covid_data2,
timestep = 1/24,
type = "H"
)
Lagos_covid_model %>% simulate(nsim = 50, params = params,
format = 'data.frame',
include.data = T,
verbose = T
) %>%
mutate(is.data = ifelse(.id == "data", "yes", "no")) %>%
ggplot(aes(x = Day, y = Cases, group = .id, color = is.data, alpha = is.data)) +
geom_line() +
guides(color = F, alpha = F) +
scale_color_manual(values = c(no = gray(0.8), yes = "red3")) +
scale_alpha_manual(values = c(no = 0.5, yes = 1)) +
theme_bw()
#test particle filter
Lagos_covid_model %>% pfilter(Np = 20, params = params) -> pf_lagos
pf_lagos %>% logLik() %>% logmeanexp(se = TRUE)
plot(pf_lagos)
I will appreciate your insights into this.
The error
In max(x) : no non-missing arguments to max; returning -Inf
signifies that all particles have zero likelihood. This may be due to the improper specification of the dmeasure in both your snippets above. When give_log = 0
, the snippet must fill lik
' with the likelihood, and when give_log=1
, it must fill it with the log likelihood. You are passing give_log
to the low-level dnbinom_mu
function, and then taking the log again (or not). The second of these is redundant, and the log of a negative number is NaN.
Thank you for your earlier response. I changed my dmeasure to the following code and I still have the same error with the standard error. I noticed that the likelihood is close to 0 for the 100 particles I ran. However, the conditional likelihood plot isn't 0 as before. I believe I am still missing something here.
dmeas <- Csnippet("
if(ISNA(Cases)) {
lik = (give_log == 0) ? 0 : 1;
} else {
if (tetha > 0.0) {
lik = dnbinom_mu(Cases, 1.0/tetha, rho*H, give_log);
} else {
lik = dpois(Cases, rho*H, give_log);
}
}
Rprintf(\"%lg %lg %lg \\n\", lik, rho*H, Cases);
")
I don't see anything wrong with the code. You might try including the parameters as well in your Rprintf
statement, to verify that, e.g., neither tetha
nor rho
are taking invalid or nonsense values. I have glanced over the simulator code as well, and see nothing that occurs to me as an obvious mistake. You may need to dig deeper into the parameters and states that are actually being explored, for example, with more Rprintf
statements in the various snippets.
Hello @kingaa , the following code gave me the standard error. I realised that there must be replications after studying the logmeanexp source code.
replicate(n = 5, logLik(pfilter(Lagos_covid_model, Np = 2000, params = params))) %>% logmeanexp(se = TRUE)
Where is the error coming from? replicate
, logmeanexp
, logLik
, or pfilter
?
The error was coming from logmeanexp
. The line max(x)
is responsible for the warning (In max(x) : no non-missing arguments to max; returning -Inf) I was getting regarding the standard error so I figured x had to be a vector, hence the reason why the other code worked. This might be wrong anyway.
OK, that is good. Yes, evidently logmeanexp
is getting no input. What is the output of replicate
alone?
replicate(n = 5, logLik(pfilter(Lagos_covid_model, Np = 2000, params = params)))
gave me 5 estimates of the loglikelihood.
And all were finite? What happens when you run logmeanexp
on the output?
Yes, all were finite and I did get the estimate of the loglikelihood and standard error. However, I am currently running mif2() and some of the likelihood values printed to screen are infinite.
This is normal in the early stages of a search: some regions of parameter space may have likelihoods that are so small that they are effectively zero on a finite-precision machine. On the other hand, if this were happening for a model/data/parameter combination that I was thinking of publishing on, it would give me deep concern.
I am trying to fit a chanied binomial model for SEI(Ip, Ia, Im)ICR to COVID-19 data from Nigeria using the pomp approach. I divided the I compartments into 3 (Ip = presymptomatic, Im = mild infection and Ia = asymptomatic). I also included an IC compartment for Isolation centres. I successfully get simulations from this model PROBLEM whenever I try to estimate parameters using the subplex() function. I get the error in the title. From my understanding, the problem appears to be with the deterministic skeleton (skel Csnippet) but I have checked and rechecked this. I cannot see why the integrator runs into a problem at the initial values given. I will appreciate any help and enlightenment at all on this. The pomp codes are given below.
Thanks in advance CODES