Closed mayagroner closed 3 years ago
It sounds like it should be possible to do this. If I understand you correctly, you have samples of fish and your presence/absence assay is a direct estimator of prevalence of infection. For example, suppose you have a sample of size n and you observe k infected fish in the sample. Assuming the test is perfect, the sample size n is much smaller than the full population size, and that infection doesn't alter the probability of being sampled, then k ~ Binomial(n,p), where p is the prevalence of infection. Perhaps this is a place to begin the conversation.
You mention that you have multiple samples. How are these related to one another?
Yes. You are correct. We have estimates of infection prevalence and the binomial sampling design makes sense. The sample size is ~5-6 orders of magnitude lower than the population size (depending on the year, these populations are highly variable across years, something that we think may be contributing to the infection dynamics). Typically we sample the population three times at three locations each spring using a cast net or purse seine. I consider this a nested sampling design where is a random effect of sampling event (e.g., fish caught in the same net may not be independent). Thank you again for your help!
How do you think about the random effect of sampling event? How are the fish caught in one net more or less similar to fish caught in different nets, or not at all?
Also, you mentioned that you have data on the population size itself. What are the errors associated with those data?
You can see that the questions I'm asking are designed to help you focus on modeling the measurement error. You may be able to come up with a mechanistic model of the measurement error. While such a model would be interesting, there is of course no guarantee that it would explain the data better than another model. For this reason, I'd recommend comparing it with a more flexible model.
We see some variation in age among the fish caught in each sampling event and we know there is variation in prevalence across age groups. (Eventually, I think an age-structured disease model may be necessary, but I think a basic SI model is a good place to start). There may be other sources of variation that we have not assessed (e.g., condition of fish- perhaps healthier fish aggregate towards the front of the schools and less healthy fish towards the back). I don't think we understand the schooling behavior well enough to make a mechanistic model of the measurement error. We don't think that there is any bias in terms of catch rates. The prevalence within a sampled net is representative of the prevalence in the local area.
Our population estimates come from a stock assessment model that is informed by multiple data sources. We have estimates and 95% CIs associated with each age class for each study year.
What would you think, then, about starting with a Binomial model for the prevalence -> sample positive connection, and a normal or lognormal model for the population estimate error? I don't know much about stock assessments....
However, perhaps we should be worried if we use anything other than a Poisson model for fish. ;-)
That sounds like a good start! Thank you for the help and the joke.
Would this be the right way to code that? For the data, popdata
is the population size data, sigmadata
is standard deviation of the population size data, positive
are the disease positive samples, and negative
are the disease negative samples.
I have several lik terms in si_dmeas. Should these be named lika, likb, likc, likd? I'm not sure how to tell the model to maximize the likelihood against all of these sources of data.
Thank you again for your help.
si_dmeas <- Csnippet("
lik = dbinom(positive,H,rho,give_log);
lik = dbinom(negative,pop-H,rho,give_log);
lik = dlnorm(popdata, exp(pop), sigmadata, give_log);
lik = dlnorm(sigmadata, exp(sigma), sd, give_log);
")
si_rmeas <- Csnippet("
popdata = rlnorm(exp(pop), sigma);
sigmadata = rlnorm(exp(sigma), sd);
positive = rbinom(H,rho);
negative = rbinom(pop-H, rho)
")
Have a look at the following.
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive,positive+negative,I/pop,give_log);
lik2 = dlnorm(popdata, log(pop), sigmadata, give_log);
if (give_log) lik = lik1+lik2; else lik = lik1*lik2;
")
A few things to note:
lik1
and lik2
to temporarily hold two components of the likelihood. These get filled with likelihood or log-likelihood as requested (according to the give_log
flag). Accordingly, the lik
is set equal to the product or the sum, respectively.pop
. [I think maybe in your model, pop
is the log of the population size, but I don't know. You didn't mention what the state variables signify.]sigmadata
is somehow available (as a state variable, a parameter, a covariate, or a datum). Since it's not being treated as informative per se, it logically makes the most sense to think of it as a parameter (if its a constant) or a covariate (if it varies with time).log(popdata)
is normally distributed with mean log(pop)
and s.d. sigmadata
.The above assumes that positive
and negative
are both data variables. When we come to try and write the rmeasure component, we will see that this is not very convenient. Since positive+negative
is equal to the sample size, it makes sense to treat the latter as a covariate, assuming it is not itself informative. If we do this, letting samplesize = positive + negative
, and assuming it's a covariate, we have the following.
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive,samplesize,I/pop,give_log);
lik2 = dlnorm(popdata, log(pop), sigmadata, give_log);
if (give_log) lik = lik1+lik2; else lik = lik1*lik2;
")
si_rmeas <- Csnippet("
positive = rbinom(samplesize,I/pop);
popdata = rlnorm(popdata, log(pop), sigmadata, give_log);
")
If, on the other hand, the sample size is itself informative (perhaps about the population size), then one should model that too.
Closing this now, but feel free to re-open if more discussion is warranted.
Thank you again for your help. I have a simple model coded using the dmeas and rmeas code you suggested, but I am getting some errors when I try to construct the pomp object. I've gone over it several times and I am not sure where the error is, but I have posted the code and error message below. The data that go with the code are positive (# positives), sample size (# number tested for disease), maturepopmean and maturepopsd (the estimated population size and standard deviation), and recruitment and recruitment sd (the number and sd of individuals recruiting into the mature population). I will directly email you the datasets associated with these data. Thank you again. The learning curve for this package is steep and the help is very appreciated!
The code:
library(doParallel)
library(doRNG)
library(readr)
registerDoParallel()
registerDoRNG(2488820)
library(tidyverse)
read_csv(paste0("C:/Users/mayag/Dropbox/Ichthyophonus/Ich mort model/pomp model/PWS disease survey.csv")) -> meas
meas %>% as.data.frame() %>% head()
#plot sample prevalence
meas %>%
ggplot(aes(x=year,y=positive/samplesize))+
geom_line()+
geom_point()
#plot mature population
meas %>%
ggplot(aes(x=year,y=maturepopmean))+
geom_line()+
geom_point()
read_csv(paste0("C:/Users/mayag/Dropbox/Ichthyophonus/Ich mort model/pomp model/PWS recruitment.csv")) -> covartable
covartable<-covartable[covartable$year>2006,]
covartable<-covartable[covartable$year<2019,]
#plot recruitment
covartable %>%
ggplot(aes(x=year,y=recruitment))+
geom_line()+
geom_point()
library(pomp)
rprocSI <- Csnippet("
double beta, rec, foi, mu, mud, pos, pop, q;
double rate[3], trans[3];
foi = beta*S*I/(pop^q); //q determines the degree to which frequency or density dependence is occurring (q = 0 for dd, q = 1 for fd)
rate[0] = foi; //force of infection
rate[1] = mu; // natural death
rate[2] = mud; // disease death
// Recruitment is randomly picked from the distribution provided in the data
rec = rnorm(recruitment*dt, recruitmentsd*dt);
// transitions between classes
reulermultinom(2,S,&rate[0],dt,&trans[0]);
S += rec*(1-pos) - trans[0] - trans[1]; //recruits can be disease positive or negative
I += rec*(pos)+trans[0] - trans[1] - trans[2];
I = pop - S; // Is this ok to have I included twice? I need to include pop in here somewhere
H += trans[0]; // true incidence, this may not be necessary to record
")
si_rinit <- Csnippet("
S = nearbyint(S_0);
I = nearbyint(pop-S_0);
H = 0;
")
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive,samplesize,I/pop, give_log);
lik2 = dlnorm(maturepopmean, log(pop), maturepopsd, give_log);
if (give_log) lik = lik1+lik2; else lik = lik1*lik2;
")
si_rmeas <- Csnippet("
positive = rbinom(samplesize,I/pop);
maturepopmean = rlnorm(maturepopmean, log(pop), maturepopsd, give_log);
")
#---------------------------------------------
pt <- parameter_trans(
log=c("pop", "foi", "mu", "mud", "beta", "rec"),
logit=c("pos", "q")
)
meas %>%
pomp(time="year", t0=0, rprocess=euler(rprocSI,delta.t=1/12),
rinit=si_rinit,
rmeasure=si_rmeas,
dmeasure=si_dmeas,
accumvars=c("H"),
covar=covariate_table(covartable,times="year"),
obsnames = c("maturepopmean", "maturepopsd", "positive", "samplesize"),
statenames=c("S", "I", "H"),
paramnames=c("beta", "pop", "rec", "foi", "pos","mu", "mud", "q", "S_0"),
cdir=".", cfile="measSI"
) -> measSI
Associated error message:
Error: error in building shared-object library from C snippets: in ‘Cbuilder’: compilation error: cannot compile shared-object library ‘./measSI.dll’: status = 1
compiler messages:
"C:/rtools40/mingw64/bin/"gcc -I"C:/PROGRA~1/R/R-40~1.3/include" -DNDEBUG -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c measSI.c -o measSI.o
measSI.c:12: warning: "beta" redefined
#define beta (__p[__parindex[0]])
In file included from C:/Users/mayag/Documents/R/win-library/4.0/pomp/include/pomp.h:9,
from measSI.c:5:
C:/PROGRA~1/R/R-40~1.3/include/Rmath.h:212: note: this is the location of the previous definition
#define beta Rf_beta
measSI.c: In function '__pomp_stepfn':
measSI.c:52:16: error: '__p' redeclared as different kind of symbol
#define beta (__p[__parindex[0]])
^~~
measSI.c:70:10: note: in expansion of macro 'beta'
double beta, rec, foi, mu, mud, pos, pop, q;
^~~~
measSI.c:67:48: note: previous definition of '__p' was
In addition: Warning message:
In system2(command = R.home("bin/R"), args = c("CMD", "SHLIB", "-c", :
running command '"C:/PROGRA~1/R/R-40~1.3/bin/R" PKG_CPPFLAGS="-IC:/Users/mayag/Documents/R/win-library/4.0/pomp/include -IC:/Users/mayag/Documents/pomp SI" CMD SHLIB -c -o ./measSI.dll ./measSI.c' had status 1
The error you got is just due to a name conflict between your parameter beta
and the beta function, which is defined in Rmath.h
and documented in the R extensions manual. I'd suggest renaming your parameter. I usually use Beta
, for example. This one should go into the FAQ.
Debugging is an iterative process: usually code execution stops at the first problem. Fixing it uncovers the second problem, and so on. So I copied your code (and your data, which you sent to me privately) to a new directory and began editing your file in the debugging process. This risks depriving you of an important learning experience, so I hope you go through the code below line by line to see the changes I made. All of these are for a reason: feel free to ask about any of them, though you will probably figure most of them out yourself. For example, you can replace any of my edits by the original line in your code and see what happens.
A couple of remarks about pomp and C are in order.
t
, and sometimes a variable lik
and a flag give_log
are pre-defined. Read about C snippets in the package manual, where I attempt to make all this clear.partrans
is specified.rlnorm
and dlnorm
. Look up the call sequence for these functions in the R extensions manual.library(tidyverse)
read_csv("PWS_disease_survey.csv") -> meas
##plot sample prevalence
meas %>%
ggplot(aes(x=year,y=positive/samplesize))+
geom_line()+
geom_point()
##plot mature population
meas %>%
ggplot(aes(x=year,y=maturepopmean))+
geom_line()+
geom_point()
read_csv("PWS_recruitment.csv") %>%
filter(year>2006,year<2019) -> covartable
##plot recruitment
covartable %>%
ggplot(aes(x=year,y=recruitment))+
geom_line()+
geom_point()
library(pomp)
si_rinit <- Csnippet("
S = nearbyint(S_0);
I = nearbyint(pop-S_0);
H = 0;
")
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive,samplesize,I/pop, give_log);
lik2 = dlnorm(maturepopmean, log(pop), maturepopsd, give_log);
if (give_log) lik = lik1+lik2; else lik = lik1*lik2;
")
si_rmeas <- Csnippet("
positive = rbinom(samplesize,I/pop);
maturepopmean = rlnorm(log(pop), maturepopsd);
")
rprocSI <- Csnippet("
double rate[3], trans[3];
double foi = Beta*S*I/(pow(pop,q)); //q determines the degree to which frequency or density dependence is occurring (q = 0 for dd, q = 1 for fd)
rate[0] = foi; //force of infection
rate[1] = mu; // natural death
rate[2] = mud; // disease death
// Recruitment is randomly picked from the distribution provided in the data
double rec = rnorm(recruitment*dt, recruitmentsd*dt);
// transitions between classes
reulermultinom(2,S,&rate[0],dt,&trans[0]);
S += rec*(1-pos) - trans[0] - trans[1]; //recruits can be disease positive or negative
I += rec*pos + trans[0] - trans[1] - trans[2];
I = pop - S; // NOTE THAT THIS LINE UNDOES THE EFFECT OF THE PREVIOUS LINE.
H += trans[0]; // true incidence, this may not be necessary to record
")
meas %>%
pomp(time="year", t0=0,
rprocess=euler(rprocSI,delta.t=1/12),
rinit=si_rinit,
rmeasure=si_rmeas,
dmeasure=si_dmeas,
accumvars=c("H"),
partrans=parameter_trans(
log=c("pop", "mu", "mud", "Beta"),
logit=c("pos", "q")
),
covar=covariate_table(covartable,times="year"),
statenames=c("S", "I", "H"),
paramnames=c("Beta", "pop", "pos","mu", "mud", "q", "S_0"),
cdir=".", cfile="measSI"
) -> measSI
Thank you very much! I think much of my confusion was associated with what can and cannot be declared within a Csnippet. I am going through the code now to understand how my errors are related to specific error messages.
One follow-up question is regarding the simulate function. I read through and addressed the suggestions in faq 4.5 (e.g., my covarariates tables extends from t(0) to several timepoint past t(N)), but I am still receiving error messages relating to how the times in the covariate table are read. I receive the following error code when I construct the pomp object.
Warning message:
in ‘pomp’: the supplied covariate times do not embrace the data times: covariates may be extrapolated.
I have set up the recruitment (my covariate) so that it is 0 for all but one of my timepoints within a year to reflect the pulsed nature of fish recruitment. I don't think this is the cause of the problem, because I still receive the error message when I don't do this.
This is the code I have been using (still with the data I sent you privately).
library(tidyverse)
require(readr)
read_csv("PWS_disease_survey.csv") -> meas
##plot sample prevalence
meas %>%
ggplot(aes(x=year,y=positive/samplesize))+
geom_line()+
geom_point()
##plot mature population
meas %>%
ggplot(aes(x=year,y=maturepopmean))+
geom_line()+
geom_point()
read_csv("PWS_recruitment.csv") %>%
filter(year>2005)-> covartable
covartable2<-as.data.frame(seq(min(covartable$year), max(covartable$year), by = 1/12))
names(covartable2)[1] <- "year"
covartable2$recruitment<-0
covartable2$recruitmentsd<-0
covartable2$year<-ifelse(covartable2$year-floor(covartable2)==0,NA, covartable2$year)
covartable2<-covartable2[complete.cases(covartable2), ]
covartable<-rbind(covartable, covartable2)
covartable <- covartable[order(covartable$year), ]
##plot recruitment, it should be a single pulse each year
covartable %>%
ggplot(aes(x=year,y=recruitment))+
geom_line()+
geom_point()
library(pomp)
si_rinit <- Csnippet("
S = nearbyint(S_0);
I = nearbyint(pop-S_0);
H = 0;
")
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive,samplesize,I/pop, give_log);
lik2 = dlnorm(maturepopmean, log(pop), maturepopsd, give_log);
if (give_log) lik = lik1+lik2; else lik = lik1*lik2;
")
si_rmeas <- Csnippet("
positive = rbinom(samplesize,I/pop);
maturepopmean = rlnorm(log(pop), maturepopsd);
")
rprocSI <- Csnippet("
double rate[3], trans[3], rec;
double foi = Beta*S*I/(pow(pop,q)); //q determines the degree to which frequency or density dependence is occurring (q = 0 for dd, q = 1 for fd)
rate[0] = foi; //force of infection
rate[1] = mu; // natural death
rate[2] = mud; // disease death
// Recruitment is randomly picked from the distribution provided in the data
rec = rnorm(recruitment, recruitmentsd);
// transitions between classes
reulermultinom(2,S,&rate[0],dt,&trans[0]);
S += rec*(1-pos) - trans[0] - trans[1]; //recruits can be disease positive or negative
I += rec*pos + trans[0] - trans[1] - trans[2];
H += trans[0]; // true incidence, this may not be necessary to record
")
meas %>%
pomp(time="year", t0=0,
rprocess=euler(rprocSI,delta.t=1/12),
rinit=si_rinit,
rmeasure=si_rmeas,
dmeasure=si_dmeas,
accumvars=c("H"),
partrans=parameter_trans(
log=c("pop", "mu", "mud", "Beta"),
logit=c("pos", "q")
),
covar=covariate_table(covartable,times="year"),
statenames=c("S", "I", "H"),
paramnames=c("Beta", "pop", "pos","mu", "mud", "q", "S_0"),
cdir=".", cfile="measSI"
) -> measSI
measSI %>%
simulate(
params=c(Beta=1.5, pop=148387450, pos=0.2, mu=0.25, mud=0.25, q=0.8, S_0=103871215),
nsim=20,format="data.frame",include.data=TRUE
) -> sims
There are quite a number of interesting issues that come up in this context. These range from simple matters to do with the code to more delicate issues to do with the way you think about the data.
Have a look at the following codes, which we can discuss.
library(tidyverse)
require(readr)
read_csv("PWS_disease_survey.csv") %>%
mutate(
msd = log(1+maturepopsd/maturepopmean)
) %>%
select(-maturepopsd,-negative) -> meas
I'm assuming a log-normal model for the errors in population-size estimates and recruitment.
##plot sample prevalence
meas %>%
ggplot(aes(
x=year,y=positive/samplesize,
ymin=qbeta(p=0.25,shape1=positive,shape2=negative+1),
ymax=qbeta(p=0.75,shape1=positive+1,shape2=negative)))+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+
labs(y="prevalence")
##plot mature population
meas %>%
ggplot(aes(
x=year,y=maturepopmean,
ymin=qlnorm(p=0.25,meanlog=log(maturepopmean),sdlog=msd),
ymax=qlnorm(p=0.75,meanlog=log(maturepopmean),sdlog=msd)
))+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+
labs("mature population size")
In your model, recruitment and its variability are really covariates, not data. So is the sample-size and the s.d. of the population size estimator. Accordingly, I put these into the covariate table, and remove them from the data.
read_csv("PWS_recruitment.csv") %>%
filter(year>2005) %>%
mutate(
rsd=log(1+recruitmentsd/recruitment)
) %>%
select(-recruitmentsd) %>%
left_join(
meas %>%
select(year,samplesize,msd),
by="year"
) -> covartable
meas %>%
select(-samplesize,-msd) -> meas
covartable %>%
ggplot(
aes(
x=year,
y=recruitment,
ymin=qlnorm(p=0.25,meanlog=log(recruitment),sdlog=rsd),
ymax=qlnorm(p=0.75,meanlog=log(recruitment),sdlog=rsd)
)
)+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()
Now, you want your recruitment to happen in a pulse, once per year. In the following, this pulse arrives during the month of January.
covartable %>%
select(year) %>%
expand_grid(month=1:12) %>%
mutate(
time=year+(month-1)/12
) %>%
select(-month,-year) %>%
left_join(covartable,by=c("time"="year")) %>%
rename(year=time) %>%
mutate(
recruitment=coalesce(recruitment,0),
rsd=coalesce(rsd,0),
samplesize=coalesce(samplesize,0),
msd=coalesce(msd,0)
) %>%
arrange(year) -> covartable
##plot recruitment, it should be a single pulse each year
covartable %>%
ggplot(aes(x=year,y=recruitment))+
geom_step()+
geom_point()
In debugging your code, I reparameterized the initial condition.
library(pomp)
si_rinit <- Csnippet("
S = nearbyint(pop*sfrac);
I = nearbyint(pop*(1-sfrac));
")
The measurement model below conforms (I think) more closely to the model you have in mind.
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive, samplesize, I/(S+I), give_log);
lik2 = dlnorm(maturepopmean, log(S+I), msd, give_log);
if (give_log)
lik = lik1+lik2;
else
lik = lik1*lik2;
")
si_rmeas <- Csnippet("
positive = rbinom(samplesize,I/(S+I));
maturepopmean = rlnorm(log(S+I), msd);
")
Read up on the Euler-multinomial distribution and what it is useful for. For example, here and here.
I've dispensed, for the moment, with your frequency- to density-dependent continuum. It's an interesting idea. I would recommend giving some serious thought to how you parameterize this continuum.
rprocSI <- Csnippet("
double rate[4], trans[4], rec;
double foi = Beta*I/(I+S);
rate[0] = foi; //force of infection
rate[1] = mu; // natural death
rate[2] = mu; // natural death
rate[3] = mud; // disease death
// Recruitment is randomly picked from the distribution provided in the data
rec = rlnorm(log(recruitment), rsd);
// transitions between classes
reulermultinom(2,S,&rate[0],dt,&trans[0]);
reulermultinom(2,I,&rate[2],dt,&trans[2]);
//recruits can be disease positive or negative
S += nearbyint(rec*(1-pos)) - trans[0] - trans[1];
I += nearbyint(rec*pos) + trans[0] - trans[2] - trans[3];
")
Building the pomp object, note the choice of t0
.
meas %>%
pomp(
time="year", t0=2006,
rprocess=euler(rprocSI,delta.t=1/120),
rinit=si_rinit,
rmeasure=si_rmeas,
dmeasure=si_dmeas,
covar=covariate_table(covartable,times="year",order="constant"),
statenames=c("S", "I"),
paramnames=c("Beta", "pop", "pos", "mu", "mud", "sfrac"),
cdir=".", cfile="measSI"
) -> measSI
measSI %>%
as_tibble() %>%
gather(var,val,-year) %>%
ggplot(aes(x=year,y=val,group=var))+
geom_line()+
facet_wrap(~var,scales="free_y")
Let's examine a simulation.
measSI %>%
simulate(
nsim=1,
params=c(
Beta=0.5, pop=148387450, pos=0.2, mu=0.25, mud=0.25, sfrac=0.7
)
) %>%
as_tibble() %>%
gather(var,val,-year) %>%
ggplot(aes(x=year,y=val,group=var))+
geom_line()+
facet_wrap(~var,scales="free_y")
The particle filter reveals that these parameters are not very good.
measSI %>%
pfilter(
Np=1000,
params=c(
Beta=0.5, pop=148387450, pos=0.2, mu=0.25, mud=0.25, sfrac=0.7
)
) %>%
plot()
IF2 is usually pretty good at rapidly improving parameters. The following runs 50 IF2 iterations, exploring a 4-parameter subspace of the parameter space.
measSI %>%
mif2(
Nmif=50,
Np=1000,
params=c(
Beta=0.5, pop=148387450, pos=0.2, mu=0.25, mud=0.25, sfrac=0.7
),
partrans=parameter_trans(
log=c("Beta","mu","mud"),
logit="pos"
),
paramnames=c("Beta","mu","mud","pos"),
rw.sd=rw.sd(Beta=0.02,pos=ivp(0.02),mu=0.02,mud=0.02),
cooling.fraction.50=0.5
) -> mf1
mf1 %>%
traces() %>%
melt() %>%
ggplot(aes(x=iteration,y=value,group=variable))+
geom_line()+
facet_wrap(~variable,scales="free_y")
mf1 %>% coef()
What do simulations at these parameters look like?
mf1 %>%
simulate(
nsim=20,
format="data.frame",include.data=TRUE
) -> sims
sims %>%
select(year,rep=.id,maturepopmean,positive,samplesize) %>%
gather(var,val,-year,-rep) %>%
ggplot(aes(x=year,y=val,group=rep,color=rep=="data"))+
geom_line()+
facet_wrap(~var,scales="free_y",ncol=1)
The big issue that deserves to be addressed is whether the demographic stochasticity you assume (by using Euler-multinomial distributions with deterministic rates for the infection and death processes) is really sufficient to explain the data.
Thank you so much for helping me to get this running. I think it may make sense to try using running the model with and without demographic stochasticity. This is a dynamic system and many of the sources of natural mortality fluctuate across years (e.g., predation, resource availability); these leads me to believe that an SDE will ultimately work better. As a first step, I have coded a deterministic model (below). I have offset the timing of the population measurement to reflect that the survey is done in April. Recruitment is still estimated to occur in January.
In both the SDE and deterministic models, the estimates for mortality are much higher than I would expect. We believe that the natural mortality rate is between 0.2 and 0.5, however the models are estimating values around 4. I don't know if this may be because of the way that the model is adding the recruitment? The value I have for recruitment is the total recruitment we expect in a year, but we have limited it to the month of January. Is it possible that the total value of the recruitment is added in each delta.t
time step? This is my code for recruitment in the process model:
// Recruitment is randomly picked from the distribution provided in the data
double rec = rlnorm(log(recruitment), rsd);
I've also tried coding it as below, without any impact on the results:
double rec = rlnorm(log(recruitment*dt), rsd*dt);
Here is my full deterministic model:
library(tidyverse)
require(readr)
require(dplyr)
read_csv("PWS_disease_survey.csv") %>%
mutate(
msd = log(1+maturepopsd/maturepopmean)
) -> meas
meas$year<-meas$year+1/3
## add 0.33 to each year to reflect that the population survey occurs in April (note that recruitment still occurs in January)
## plot sample prevalence
meas %>%
ggplot(aes(
x=year,y=positive/samplesize,
ymin=qbeta(p=0.25,shape1=positive,shape2=negative+1),
ymax=qbeta(p=0.75,shape1=positive+1,shape2=negative)))+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+
labs(y="prevalence")+xlim(2006, 2018)+theme_bw()
##plot mature population
meas %>%
ggplot(aes(
x=year,y=maturepopmean,
ymin=qlnorm(p=0.25,meanlog=log(maturepopmean),sdlog=msd),
ymax=qlnorm(p=0.75,meanlog=log(maturepopmean),sdlog=msd)
))+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+
labs("mature population size")+xlim(2006, 2018)+theme_bw()
read_csv("PWS_recruitment.csv") %>%
filter(year>2005) %>%
mutate(
rsd=log(1+recruitmentsd/recruitment)
) %>%
select(-recruitmentsd) -> covartable
covs<-select(meas, c(samplesize, msd, year))
covartable<-merge(covartable, covs, all=TRUE)
meas %>%
select(-samplesize,-msd) -> meas
covartable %>%
ggplot(
aes(
x=year,
y=recruitment,
ymin=qlnorm(p=0.25,meanlog=log(recruitment),sdlog=rsd),
ymax=qlnorm(p=0.75,meanlog=log(recruitment),sdlog=rsd)
)
)+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+xlim(2006, 2018)+theme_bw()
## Now, you want your recruitment to happen in a pulse, once per year.
## In the following, this pulse arrives during the month of January.
covartable %>%
select(year) %>%
expand_grid(month=1:12) %>%
mutate(
time=year+(month-1)/12
) %>%
select(-month,-year) %>%
left_join(covartable,by=c("time"="year")) %>%
mutate(
recruitment=coalesce(recruitment,0),
rsd=coalesce(rsd,0),
samplesize=coalesce(samplesize,0),
msd=coalesce(msd,0)
) %>%
arrange(time)-> covartable
covartable$year<-covartable$time
covartable%>%
select(-time)-> covartable
covartable$year<-signif(covartable$year, digits = 8)
covartable<-covartable[!duplicated(covartable$year),]
## plot recruitment, it should be a single pulse each year
covartable %>%
ggplot(aes(x=year,y=recruitment))+
geom_step()+
geom_point()
library(pomp)
si_rinit <- Csnippet("
S = nearbyint(pop*sfrac);// pop is initial population size
I = nearbyint(pop*(1-sfrac));
")
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive, samplesize, I/(S+I), give_log);
lik2 = dlnorm(maturepopmean, log(S+I), msd, give_log);
if (give_log)
lik = lik1+lik2;
else
lik = lik1*lik2;
")
si_rmeas <- Csnippet("
positive = rbinom(samplesize,I/(S+I));
maturepopmean = rlnorm(log(S+I), msd);
")
rprocSI <- Csnippet("
// Recruitment is randomly picked from the distribution provided in the data
double rec = rlnorm(log(recruitment), rsd);
//transition rates
double foi = rbinom(S,1-exp(-Beta*(I)/(S+I)*dt));
//mortality rates
double MUS = rbinom(S,1-exp(-mu*dt));
double MUI = rbinom(I,1-exp(-mu*dt));
double MUD = rbinom(I,1-exp(-mud*dt));
//recruits can be disease positive or negative
S += nearbyint(rec*(1-pos)) - foi - MUS;
I += nearbyint(rec*pos) + foi - MUI - MUD;
")
meas %>%
pomp(
time="year", t0=2006,
rprocess=euler(rprocSI,delta.t=1/120),
rinit=si_rinit,
rmeasure=si_rmeas,
dmeasure=si_dmeas,
covar=covariate_table(covartable,times="year",order="constant"),
statenames=c("S", "I"),
paramnames=c("Beta", "pop", "pos", "mu", "mud", "sfrac"),
cdir=".", cfile="measSI"
) -> measSI
measSI %>%
as_tibble() %>%
gather(var,val,-year) %>%
ggplot(aes(x=year,y=val,group=var))+
geom_line()+
facet_wrap(~var,scales="free_y")
##Let's examine a simulation.
measSI %>%
simulate(
nsim=1,
params=c(
Beta=0.5, pop=148387450, pos=0.2, mu=0.25, mud=0.25, sfrac=0.7
)
) %>%
as_tibble() %>%
gather(var,val,-year) %>%
ggplot(aes(x=year,y=val,group=var))+
geom_line()+
facet_wrap(~var,scales="free_y")
measSI %>%
pfilter(
Np=1000,
params=c(
Beta=0.5, pop=559213100, pos=0.15, mu=0.25, mud=0.25, sfrac=0.7
)
) %>%
plot()
## IF2 is usually pretty good at rapidly improving parameters.
## The following runs 50 IF2 iterations, exploring a 4-parameter subspace of the parameter space.
measSI %>%
mif2(
Nmif=100,
Np=1000,
params=c(
Beta=0.3, pop=100000000, pos=0.15, mu=1.3, mud=0.4, sfrac=0.55
),
partrans=parameter_trans(
log=c("Beta","mu", "mud", "pop"),
logit=c("pos", "sfrac")
),
paramnames=c("Beta","mu", "mud","pop", "pos", "sfrac"),
rw.sd=rw.sd(Beta=0.02,pos=ivp(0.02),mu=0.02,mud=0.02, pop=0.02, sfrac=ivp(0.02)),
#rw.sd=rw.sd(Beta=0.02,pos=ivp(0.02),mud=0.02),
cooling.fraction.50=0.5
) -> mf1
mf1 %>%
traces() %>%
melt() %>%
ggplot(aes(x=iteration,y=value,group=variable))+
geom_line()+
facet_wrap(~variable,scales="free_y")
mf1 %>% coef()
mf1 %>%
simulate(
nsim=20,
format="data.frame",include.data=TRUE
) -> sims
Here are 20 simulations after 100 rounds of particle filtering. You can see how high the estimates for mu (natural death rate over a year) are.
sims %>%
select(year,rep=.id,maturepopmean,positive,samplesize) %>%
gather(var,val,-year,-rep) %>%
ggplot(aes(x=year,y=val,group=rep,color=rep=="data"))+
geom_line()+
facet_wrap(~var,scales="free_y",ncol=1)
I should add to this that I have tried fixing mu at a more plausible value and in those cases the estimates for the disease mortality are about an order of magniture higher than I would expect based on our experimental work.
I think your idea of using SDE makes a lot of sense in this case. For the reasons you mention, the stochasticity is not likely to be small. Rather than view this as removing demographic stochasticity, it's a question of adding environmental stochasticity. The two are certainly not mutually exclusive.
Probably the best way forward is to write SDE for the counting processes that track infections, S-deaths, and I-deaths, then to combine these into SDE for the S & I state variables. This is discussed in our short course.
A few other points:
rbinom
, which makes random draws from a binomial distribution. Indeed, it is not far from the Euler-multinomial implementation you had originally, with the added drawback that it is possible for some individuals to both die and become infected, or to die in two ways at once.The issue I mentioned just above, to do with the recruitment, is the following. If we situate our dynamical system in continuous time (as we do when we use euler
), then we need to describe the random number of recruits that arrive within an arbitrary time-interval. That is, if R(t1,t2) is the random number of recruits arriving between times t1 and t2, then we should insist that R(t1,t2) + R(t2,t3) = R(t1,t3) whenever t1 < t2 < t3. The log-normal distribution we've been assuming does not have this property.
I propose therefore, that we should assume R(t1,t2) ~ Gamma(r(t1),σ(t1)), where Gamma(r,σ) indicates the Gamma distribution with mean r and s.d. σ, the latter two quantities being covariates.
library(tidyverse)
require(readr)
require(dplyr)
read_csv("PWS_disease_survey.csv") %>%
mutate(
msd = log(1+maturepopsd/maturepopmean),
month=4
) -> meas
## plot sample prevalence
meas %>%
ggplot(aes(
x=year,y=positive/samplesize,
ymin=qbeta(p=0.25,shape1=positive,shape2=negative+1),
ymax=qbeta(p=0.75,shape1=positive+1,shape2=negative)))+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+
labs(y="prevalence")+xlim(2006, 2018)+theme_bw()
## plot mature population
meas %>%
ggplot(aes(
x=year,y=maturepopmean,
ymin=qlnorm(p=0.25,meanlog=log(maturepopmean),sdlog=msd),
ymax=qlnorm(p=0.75,meanlog=log(maturepopmean),sdlog=msd)
))+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+
labs("mature population size")+xlim(2006, 2018)+theme_bw()
read_csv("PWS_recruitment.csv") %>%
filter(year>2005) %>%
mutate(
rvar=recruitmentsd*recruitmentsd,
month=1
) %>%
select(-recruitmentsd) %>%
bind_rows(
meas %>%
select(year,month,samplesize,msd)
) %>%
arrange(year,month) -> covartable
meas %>%
select(year,month,positive,maturepopmean) -> meas
covartable %>%
select(year,recruitment,rvar) %>%
filter(!is.na(rvar)) %>%
mutate(
shape=recruitment*recruitment/rvar,
scale=rvar/recruitment
) %>%
ggplot(
aes(
x=year,
y=recruitment,
ymin=qgamma(p=0.05,shape=shape,scale=scale),
ymax=qgamma(p=0.95,shape=shape,scale=scale)
)
)+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+xlim(2006, 2018)+theme_bw()
covartable %>%
select(year) %>%
distinct() %>%
expand_grid(month=1:12) %>%
left_join(covartable,by=c("year","month")) %>%
mutate(
year=year+month/12,
recruitment=coalesce(recruitment,0),
rvar=coalesce(rvar,0),
samplesize=coalesce(samplesize,0),
msd=coalesce(msd,0)
) %>%
select(-month) %>%
arrange(year) -> covartable
library(pomp)
si_rinit <- Csnippet("
S = nearbyint(pop*sfrac);// pop is initial population size
I = nearbyint(pop*(1-sfrac));
")
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive, samplesize, I/(S+I), give_log);
lik2 = dlnorm(maturepopmean, log(S+I), msd, give_log);
if (give_log)
lik = lik1+lik2;
else
lik = lik1*lik2;
")
si_rmeas <- Csnippet("
positive = rbinom(samplesize,I/(S+I));
maturepopmean = rlnorm(log(S+I), msd);
")
rprocSI <- Csnippet("
// recruitment rate
double scale, shape, rec;
if (recruitment > 0) {
scale = rvar/recruitment;
shape = recruitment/scale*dt;
rec = rgamma(shape,scale);
} else {
rec = 0;
}
double rate[3], trans[3];
rate[0] = Beta*I/(S+I); // force of infection
rate[1] = mu;
rate[2] = mu+mud;
reulermultinom(2,S,&rate[0],rgammawn(1,dt),&trans[0]);
reulermultinom(1,I,&rate[2],rgammawn(1,dt),&trans[2]);
S += nearbyint(rec*(1-pos)) - trans[0] - trans[1];
I += nearbyint(rec*pos) + trans[0] - trans[2];
")
meas %>%
mutate(year=year+month/12) %>%
select(-month) %>%
pomp(
time="year", t0=2006+1/12,
rprocess=euler(rprocSI,delta.t=1/360),
rinit=si_rinit,
rmeasure=si_rmeas,
dmeasure=si_dmeas,
covar=covariate_table(covartable,times="year",order="constant"),
statenames=c("S", "I"),
paramnames=c("Beta", "pop", "pos", "mu", "mud", "sfrac")
) -> measSI
measSI %>%
mif2(
Nmif=100,
Np=1000,
params=c(
Beta=0.5, pop=148387450, pos=0.2, mu=0.25, mud=0.25, sfrac=0.7
),
partrans=parameter_trans(
log=c("Beta","mu","mud"),
logit=c("pos")
),
paramnames=c("Beta","mu", "mud","pop", "pos", "sfrac"),
rw.sd=rw.sd(Beta=0.02,pos=0.02,mu=0.02,mud=0.02),
cooling.fraction.50=0.5
) -> mf1
mf1 %>%
traces() %>%
melt() %>%
ggplot(aes(x=iteration,y=value,group=variable))+
geom_line()+
facet_wrap(~variable,scales="free_y")
mf1 %>% coef()
mf1 %>%
simulate(
nsim=20,
format="data.frame",include.data=TRUE
) -> sims
sims %>%
select(year,rep=.id,maturepopmean,positive,samplesize) %>%
gather(var,val,-year,-rep) %>%
ggplot(aes(x=year,y=val,group=rep,color=rep=="data"))+
geom_line()+
facet_wrap(~var,scales="free_y",ncol=1)
Thank you. That is working and the model is now giving realistic values for natural mortality. I see that you also included additional stochasticity in these lines:
reulermultinom(2,S,&rate[0],rgammawn(1,dt),&trans[0]);
reulermultinom(1,I,&rate[2],rgammawn(1,dt),&trans[2]);
Do you usually use sigma=1
in the rgammawn()
function?
No. Typically, there is an additional parameter (or two in this case) to estimate. I hesitated to go any farther.
In the measles example in part 5 of your SISMID course, you have included sigmaSE
as a parameter to estimate in your process model. Would you recommend taking this approach?
dw = rgammawn ( sigmaSE , dt );
In that case rgammawn()
is not included in the reulermultinom()
function, and W
is a variable that is tracked over time:
W += ( dw - dt )/ sigmaSE ; // standardized i . i . d . white noise
I have tried including extrademographic stochasticity in the natural mortality rate (mu). I think that this is probably the parameter that has the greatest variance.
rprocSI <- Csnippet("
// recruitment rate
double scale, shape, rec;
if (recruitment > 0) {
scale = rvar/recruitment;
shape = recruitment/scale*dt;
rec = rgamma(shape,scale);
} else {
rec = 0;
}
//extrademographic stochasticity in natural mortality mu
double dw = rgammawn (sigmaSE , dt );
double rate[3], trans[3];
rate[0] = Beta*I/(S+I); // force of infection
rate[1] = mu*dw/dt;
rate[2] = mu*dw/dt+mud;
reulermultinom(2,S,&rate[0],rgammawn(1,dt),&trans[0]);
reulermultinom(1,I,&rate[2],rgammawn(1,dt),&trans[2]);
S += nearbyint(rec*(1-pos)) - trans[0] - trans[1];
I += nearbyint(rec*pos) + trans[0] - trans[2];
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
")
There were a few mistakes in the code above (e.g., I forgot to take out the white noise from the reulermultinom()
function in the process model. With extrademographic stochasticity around 'mu', the simulations are a lot closer to the data, although the population size estimates have a huge range. (sigmaSE
is estimated at ~2 in the local search of parameter space that I conducted).
library(tidyverse)
require(readr)
require(dplyr)
setwd("C:/Users/mayag/Dropbox/Ichthyophonus/Ich mort model/pomp model")
read_csv("PWS_disease_survey.csv") %>%
mutate(
msd = log(1+maturepopsd/maturepopmean),
month=4
) -> meas
## plot sample prevalence
meas %>%
ggplot(aes(
x=year,y=positive/samplesize,
ymin=qbeta(p=0.25,shape1=positive,shape2=negative+1),
ymax=qbeta(p=0.75,shape1=positive+1,shape2=negative)))+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+
labs(y="prevalence")+xlim(2006, 2018)+theme_bw()
## plot mature population
meas %>%
ggplot(aes(
x=year,y=maturepopmean,
ymin=qlnorm(p=0.25,meanlog=log(maturepopmean),sdlog=msd),
ymax=qlnorm(p=0.75,meanlog=log(maturepopmean),sdlog=msd)
))+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+
labs("mature population size")+xlim(2006, 2018)+theme_bw()
read_csv("PWS_recruitment.csv") %>%
filter(year>2005) %>%
mutate(
rvar=recruitmentsd*recruitmentsd,
month=1
) %>%
select(-recruitmentsd) %>%
bind_rows(
meas %>%
select(year,month,samplesize,msd)
) %>%
arrange(year,month) -> covartable
meas %>%
select(year,month,positive,maturepopmean) -> meas
covartable %>%
select(year,recruitment,rvar) %>%
filter(!is.na(rvar)) %>%
mutate(
shape=recruitment*recruitment/rvar,
scale=rvar/recruitment
) %>%
ggplot(
aes(
x=year,
y=recruitment,
ymin=qgamma(p=0.05,shape=shape,scale=scale),
ymax=qgamma(p=0.95,shape=shape,scale=scale)
)
)+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+xlim(2006, 2018)+theme_bw()
covartable %>%
select(year) %>%
distinct() %>%
expand_grid(month=1:12) %>%
left_join(covartable,by=c("year","month")) %>%
mutate(
year=year+month/12,
recruitment=coalesce(recruitment,0),
rvar=coalesce(rvar,0),
samplesize=coalesce(samplesize,0),
msd=coalesce(msd,0)
) %>%
select(-month) %>%
arrange(year) -> covartable
#meas$year<-meas$year+1/3
library(pomp)
si_rinit <- Csnippet("
S = nearbyint(pop*sfrac);// pop is initial population size
I = nearbyint(pop*(1-sfrac));
")
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive, samplesize, I/(S+I), give_log);
lik2 = dlnorm(maturepopmean, log(S+I), msd, give_log);
if (give_log)
lik = lik1+lik2;
else
lik = lik1*lik2;
")
si_rmeas <- Csnippet("
positive = rbinom(samplesize,I/(S+I));
maturepopmean = rlnorm(log(S+I), msd);
")
rprocSI <- Csnippet("
// recruitment rate
double scale, shape, rec;
if (recruitment > 0) {
scale = rvar/recruitment;
shape = recruitment/scale*dt;
rec = rgamma(shape,scale);
} else {
rec = 0;
}
//extrademographic stochasticity in natural mortality mu
double dw = rgammawn (sigmaSE , dt );
double rate[3], trans[3];
rate[0] = Beta*I/(S+I); // force of infection
rate[1] = mu*dw/dt;
rate[2] = mu*dw/dt+mud;
reulermultinom(2,S,&rate[0], dt, &trans[0]);
reulermultinom(1,I,&rate[2], dt, &trans[2]);
S += nearbyint(rec*(1-pos)) - trans[0] - trans[1];
I += nearbyint(rec*pos) + trans[0] - trans[2];
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
")
meas %>%
mutate(year=year+month/12) %>%
select(-month) %>%
pomp(
time="year", t0=2006+1/12,
rprocess=euler(rprocSI,delta.t=1/360),
rinit=si_rinit,
rmeasure=si_rmeas,
dmeasure=si_dmeas,
covar=covariate_table(covartable,times="year",order="constant"),
statenames=c("S", "I", "W"),
paramnames=c("Beta", "pop", "pos", "mu", "mud", "sfrac", "sigmaSE"),
cdir=".", cfile="measSI"
) -> measSI
#local search of likelihood surface
measSI %>%
mif2(
Nmif=50,
Np=1000,
params=c(
Beta=0.326, pop=148387450, pos=0.127, mu=0.215, mud=0.203, sfrac=0.651, sigmaSE=1
),
partrans=parameter_trans(
log=c("Beta","mu","mud","sigmaSE","pop"),
logit=c("pos", "sfrac")
),
paramnames=c("Beta","mu", "mud","pop", "pos", "sfrac", "sigmaSE"),
rw.sd=rw.sd(Beta=0.02,pos=0.02,mu=0.02,mud=0.02, sfrac=0.02, pop=0.2, sigmaSE=0.2),
cooling.fraction.50=0.5
) -> mf1
mifs_local<-mf1
mf1 %>%
traces() %>%
melt() %>%
ggplot(aes(x=iteration,y=value,group=variable))+
geom_line()+
facet_wrap(~variable,scales="free_y")
mf1 %>% coef()
mf1 %>%
simulate(
nsim=20,
format="data.frame",include.data=TRUE
) -> sims
sims %>%
select(year,rep=.id,maturepopmean,positive,samplesize) %>%
gather(var,val,-year,-rep) %>%
ggplot(aes(x=year,y=val,group=rep,color=rep=="data"))+
geom_line()+
facet_wrap(~var,scales="free_y",ncol=1)
I have tried running my model in parallel, but I am getting a similar error message as that of Issue #96. I have specified cdir=".", cfile="measSI"
outside of the parallel call here:
meas %>%
mutate(year=year+month/12) %>%
select(-month) %>%
pomp(
time="year", t0=2006+1/12,
rprocess=euler(rprocSI,delta.t=1/360),
rinit=si_rinit,
rmeasure=si_rmeas,
dmeasure=si_dmeas,
covar=covariate_table(covartable,times="year",order="constant"),
statenames=c("S", "I", "W"),
paramnames=c("Beta", "pop", "pos", "mu", "mud", "sfrac", "sigmaSE"),
cdir=".", cfile="measSI"
) -> measSI
When I explore the local parameter space in parallel code (below), I get the following error message (note that all other relevant code is in the previous message).
foreach(i=1:4,.combine=c) %dopar% {
library(pomp)
library(tidyverse)
measSI %>%
mif2(
Np=2000, Nmif=50,
params=c(
Beta=1.75, pop=148387450, pos=0.06, mu=0.16, mud=0.20, sfrac=0.66, sigmaSE=1.4
),
partrans=parameter_trans(
log=c("Beta","mu","mud", "sigmaSE"),
logit=c("pos")
),
paramnames=c("Beta","mu", "mud","pop", "pos", "sfrac", "sigmaSE"),
rw.sd=rw.sd(Beta=0.02,pos=0.02,mu=0.02,mud=0.02, sfrac=0.02, pop=0.02, sigmaSE=.02),
cooling.fraction.50=0.5
)
} -> mifs_local
Error in { :
task 1 failed - "in 'mif2': unable to load shared object 'C:/Users/mgroner/AppData/Local/Temp/2/RtmpGGSd3v/10248/pomp_29419b0a804249a44deeaa2d34653453.dll':
LoadLibrary failure: The specified module could not be found.
The inclusion of the partrans
argument triggers a C snippet build. For some reason I do not yet understand, C snippet builds in parallel codes on Windows machines fail. To work around this, include the partrans
argument outside of the foreach
loop.
Not seeing any questions, I am going to close this issue now. Feel free to reopen if more discussion is warranted.
When I do a global search of parameter space, I am getting 'illegal' parameter values despite having transformed my parameters. Based on the error, I think that the issue may be related to the addition of white noise to the mortality rate; the problem doesn't occur if I run the model without the extrademographic stochasticity, though it is also possible that the model with the ED stochasticity is just revealing an existing problem. I've searched my code repeatedly and cannot find a source for the issue. Do you know what might be causing the problem? Many thanks for your help.
Here is the error message:
Error in { :
task 7 failed - "in 'mif2': 'dmeasure' with log=TRUE returns illegal value.
Log likelihood, data, states, and parameters are:
time: 2014.33
loglik: NaN
positive: 48
maturepopmean: 8.08965e+07
S: 0
I: 0
W: 13.0358
Beta: 0.208696
pop: 1.95103e+08
pos: 0.135647
mu: 0.368183
mud: 0.246813
sfrac: 0.656438
sigmaSE: 4.89772
And the code (which utilizes the data I previously sent you):
si_dmeas <- Csnippet("
double lik1, lik2;
lik1 = dbinom(positive, samplesize, I/(S+I), give_log);
lik2 = dlnorm(maturepopmean, log(S+I), msd, give_log);
if (give_log)
lik = lik1+lik2;
else
lik = lik1*lik2;
")
Looks like you have at least one realization for which S = I = 0. In this case, the population is extinct. That's incompatible with the data, so it deserves a likelihood of zero, but it gets a likelihood of NaN
, which causes the error. In other words, when I = S = 0, I/(S+I) is NaN
and dbinom(positive, samplesize, I/(S+I), give_log)
gives NaN
as well. If you trap for S+I=0 and return 0 for the likelihood (-Inf for the log likelihood), then the algorithm can proceed.
On the other hand, dlnorm(maturepopmean, log(S+I), msd, give_log)
gives 0 or -Inf for the likelihood and log likelihood, respectively, so that's OK.
Hello Aaron. I am converting my simple SI model to an age-structured model which focuses on ages three through six plus (herring recruit to the mature population at age three, and I am lumping individuals 6 and older). I am getting NaN
for the populations in my simulations. I believe this is due to the way I have modeled transitions between age classes. I would like to structure these transitions so that they happen in yearly pulses, which coincides with the yearly recruitment pulse of new age threes entering the mature population. As this is not a stochastic process (e.g., all individuals should age one year each year), I have written the code in my process model so that the transition rate is 1/dt
when recruitment is >0 and 0 otherwise. I'm not sure if this might be causing my populations to go below 0 in my simulations? Do you have any suggestions for how I might code this differently? I will email you the relevant (confidential) dataset. Many thanks for your help.
library(tidyverse)
require(readr)
require(dplyr)
library (pomp)
read_csv("PWS for pomp with age structure_recruit at three.csv") %>%
mutate(
month=4,
) -> meas
## plot mature population
meas %>% tidyr::gather("id", "value", popmean_three:popmean_six) ->meas1
meas1$id<-factor(as.factor(meas1$id), levels=c("popmean_three", "popmean_four","popmean_five", "popmean_six"))
ggplot(meas1, aes(year, value, color=id))+geom_point()+geom_line()+ scale_color_manual(values=c("red", "orange", "light green", "dark green"))+ylab("herring population")
read_csv("PWS_recruitment_allthrees.csv") %>%
filter(year>2005) %>%
mutate(
rvar=recruitmentsd*recruitmentsd,
month=1
)%>%
select(-recruitmentsd) %>%
bind_rows(
meas %>%
select(year,month,samplesize_three,samplesize_four, samplesize_five, samplesize_six, logpopsd_three, logpopsd_four, logpopsd_five, logpopsd_six)
) %>%
arrange(year,month) -> covartable
meas %>%
select(year,month,positive_three, positive_four, positive_five, positive_six, logpopmean_three, logpopmean_four, logpopmean_five, logpopmean_six) -> meas
covartable %>%
select(year,recruitment,rvar) %>%
filter(!is.na(rvar)) %>%
mutate(
shape=recruitment*recruitment/rvar,
scale=rvar/recruitment
) %>%
ggplot(
aes(
x=year,
y=recruitment,
ymin=qgamma(p=0.05,shape=shape,scale=scale),
ymax=qgamma(p=0.95,shape=shape,scale=scale)
)
)+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+xlim(2006, 2018)+theme_bw()
covartable %>%
select(year) %>%
distinct() %>%
expand_grid(month=1:12) %>%
left_join(covartable,by=c("year","month")) %>%
mutate(
year=year+month/12,
recruitment=coalesce(recruitment,0),
rvar=coalesce(rvar,0),
samplesize_three=coalesce(samplesize_three,0),
samplesize_four=coalesce(samplesize_four,0),
samplesize_five=coalesce(samplesize_five,0),
samplesize_six=coalesce(samplesize_six,0),
logpopsd_three=coalesce(logpopsd_three,0),
logpopsd_four=coalesce(logpopsd_four,0),
logpopsd_five=coalesce(logpopsd_five,0),
logpopsd_six=coalesce(logpopsd_six,0)
) %>%
select(-month) %>%
arrange(year) -> covartable
covartable<-covartable[!is.na(covartable$year),]
meas<-meas[!is.na(meas$year),]
library(pomp)
si_rinit <- Csnippet("
S3 = nearbyint(pop3*sfrac3);// pop3 is initial population size for age 3, sfrac3, is proportion that are infected at T0
I3 = nearbyint(pop3*(1-sfrac3));
S4 = nearbyint(pop4*sfrac4);// pop4 is initial population size for age 4
I4 = nearbyint(pop4*(1-sfrac4));
S5 = nearbyint(pop5*sfrac5);// pop5 is initial population size for age 5
I5 = nearbyint(pop5*(1-sfrac5));
S6 = nearbyint(pop6*sfrac6);// pop6 is initial population size for age 6
I6 = nearbyint(pop6*(1-sfrac6));
W = 0;
")
si_dmeas <- Csnippet("
double lik1, lik2, lik3, lik4, lik5, lik6, lik7, lik8, Prev, Prev3, Prev4, Prev5, Prev6;
if (S3+I3+S4+I4+S5+I5+S6+I6==0)
Prev=0;
else
Prev = (I3+I4+I5+I6)/(S3+S4+S5+S6+I3+I4+I5+I6);
if (S3+I3==0)
Prev3=0;
else
Prev3 = (I3)/(S3+I3);
if (S4+I4==0)
Prev4=0;
else
Prev4 = (I4)/(S4+I4);
if (S5+I5==0)
Prev5=0;
else
Prev5 = (I5)/(S5+I5);
if (S6+I6==0)
Prev6=0;
else
Prev6 = (I6)/(S6+I6);
lik1 = dbinom(positive_three, samplesize_three, Prev3, give_log);
lik2 = dbinom(positive_four, samplesize_four, Prev4, give_log);
lik3 = dbinom(positive_five, samplesize_five, Prev5, give_log);
lik4 = dbinom(positive_six, samplesize_six, Prev6, give_log);
lik5 = dlnorm(logpopmean_three, log(S3+I3), logpopsd_three, give_log);
lik6 = dlnorm(logpopmean_four, log(S4+I4), logpopsd_four, give_log);
lik7 = dlnorm(logpopmean_five, log(S5+I5), logpopsd_five, give_log);
lik8 = dlnorm(logpopmean_six, log(S6+I6), logpopsd_six, give_log);
if (give_log)
lik = lik1+lik2+lik3+lik4+lik5+lik6+lik7+lik8;
else
lik = lik1*lik2*lik3*lik4*lik5*lik6*lik7*lik8;
")
si_rmeas <- Csnippet("
positive_three = rbinom(samplesize_three,I3/(S3+I3));
positive_four = rbinom(samplesize_four,I4/(S4+I4));
positive_five = rbinom(samplesize_five,I5/(S5+I5));
positive_six = rbinom(samplesize_six,I6/(S6+I6));
logpopmean_three = log(rlnorm(log(S3+I3), logpopsd_three));
logpopmean_four = log(rlnorm(log(S4+I4), logpopsd_four));
logpopmean_five = log(rlnorm(log(S5+I5), logpopsd_five));
logpopmean_six = log(rlnorm(log(S6+6), logpopsd_six));
")
rprocSI <- Csnippet("
// recruitment rate and timing of transition to next age group
double scale, shape, rec, Delta;
if (recruitment > 0) {
scale = rvar/recruitment;
shape = recruitment/scale*dt;
rec = rgamma(shape,scale);
Delta = 1/dt;
} else {
rec = 0;
Delta = 0;
}
double I, S, N;
I = I3+I4+I5+I6;
S = S3+S4+S5+S6;
N = I + S;
//extrademographic stochasticity in natural mortality mu
double dw = rgammawn (sigmaSE, dt);
double rate[3], trans[3];
rate[0] = Beta*I/N; // force of infection
rate[1] = mu*dw/dt;
rate[2] = mu*dw/dt+mud;
rate[3] = Beta*I/N;
rate[4] = mu*dw/dt;
rate[5] = mu*dw/dt+mud;
rate[6] = Beta*I/N;
rate[7] = mu*dw/dt;
rate[8] = mu*dw/dt+mud;
rate[9] = Beta*I/N;
rate[10] = mu*dw/dt;
rate[11] = mu*dw/dt+mud;
reulermultinom(2,S3,&rate[0], dt, &trans[0]);
reulermultinom(1,I3,&rate[2], dt, &trans[2]);
reulermultinom(2,S4,&rate[3], dt, &trans[3]);
reulermultinom(1,I4,&rate[5], dt, &trans[5]);
reulermultinom(2,S5,&rate[6], dt, &trans[6]);
reulermultinom(1,I5,&rate[8], dt, &trans[8]);
reulermultinom(2,S6,&rate[9], dt, &trans[9]);
reulermultinom(1,I6,&rate[11], dt, &trans[11]);
S3 += nearbyint(rec*(1-pos3)) - trans[0] - trans[1] - nearbyint(Delta*S3);
I3 += nearbyint(rec*pos3) + trans[0] - trans[2] - nearbyint(Delta*I3);
S4 += nearbyint(Delta*S3) - trans[3] - trans[4] - nearbyint(Delta*S4);
I4 += nearbyint(Delta*I3) + trans[3] - trans[5] - nearbyint(Delta*I4);
S5 += nearbyint(Delta*S4) - trans[6] - trans[7] - nearbyint(Delta*S5);
I5 += nearbyint(Delta*I4) + trans[6] - trans[8] - nearbyint(Delta*I5);
S6 += nearbyint(Delta*S5) - trans[9] - trans[10];
I6 += nearbyint(Delta*I5) + trans[9] - trans[11];
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
")
Hi @mayagroner, looking over the code, the only thing that jumps out is the fact that you have not allocated enough space for the local arrays rate
and trans
. It looks like you need 12 elements for each: double rate[12], trans[12];
. I could look into it more carefully, but would need parameter values to simulate and filter....
Thank you @kingaa. I made the change you suggested but I am still having the same issue. It would be helpful if you could take a more careful look. I've pasted the updated code along with the simulation starting values.
library(tidyverse)
require(readr)
require(dplyr)
library (pomp)
read_csv("PWS for pomp with age structure_recruit at three.csv") %>%
mutate(
month=4,
) -> meas
## plot mature population
meas %>% tidyr::gather("id", "value", popmean_three:popmean_six) ->meas1
meas1$id<-factor(as.factor(meas1$id), levels=c("popmean_three", "popmean_four","popmean_five", "popmean_six"))
ggplot(meas1, aes(year, value, color=id))+geom_point()+geom_line()+ scale_color_manual(values=c("red", "orange", "light green", "dark green"))+ylab("herring population")
read_csv("PWS_recruitment_allthrees.csv") %>%
filter(year>2005) %>%
mutate(
rvar=recruitmentsd*recruitmentsd,
month=1
)%>%
select(-recruitmentsd) %>%
bind_rows(
meas %>%
select(year,month,samplesize_three,samplesize_four, samplesize_five, samplesize_six, logpopsd_three, logpopsd_four, logpopsd_five, logpopsd_six)
) %>%
arrange(year,month) -> covartable
meas %>%
select(year,month,positive_three, positive_four, positive_five, positive_six, logpopmean_three, logpopmean_four, logpopmean_five, logpopmean_six) -> meas
covartable %>%
select(year,recruitment,rvar) %>%
filter(!is.na(rvar)) %>%
mutate(
shape=recruitment*recruitment/rvar,
scale=rvar/recruitment
) %>%
ggplot(
aes(
x=year,
y=recruitment,
ymin=qgamma(p=0.05,shape=shape,scale=scale),
ymax=qgamma(p=0.95,shape=shape,scale=scale)
)
)+
geom_ribbon(fill=gray(0.6))+
geom_line()+
geom_point()+xlim(2006, 2018)+theme_bw()
covartable %>%
select(year) %>%
distinct() %>%
expand_grid(month=1:12) %>%
left_join(covartable,by=c("year","month")) %>%
mutate(
year=year+month/12,
recruitment=coalesce(recruitment,0),
rvar=coalesce(rvar,0),
samplesize_three=coalesce(samplesize_three,0),
samplesize_four=coalesce(samplesize_four,0),
samplesize_five=coalesce(samplesize_five,0),
samplesize_six=coalesce(samplesize_six,0),
logpopsd_three=coalesce(logpopsd_three,0),
logpopsd_four=coalesce(logpopsd_four,0),
logpopsd_five=coalesce(logpopsd_five,0),
logpopsd_six=coalesce(logpopsd_six,0)
) %>%
select(-month) %>%
arrange(year) -> covartable
covartable<-covartable[!is.na(covartable$year),]
meas<-meas[!is.na(meas$year),]
library(pomp)
si_rinit <- Csnippet("
S3 = nearbyint(pop3*sfrac3);// pop3 is initial population size for age 3, sfrac3, is proportion that are infected at T0
I3 = nearbyint(pop3*(1-sfrac3));
S4 = nearbyint(pop4*sfrac4);// pop4 is initial population size for age 4
I4 = nearbyint(pop4*(1-sfrac4));
S5 = nearbyint(pop5*sfrac5);// pop5 is initial population size for age 5
I5 = nearbyint(pop5*(1-sfrac5));
S6 = nearbyint(pop6*sfrac6);// pop6 is initial population size for age 6
I6 = nearbyint(pop6*(1-sfrac6));
W = 0;
")
si_dmeas <- Csnippet("
double lik1, lik2, lik3, lik4, lik5, lik6, lik7, lik8, Prev, Prev3, Prev4, Prev5, Prev6;
if (S3+I3+S4+I4+S5+I5+S6+I6==0)
Prev=0;
else
Prev = (I3+I4+I5+I6)/(S3+S4+S5+S6+I3+I4+I5+I6);
if (S3+I3==0)
Prev3=0;
else
Prev3 = (I3)/(S3+I3);
if (S4+I4==0)
Prev4=0;
else
Prev4 = (I4)/(S4+I4);
if (S5+I5==0)
Prev5=0;
else
Prev5 = (I5)/(S5+I5);
if (S6+I6==0)
Prev6=0;
else
Prev6 = (I6)/(S6+I6);
lik1 = dbinom(positive_three, samplesize_three, Prev3, give_log);
lik2 = dbinom(positive_four, samplesize_four, Prev4, give_log);
lik3 = dbinom(positive_five, samplesize_five, Prev5, give_log);
lik4 = dbinom(positive_six, samplesize_six, Prev6, give_log);
lik5 = dlnorm(logpopmean_three, log(S3+I3), logpopsd_three, give_log);
lik6 = dlnorm(logpopmean_four, log(S4+I4), logpopsd_four, give_log);
lik7 = dlnorm(logpopmean_five, log(S5+I5), logpopsd_five, give_log);
lik8 = dlnorm(logpopmean_six, log(S6+I6), logpopsd_six, give_log);
if (give_log)
lik = lik1+lik2+lik3+lik4+lik5+lik6+lik7+lik8;
else
lik = lik1*lik2*lik3*lik4*lik5*lik6*lik7*lik8;
")
si_rmeas <- Csnippet("
positive_three = rbinom(samplesize_three,I3/(S3+I3));
positive_four = rbinom(samplesize_four,I4/(S4+I4));
positive_five = rbinom(samplesize_five,I5/(S5+I5));
positive_six = rbinom(samplesize_six,I6/(S6+I6));
logpopmean_three = log(rlnorm(log(S3+I3), logpopsd_three));
logpopmean_four = log(rlnorm(log(S4+I4), logpopsd_four));
logpopmean_five = log(rlnorm(log(S5+I5), logpopsd_five));
logpopmean_six = log(rlnorm(log(S6+6), logpopsd_six));
")
rprocSI <- Csnippet("
// recruitment rate and timing of transition to next age group
double scale, shape, rec, Delta;
if (recruitment > 0) {
scale = rvar/recruitment;
shape = recruitment/scale*dt;
rec = rgamma(shape,scale);
Delta = 1/dt;
} else {
rec = 0;
Delta = 0;
}
double I, S, N;
I = I3+I4+I5+I6;
S = S3+S4+S5+S6;
N = I + S;
//extrademographic stochasticity in natural mortality mu
double dw = rgammawn (sigmaSE, dt);
double rate[12], trans[12];
rate[0] = Beta*I/N; // force of infection
rate[1] = mu*dw/dt;
rate[2] = mu*dw/dt+mud;
rate[3] = Beta*I/N;
rate[4] = mu*dw/dt;
rate[5] = mu*dw/dt+mud;
rate[6] = Beta*I/N;
rate[7] = mu*dw/dt;
rate[8] = mu*dw/dt+mud;
rate[9] = Beta*I/N;
rate[10] = mu*dw/dt;
rate[11] = mu*dw/dt+mud;
reulermultinom(2,S3,&rate[0], dt, &trans[0]);
reulermultinom(1,I3,&rate[2], dt, &trans[2]);
reulermultinom(2,S4,&rate[3], dt, &trans[3]);
reulermultinom(1,I4,&rate[5], dt, &trans[5]);
reulermultinom(2,S5,&rate[6], dt, &trans[6]);
reulermultinom(1,I5,&rate[8], dt, &trans[8]);
reulermultinom(2,S6,&rate[9], dt, &trans[9]);
reulermultinom(1,I6,&rate[11], dt, &trans[11]);
S3 += nearbyint(rec*(1-pos3)) - trans[0] - trans[1] - nearbyint(Delta*S3);
I3 += nearbyint(rec*pos3) + trans[0] - trans[2] - nearbyint(Delta*I3);
S4 += nearbyint(Delta*S3) - trans[3] - trans[4] - nearbyint(Delta*S4);
I4 += nearbyint(Delta*I3) + trans[3] - trans[5] - nearbyint(Delta*I4);
S5 += nearbyint(Delta*S4) - trans[6] - trans[7] - nearbyint(Delta*S5);
I5 += nearbyint(Delta*I4) + trans[6] - trans[8] - nearbyint(Delta*I5);
S6 += nearbyint(Delta*S5) - trans[9] - trans[10];
I6 += nearbyint(Delta*I5) + trans[9] - trans[11];
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
")
measSI %>%
simulate(
params=c(Beta=.2, pop3=33406000, pop4=19178000, pop5=28401000, pop6=109623000, pos3=.38, mu=.4, mud=.5, sfrac3=.62, sfrac4=.62, sfrac5=.62, sfrac6=.56, sigmaSE=1),
nsim=20,format="data.frame",include.data=TRUE
) -> sims
Aren't the units of your balance equations incorrect?
S3 += nearbyint(rec*(1-pos3)) - trans[0] - trans[1] - nearbyint(Delta*S3);
I3 += nearbyint(rec*pos3) + trans[0] - trans[2] - nearbyint(Delta*I3);
S4 += nearbyint(Delta*S3) - trans[3] - trans[4] - nearbyint(Delta*S4);
I4 += nearbyint(Delta*I3) + trans[3] - trans[5] - nearbyint(Delta*I4);
S5 += nearbyint(Delta*S4) - trans[6] - trans[7] - nearbyint(Delta*S5);
I5 += nearbyint(Delta*I4) + trans[6] - trans[8] - nearbyint(Delta*I5);
S6 += nearbyint(Delta*S5) - trans[9] - trans[10];
I6 += nearbyint(Delta*I5) + trans[9] - trans[11];
Delta
has units of 1/time, so the units in the various terms are inconsistent. I think you are trying to move everyone from one age class to another at the specified time. So Delta
should be either 1 or 0 depending on the clock time.
Also, note that after the first of the above equations is executed, the value of S3
has changed, so that renders the third equation incorrect. Likewise for the second and fourth, etc.
Also, your lognormal measurement model is incorrectly specified.
Also, you have non-integer sample sizes in your covariate table.
Am closing this issue. Feel free to reopen if more discussion is warranted.
I’m hoping that you may be able to provide some advice about how to set up my measurement model. My datasets do not include incidence data as shown in most of the papers I have looked. Instead, I have time series data for 3 randomly selected sets of 60 fish that were tested for infection presence/absence. We also have estimates of the total population size at each timepoint. The disease is very slow progressing, so samples are taken yearly over 14 years. I am stuck on how to code the measurement model. If you have any advice and/or could point me towards a paper that has a similar data structure, that would be helpful. Many thanks!