kingaa / pomp

R package for statistical inference using partially observed Markov processes
https://kingaa.github.io/pomp
GNU General Public License v3.0
112 stars 27 forks source link

How to fit a Pomp model to a cohort study data? #99

Closed NafisNr closed 4 years ago

NafisNr commented 4 years ago

I'm trying to fit a deterministic pomp model to a cohort study data, where individuals were kept track of on a monthly scale.

Plot below shows the number of individuals tested on a monthly scale and also number infected with an enteric pathogen. image

I am wondering if it is possible to fit a pomp model to such data by defining weight for each point in the measurement model?

I appreciate your advice. Here is my R code:

head(data) time Infected 2010.250 7 2010.333 3 2010.417 3 2010.500 9 2010.583 9 2010.667 9

head(covar) ## Infec values are same as Infected values from data time pop Infec 2010.250 24 7 2010.333 29 3 2010.417 41 3 2010.500 43 9 2010.583 55 9 2010.667 74 9

rinit <- Csnippet(" double m = N/(S_0+I_0+R_0); S = nearbyint(m S_0); I = nearbyint(m I_0); R = nearbyint(m * R_0); ")

dmeas <- Csnippet(" lik = dbinom(Infected , I , Infec/pop, give_log); ")

rmeas <- Csnippet(" Infected = rbinom(I , Infec/pop); ")

toEst <- Csnippet(" T_mu = log(mu); T_iota = log(iota); T_gamma = log(gamma); T_eta = logit(eta); T_R0= log(R0); T_S_0 = logit(S_0); T_I_0 = logit(I_0); T_R_0 = logit(R_0); ")

fromEst <- Csnippet(" mu = exp(T_mu); iota = exp(T_iota); gamma = exp(T_gamma); eta = expit(T_eta); R0 = exp(T_R0); S_0 = expit(T_S_0); I_0 = expit(T_I_0); R_0 = expit(T_R_0); ")

--------------- A continuous-time deterministic skeleton ---------------

skel <- Csnippet(" double beta, lambda;
beta = R0 (mu+gamma); lambda = (betaI/N)+iota;

               // N is fixed to 100,000 which the population of the catchment area; mu== 0.014                                  

              DS = mu*N + eta*R - lambda*S - mu*S;
              DI = lambda*S - gamma*I - mu*I; 
              DR = gamma*I - eta*R - mu*R; 
             ")

----pomp-construction---------------------------------------------------

data %>% pomp(t0=with(data[-1,],2*time[1]-time[2]), time="time", partrans=parameter_trans(toEst,fromEst), dmeasure=dmeas, rmeasure=rmeas, rinit=rinit, covar=covariate_table(covar,times="time"), skeleton=vectorfield(skel), statenames=c("S","I","R"), paramnames=c("gamma","R0","eta","S_0","I_0","R_0","iota","mu","N") ) -> Mln

kingaa commented 4 years ago

It is eminently possible to fit a model of the kind you allude to.

However, I'm not sure what you are looking for. What do you mean by "defining weight for each point"?

Also, I don't understand why you specify both 'rprocess' and 'skeleton'. Since your process model is deterministic, only 'skeleton' is appropriate.

NafisNr commented 4 years ago

Thanks very much Dr. King for the reply. Sorry for not being clear. By weight, I mean having time varying probability in the dmeasure and rmeasure functions. In this example, I used the proportion of infected cases as the probability of success in dbinom and rbinom. Since cohort study outcome is not a representative of the population of the study area, I am wondering how I can contribute the covariates to the model? I appreciate any hint/advice.

As you mentioned, the "rprocess" for this example is redundant since it's a deterministic model. I forgot to remove it here from the code. Sorry about that. I edited the code.

kingaa commented 4 years ago

dmeasure and rmeasure can depend on time and on covariates. Have you tried simply referencing the covariates in these C snippets?

NafisNr commented 4 years ago

Thanks for the hint.
Yes, my covar data, called "pop" here, is the number of individuals tested per time. I used it in the dmeasure and rmeasure. I have also tried the followings as an alternative option which gives me a good fit. I defined the probability of success as the ratio of number tested per time over the maximum number tested multiplied by reporting probability (eps1):

dmeas3 <- Csnippet(" lik = dbinom(Infected,nearbyint(I),(pop/max(pop))*eps1,give_log); ")

rmeas3 <- Csnippet(" Infected = rbinom(nearbyint(I),(pop/max(pop))*eps1);
")

I am wondering if using such probabilities are correct? Thanks again.

kingaa commented 4 years ago

I don't know what you mean by "correct". Perhaps it is better to ask to what model these code snippets correspond. Your meas3 snippets correspond to a model in which every infected person, independently, has a chance eps1*pop/max(pop) of being recorded in the data as infected. Is that what you want to assume?

More generally, you really just haven't been clear in explaining (1) what your model is, (2) what your question is, and (3) what your data are.

NafisNr commented 4 years ago

My apologies. Hopefully it is clear this time.

I'm trying to fit an SIRS model to Campylobacter to understand the dynamics of this enteric pathogen.

Data is from a cohort study in Mirpur area, Bangladesh were in total, 265 children's stool samples were collected on a monthly scale from when they were 7 days old until they turned 2 years old. This study was conducted from 2010 to 2014. The plot below shows the number of individuals tested per time as well as number of positive Campylobacter cases. image

My question is how to fit a pomp model to a cohort study data, specifically in terms of covariates? I have tried using cohort population (which is number of individuals tested per time, yellow line) as covariates in my model, but the model fit is bad. I also tried using annual Dhaka population, the city Mirpur is located in, and using smoothing spline to have a monthly scale population, but again the model fit is bad. The last thing I tried is fixing the population to Mirpur area population, which I have added the code above.

Any advice is appreciated.

kingaa commented 4 years ago

What is your scientific question? Are you trying to learn about the circulation of the pathogen in the general population? Are you wanting to know the trajectories of the prevalence or perhaps to infer the relationship between prevalence and force of infection?
Then there are questions about the relationship between the data and the question: What is the relationship between the cohort of children and the circulation of the pathogen? Are the children seen as a representative sample of the population, as a set of sentinels, as a population in themselves, or something else?

NafisNr commented 4 years ago

My scientific question is how malnutrition affects the dynamics of the pathogen?

In this study, the nutrition status of children was also recorded on monthly scale. As the first step, I was trying to fit the model to only the enteric pathogen data without accounting for the nutrition status of children.

The cohort study was conducted in resource-constrained areas (8 countries across 3 continents) and included both urban and rural communities within populations known to have high rates of malnutrition and enteric infections. Each site defined a catchment area where it was estimated that >200 infants (the target number of children to be enrolled per site) would be born within the enrollment period lasting 2 years. I chose one of the sites for my work.

Thank you for your time.

kingaa commented 4 years ago

Interesting! So, I am guessing that you want to understand what the risk of infection was among these children, so that you can, ultimately, see whether nutritional status affected their risk, no?

If this is correct, then it is not clear that it makes sense to think about the force of infection as a dynamic process. In particular, if transmission is driven by infections within a large population, and these children are only a small part of that population, the data from the children alone will not contain very much information about the transmission dynamics.

NafisNr commented 4 years ago

Thank you very much for your reply. Yes, that's exactly what I want to understand. Campylobacter main mode of transmission is through consuming contaminated food or water, or raw undercooked poultry. However as you mentioned, since the population of catchment area for this site is about 100,000, data from 200 children may not give me enough information.

Thanks again for your time and your help.

kingaa commented 4 years ago

Yes, so it seems that pomp is more than you really want. Why not attempt to estimate the force of infection (either as a constant, or time-varying), then look at the differential risk of infection with nutrition status. Perhaps a Cox proportional hazards model would be what you want. In any case, it seems pomp is overkill in this situation.