kingaa / pomp

R package for statistical inference using partially observed Markov processes
https://kingaa.github.io/pomp
GNU General Public License v3.0
114 stars 27 forks source link

The simulations of my POMP are reasonable, but the pfilter warned filtering failures occurred. #113

Closed MathGIS closed 4 years ago

MathGIS commented 4 years ago

Recently, I build a POMP object to model the spread of diseases. The simulations seem reasonable with assumed parameters. DF_sim01 DF_sim02 However, when I computed the loglik of the assumed parameter, and hope to fit the model to data to get MLE, I can only get -Inf. The diagnositic plot is as follows: Fig_pfilter_diag This confused me, I think the assumed parameter values can give reasonable outputs, and therefore I guess the loglik would be around the maximum of loglik. But the loglik is -Inf by pfilter, and I cannot proceed on the other calculation. I don't know why is it. Would you please give me some suggestions?

Here are the codes I used.

## ---- Model - rprocess ----
rproc <- Csnippet(
  "
  double lambda_h, lambda_v, sim_births_h, sim_births_v, Nh, Nv, mu_v2;
  double rate[15], trans[15];

  // Tota human and vector population
  Nh = Sh + Eh + Ih + Ah + Rh;

  if(t<267)
    Nv = Sv + Ev + Iv;
  else
    Nv = nvdcr * (Sv + Ev + Iv);

  // Force of infection from vector to human
  lambda_v = bite * pv2h * Iv / Nh;
  // Force of infection from human to vector
  lambda_h = bite * ph2v * (Ih + Ah) / Nh;

  if(t<267)
    mu_v2 = mu_v;
  else
    mu_v2 = erad * mu_v;

  // Rates
  rate[0] = lambda_v;  // force of infection from vector to human
  rate[1] = mu_h;      // Human death rate
  rate[2] = theta * delta_h;       // theta * 1/IIP(theta is the proportion
                                   // of sympotomatic )
  rate[3] = (1 - theta) * delta_h;       // (1 - theta) * 1/IIP (theta is the
                                         // proportion of asymptomatic )
  rate[4] = mu_h;       // Human death rate
  rate[5] = gamma_h ;   // 1/Infectious period of human
  rate[6] = mu_h;       // Human death rate
  rate[7] = gamma_h ;   // 1/Infectious period of human  ;
  rate[8] = mu_h;       // Human death rate
  rate[9] = mu_h;       // Human death rate
  rate[10] = lambda_h;  // force of infection from human to vector
  rate[11] = mu_v2;      // Vector death rate
  rate[12] = delta_v;   // 1/EIP
  rate[13] = mu_v2;      // Vector death rate
  rate[14] = mu_v2;      // Vector death rate

  // Poisson births
  sim_births_h = rpois(Nh * mu_h * dt);       // Human
  sim_births_v = rpois(Nv * mu_v * dt);       // Vector

  // Transitions between classes
  reulermultinom(2,Sh,&rate[0],dt,&trans[0]);
  reulermultinom(3,Eh,&rate[2],dt,&trans[2]);
  reulermultinom(2,Ih,&rate[5],dt,&trans[5]);
  reulermultinom(2,Ah,&rate[7],dt,&trans[7]);
  reulermultinom(1,Rh,&rate[9],dt,&trans[9]);

  reulermultinom(2,Sv,&rate[10],dt,&trans[10]);
  reulermultinom(2,Ev,&rate[12],dt,&trans[12]);
  reulermultinom(1,Iv,&rate[14],dt,&trans[14]);

  // Equations
  Sh += sim_births_h - trans[0] - trans[1];
  Eh += trans[0] - trans[2] - trans[3] - trans[4];
  Ih += trans[2] - trans[5] - trans[6];
  Ah += trans[3] - trans[7] - trans[8];
  Rh += trans[5] + trans[7] - trans[9];

  Sv += sim_births_v - trans[10] - trans[11];
  Ev += trans[10] - trans[12] - trans[13];
  Iv += trans[12] - trans[14];

  C += trans[2];           // true incidence
  "
)

#-- Init variables --#
initlz <- Csnippet("
  Sh = 12838900;
  Eh = 0;
  Ih = 5;
  Ah = 0;
  Rh = 0;

  Sv = nearbyint(1.5 * 12838900);
  Ev = 0;
  Iv = 0;

  C = 0;
")
## -------------------------

## ---- Observation process ----
robs <- Csnippet("
 cases = rpois(rho * C);
 ")

dobs <- Csnippet("
 double lI = dpois(nearbyint(cases),rho * C,0);
 lik = (give_log) ? log(lI) : lI;
 ")

## ----------------------------

## ---- klc: creat pomp object and simulation of df ----
df <- pomp(
  data = data,
  times = "DateNo",
  t0 = with(data, DateNo[1]),
  rinit = initlz,
  rprocess = euler(rproc, delta.t = 1),
  rmeasure = robs,
  dmeasure = dobs,
  obsnames = "cases",
  statenames = c("Sh", "Eh", "Ih", "Ah", "Rh", "Sv", "Ev", "Iv", "C"),
  accumvars = "C",
  paramnames = c(
    "bite",
    "ph2v",
    "pv2h",
    "delta_h",
    "theta",
    "gamma_h",
    "mu_h",
    "delta_v",
    "mu_v",
    "rho",
    "erad",
    "nvdcr"
  ),
  partrans = parameter_trans(
    log = c("bite", "delta_h", "gamma_h", "mu_h", "delta_v", "mu_v", "erad"),
    logit = c("theta", "ph2v", "pv2h", "rho", "nvdcr")
  )
)
kingaa commented 4 years ago

@MathGIS: Thanks for doing such a nice job of presenting your issue. Having all the code and the plots was really useful to help me see the issue right away. I'm aware that it's a nontrivial amount of work to do so. You didn't provide the data (simulated data would've worked), so I wasn't able to reproduce the issue. I modified your codes to try and simulate the data myself, but you didn't provide parameter values, so I couldn't do that either. So, I'm taking something of a shot in the dark in the following.

I agree that it does appear your model has sufficient variability, and at the parameters you've used for the simulations, a reasonable trend. This suggests that perhaps the problem is a superficial one. Looking over the codes, only one thing jumps out at me this morning, midway through my first cup of coffee: the dmeasure code relies on the log of dpois(log=FALSE) being as accurate as dpois(log=TRUE) for the parameter ranges in question. Offhand, I don't know if that is true. On the other hand, whether that would matter for particle filtering depends on what version of pomp you're using. (One reason why FAQ 1.1 requests that you provide version numbers for pomp and R and any other packages that you're using).

At any rate, you can check to see if this matters by substituting

dobs <- Csnippet("lik = dpois(nearbyint(cases),rho * C,give_log);")

for the dmeasure component.

Another diagnostic question: How much of the variability you show in the second plot is process noise and how much is measurement error? You can get at this by plotting rho*C for a number of simulations and comparing the result with both data and simulated data. (In these plots, it's generally more useful to plot the individual simulated paths in addition to the envelope you've shown.) My point is that the Poisson measurement error has a very small variance, which may account for the fact that almost all particles are inconsistent with the data.

MathGIS commented 4 years ago

Thank you very much for your reply and suggestions, in fact I'm surprised to receive your reply so quickly. Please allow me to continue our discussion. I produced the codes using R version 3.6.3 and pomp version 2.8.

First, I substituted the dmeasure component according with your suggestion. But the same issue still remained.

Second, your point that the Poisson measurement error has a very small variance gave me an important clue, I run some simulations with parameters (same as simulations shown previously):

params = c(
  bite = 0.75,
  ph2v = 0.5,
  pv2h = 0.5,
  delta_h = 1 / 7,
  theta = 0.4,
  gamma_h = 1 / 5,
  mu_h = 1.0 / (365 * 75),
  delta_v = 1 / 10,
  mu_v = 1 / 25,
  rho = 0.1,
  erad = 5.0,
  nvdcr = 0.2
)

And then I chose one simulation that seems fitting well with reported cases (saved in Data_and_Sims.csv):

DateNo cases Sims
162 1 0
163 0 0
164 0 0
165 0 0
166 0 0
167 1 0
168 1 0
169 0 0
170 0 0
171 1 0
172 0 0
173 0 0
174 0 0
175 0 0
176 1 0
177 0 0
178 3 0
179 2 0
180 1 0
181 3 0
182 4 0
183 4 0
184 3 0
185 0 0
186 1 0
187 1 0
188 1 0
189 2 1
190 3 2
191 0 0
192 4 0
193 5 1
194 5 0
195 4 0
196 9 0
197 13 1
198 12 0
199 5 1
200 14 0
201 15 0
202 9 1
203 6 2
204 9 0
205 7 0
206 10 3
207 7 2
208 17 1
209 15 1
210 11 0
211 22 1
212 21 3
213 23 1
214 22 1
215 20 4
216 19 3
217 23 5
218 20 8
219 23 9
220 24 10
221 28 6
222 28 8
223 35 11
224 30 7
225 34 11
226 39 16
227 51 14
228 52 11
229 36 11
230 53 13
231 50 17
232 60 25
233 80 33
234 59 28
235 62 31
236 72 26
237 71 54
238 85 42
239 94 30
240 107 53
241 132 51
242 129 69
243 132 68
244 184 73
245 176 109
246 180 100
247 181 118
248 198 121
249 185 135
250 206 148
251 254 138
252 239 205
253 266 193
254 277 224
255 328 266
256 323 299
257 404 319
258 457 335
259 485 414
260 497 428
261 626 509
262 680 507
263 693 596
264 670 679
265 792 684
266 763 809
267 736 877
268 863 1044
269 853 1087
270 837 1216
271 1097 1326
272 957 1305
273 1157 1392
274 1613 1434
275 1306 1389
276 1065 1337
277 926 1385
278 1061 1309
279 1043 1252
280 980 1293
281 1019 1188
282 936 1145
283 1023 1085
284 786 1050
285 729 964
286 709 949
287 644 851
288 594 837
289 499 791
290 524 726
291 388 637
292 310 626
293 339 618
294 264 526
295 217 461
296 228 405
297 260 366
298 197 330
299 213 317
300 195 288
301 194 246
302 164 221
303 128 243
304 116 205
305 187 188
306 111 156
307 98 136
308 84 103
309 87 108
310 81 73
311 72 80
312 60 71
313 58 55
314 60 51
315 31 43
316 26 47
317 22 29
318 24 37
319 14 21
320 18 27
321 13 21
322 22 15
323 12 15
324 13 12
325 15 12
326 13 12
327 10 9
328 15 4
329 11 13
330 4 5
331 12 7
332 11 6
333 6 4
334 7 7
335 7 5
336 0 4
337 7 3
338 5 1
339 6 1
340 3 0
341 2 1
342 3 2
343 1 1
344 1 0
345 1 0
346 2 0
347 1 1
348 0 1
349 1 1
350 1 0
351 1 0
352 0 0
353 1 0
354 0 0
355 0 0
356 0 1
357 0 0
358 0 0
359 0 0
360 0 0
361 0 0
362 0 0
363 0 0
364 0 0
365 0 0
data<-read.csv("Data_and_Sims.csv")
ggplot(data = data,aes(x=DateNo))+
  geom_point(aes(y=cases,color="Cases"),size=2)+
  geom_line(aes(y=Sims,color="Sims"),size=2)

Fig_cases_sims

I substituted the real cases with this simulation in pomp and then run pfilter again. The loglik now can be calculated, with value -518.127 and diagnose figure: Fig_pfilter_diag_Sims

The formatted codes are as follows:


# parameters used
params = c(
  bite = 0.75,
  ph2v = 0.5,
  pv2h = 0.5,
  delta_h = 1 / 7,
  theta = 0.4,
  gamma_h = 1 / 5,
  mu_h = 1.0 / (365 * 75),
  delta_v = 1 / 10,
  mu_v = 1 / 25,
  rho = 0.1,
  erad = 5.0,
  nvdcr = 0.2
)

## reported and one simulated cases
data<-read.csv("Data_and_Sims.csv")
head(data)

# plot the reported and simulated cases
ggplot(data = data,aes(x=DateNo))+
  geom_point(aes(y=cases,color="Cases"),size=2)+
  geom_line(aes(y=Sims,color="Sims"),size=2)

# data.frame to build pomp
cases<-data[,1:2] # reported cases 

sims<-data[,c(1,3)] # one simulation 
colnames(sims)<-c("DateNo","cases") 
head(sims)

# build pomp with reported and one simulated cases
df <- pomp(
  data = sims,
  # data = cases,
  times = "DateNo",
  t0 = with(data, DateNo[1]),
  rinit = initlz,
  rprocess = euler(rproc, delta.t = 1),
  rmeasure = robs,
  dmeasure = dobs,
  obsnames = "cases",
  statenames = c("Sh", "Eh", "Ih", "Ah", "Rh", "Sv", "Ev", "Iv", "C"),
  accumvars = "C",
  paramnames = c(
    "bite",
    "ph2v",
    "pv2h",
    "delta_h",
    "theta",
    "gamma_h",
    "mu_h",
    "delta_v",
    "mu_v",
    "rho",
    "erad",
    "nvdcr"
  ),
  partrans = parameter_trans(
    log = c("bite", "delta_h", "gamma_h", "mu_h", "delta_v", "mu_v", "erad"),
    logit = c("theta", "ph2v", "pv2h", "rho", "nvdcr")
  )
)

# pfilter to calculate the log likelihood
df %>% pomp(params = params) %>%
pfilter(Np = 5000) -> pf

logLik(pf)
plot(pf)

These results prompted me that the reason of failure of pfilter with cases may include:

I am not sure whether they are correct diagnosises. And how can I solve the issue next, substitued the Poission process with others, negative binomial process for example, or other solutions? Would it be possible to give me some valuable suggesions, please?

In the end, the figure shown previously was the 95% CI of 100 simulations, and I agree with you that it's more useful to plot the individual simulated paths, and I will plot simulated paths with your suggestion in future.

And once again I express my sincere thanks for your help.

kingaa commented 4 years ago

These results prompted me that the reason of failure of pfilter with cases may include:

  • the variance of cases is so large that the Poisson process cannot catch it;
  • there are some characters that the model didn't catch. I am not sure whether they are correct diagnoses. And how can I solve the issue next, substituted the Poisson process with others, negative binomial process for example, or other solutions? Would it be possible to give me some valuable suggestions, please?

I am not sure these are the correct diagnoses either, but they seem to be consistent with what we observe so far. I suggest you try replacing the Poisson measurement model with one that has some overdispersion. For example, a negative binomial measurement model, as for example was done here.

MathGIS commented 4 years ago

@kingaa Thank you very much! I'll try it.

kingaa commented 4 years ago

I'll close this issue now, but feel free to re-open it if more discussion is warranted.