The simulations of my POMP are reasonable, but the pfilter warned filtering failures occurred.

MathGIS commented 4 years ago

Recently, I build a POMP object to model the spread of diseases. The simulations seem reasonable with assumed parameters. DF_sim01 DF_sim02 However, when I computed the loglik of the assumed parameter, and hope to fit the model to data to get MLE, I can only get -Inf. The diagnositic plot is as follows: Fig_pfilter_diag This confused me, I think the assumed parameter values can give reasonable outputs, and therefore I guess the loglik would be around the maximum of loglik. But the loglik is -Inf by pfilter, and I cannot proceed on the other calculation. I don't know why is it. Would you please give me some suggestions?

Here are the codes I used.

## ---- Model - rprocess ----
rproc <- Csnippet(
  "
  double lambda_h, lambda_v, sim_births_h, sim_births_v, Nh, Nv, mu_v2;
  double rate[15], trans[15];

  // Tota human and vector population
  Nh = Sh + Eh + Ih + Ah + Rh;

  if(t<267)
    Nv = Sv + Ev + Iv;
  else
    Nv = nvdcr * (Sv + Ev + Iv);

  // Force of infection from vector to human
  lambda_v = bite * pv2h * Iv / Nh;
  // Force of infection from human to vector
  lambda_h = bite * ph2v * (Ih + Ah) / Nh;

  if(t<267)
    mu_v2 = mu_v;
  else
    mu_v2 = erad * mu_v;

  // Rates
  rate[0] = lambda_v;  // force of infection from vector to human
  rate[1] = mu_h;      // Human death rate
  rate[2] = theta * delta_h;       // theta * 1/IIP(theta is the proportion
                                   // of sympotomatic )
  rate[3] = (1 - theta) * delta_h;       // (1 - theta) * 1/IIP (theta is the
                                         // proportion of asymptomatic )
  rate[4] = mu_h;       // Human death rate
  rate[5] = gamma_h ;   // 1/Infectious period of human
  rate[6] = mu_h;       // Human death rate
  rate[7] = gamma_h ;   // 1/Infectious period of human  ;
  rate[8] = mu_h;       // Human death rate
  rate[9] = mu_h;       // Human death rate
  rate[10] = lambda_h;  // force of infection from human to vector
  rate[11] = mu_v2;      // Vector death rate
  rate[12] = delta_v;   // 1/EIP
  rate[13] = mu_v2;      // Vector death rate
  rate[14] = mu_v2;      // Vector death rate

  // Poisson births
  sim_births_h = rpois(Nh * mu_h * dt);       // Human
  sim_births_v = rpois(Nv * mu_v * dt);       // Vector

  // Transitions between classes
  reulermultinom(2,Sh,&rate[0],dt,&trans[0]);
  reulermultinom(3,Eh,&rate[2],dt,&trans[2]);
  reulermultinom(2,Ih,&rate[5],dt,&trans[5]);
  reulermultinom(2,Ah,&rate[7],dt,&trans[7]);
  reulermultinom(1,Rh,&rate[9],dt,&trans[9]);

  reulermultinom(2,Sv,&rate[10],dt,&trans[10]);
  reulermultinom(2,Ev,&rate[12],dt,&trans[12]);
  reulermultinom(1,Iv,&rate[14],dt,&trans[14]);

  // Equations
  Sh += sim_births_h - trans[0] - trans[1];
  Eh += trans[0] - trans[2] - trans[3] - trans[4];
  Ih += trans[2] - trans[5] - trans[6];
  Ah += trans[3] - trans[7] - trans[8];
  Rh += trans[5] + trans[7] - trans[9];

  Sv += sim_births_v - trans[10] - trans[11];
  Ev += trans[10] - trans[12] - trans[13];
  Iv += trans[12] - trans[14];

  C += trans[2];           // true incidence
  "
)

#-- Init variables --#
initlz <- Csnippet("
  Sh = 12838900;
  Eh = 0;
  Ih = 5;
  Ah = 0;
  Rh = 0;

  Sv = nearbyint(1.5 * 12838900);
  Ev = 0;
  Iv = 0;

  C = 0;
")
## -------------------------

## ---- Observation process ----
robs <- Csnippet("
 cases = rpois(rho * C);
 ")

dobs <- Csnippet("
 double lI = dpois(nearbyint(cases),rho * C,0);
 lik = (give_log) ? log(lI) : lI;
 ")

## ----------------------------

## ---- klc: creat pomp object and simulation of df ----
df <- pomp(
  data = data,
  times = "DateNo",
  t0 = with(data, DateNo[1]),
  rinit = initlz,
  rprocess = euler(rproc, delta.t = 1),
  rmeasure = robs,
  dmeasure = dobs,
  obsnames = "cases",
  statenames = c("Sh", "Eh", "Ih", "Ah", "Rh", "Sv", "Ev", "Iv", "C"),
  accumvars = "C",
  paramnames = c(
    "bite",
    "ph2v",
    "pv2h",
    "delta_h",
    "theta",
    "gamma_h",
    "mu_h",
    "delta_v",
    "mu_v",
    "rho",
    "erad",
    "nvdcr"
  ),
  partrans = parameter_trans(
    log = c("bite", "delta_h", "gamma_h", "mu_h", "delta_v", "mu_v", "erad"),
    logit = c("theta", "ph2v", "pv2h", "rho", "nvdcr")
  )
)

kingaa commented 4 years ago

@MathGIS: Thanks for doing such a nice job of presenting your issue. Having all the code and the plots was really useful to help me see the issue right away. I'm aware that it's a nontrivial amount of work to do so. You didn't provide the data (simulated data would've worked), so I wasn't able to reproduce the issue. I modified your codes to try and simulate the data myself, but you didn't provide parameter values, so I couldn't do that either. So, I'm taking something of a shot in the dark in the following.

I agree that it does appear your model has sufficient variability, and at the parameters you've used for the simulations, a reasonable trend. This suggests that perhaps the problem is a superficial one. Looking over the codes, only one thing jumps out at me this morning, midway through my first cup of coffee: the dmeasure code relies on the log of dpois(log=FALSE) being as accurate as dpois(log=TRUE) for the parameter ranges in question. Offhand, I don't know if that is true. On the other hand, whether that would matter for particle filtering depends on what version of pomp you're using. (One reason why FAQ 1.1 requests that you provide version numbers for pomp and R and any other packages that you're using).

At any rate, you can check to see if this matters by substituting

dobs <- Csnippet("lik = dpois(nearbyint(cases),rho * C,give_log);")

for the dmeasure component.

Another diagnostic question: How much of the variability you show in the second plot is process noise and how much is measurement error? You can get at this by plotting rho*C for a number of simulations and comparing the result with both data and simulated data. (In these plots, it's generally more useful to plot the individual simulated paths in addition to the envelope you've shown.) My point is that the Poisson measurement error has a very small variance, which may account for the fact that almost all particles are inconsistent with the data.

MathGIS commented 4 years ago

Thank you very much for your reply and suggestions, in fact I'm surprised to receive your reply so quickly. Please allow me to continue our discussion. I produced the codes using R version 3.6.3 and pomp version 2.8.

First, I substituted the dmeasure component according with your suggestion. But the same issue still remained.

Second, your point that the Poisson measurement error has a very small variance gave me an important clue, I run some simulations with parameters (same as simulations shown previously):

params = c(
  bite = 0.75,
  ph2v = 0.5,
  pv2h = 0.5,
  delta_h = 1 / 7,
  theta = 0.4,
  gamma_h = 1 / 5,
  mu_h = 1.0 / (365 * 75),
  delta_v = 1 / 10,
  mu_v = 1 / 25,
  rho = 0.1,
  erad = 5.0,
  nvdcr = 0.2
)

And then I chose one simulation that seems fitting well with reported cases (saved in Data_and_Sims.csv):

DateNo	cases	Sims
162	1	0
163	0	0
164	0	0
165	0	0
166	0	0
167	1	0
168	1	0
169	0	0
170	0	0
171	1	0
172	0	0
173	0	0
174	0	0
175	0	0
176	1	0
177	0	0
178	3	0
179	2	0
180	1	0
181	3	0
182	4	0
183	4	0
184	3	0
185	0	0
186	1	0
187	1	0
188	1	0
189	2	1
190	3	2
191	0	0
192	4	0
193	5	1
194	5	0
195	4	0
196	9	0
197	13	1
198	12	0
199	5	1
200	14	0
201	15	0
202	9	1
203	6	2
204	9	0
205	7	0
206	10	3
207	7	2
208	17	1
209	15	1
210	11	0
211	22	1
212	21	3
213	23	1
214	22	1
215	20	4
216	19	3
217	23	5
218	20	8
219	23	9
220	24	10
221	28	6
222	28	8
223	35	11
224	30	7
225	34	11
226	39	16
227	51	14
228	52	11
229	36	11
230	53	13
231	50	17
232	60	25
233	80	33
234	59	28
235	62	31
236	72	26
237	71	54
238	85	42
239	94	30
240	107	53
241	132	51
242	129	69
243	132	68
244	184	73
245	176	109
246	180	100
247	181	118
248	198	121
249	185	135
250	206	148
251	254	138
252	239	205
253	266	193
254	277	224
255	328	266
256	323	299
257	404	319
258	457	335
259	485	414
260	497	428
261	626	509
262	680	507
263	693	596
264	670	679
265	792	684
266	763	809
267	736	877
268	863	1044
269	853	1087
270	837	1216
271	1097	1326
272	957	1305
273	1157	1392
274	1613	1434
275	1306	1389
276	1065	1337
277	926	1385
278	1061	1309
279	1043	1252
280	980	1293
281	1019	1188
282	936	1145
283	1023	1085
284	786	1050
285	729	964
286	709	949
287	644	851
288	594	837
289	499	791
290	524	726
291	388	637
292	310	626
293	339	618
294	264	526
295	217	461
296	228	405
297	260	366
298	197	330
299	213	317
300	195	288
301	194	246
302	164	221
303	128	243
304	116	205
305	187	188
306	111	156
307	98	136
308	84	103
309	87	108
310	81	73
311	72	80
312	60	71
313	58	55
314	60	51
315	31	43
316	26	47
317	22	29
318	24	37
319	14	21
320	18	27
321	13	21
322	22	15
323	12	15
324	13	12
325	15	12
326	13	12
327	10	9
328	15	4
329	11	13
330	4	5
331	12	7
332	11	6
333	6	4
334	7	7
335	7	5
336	0	4
337	7	3
338	5	1
339	6	1
340	3	0
341	2	1
342	3	2
343	1	1
344	1	0
345	1	0
346	2	0
347	1	1
348	0	1
349	1	1
350	1	0
351	1	0
352	0	0
353	1	0
354	0	0
355	0	0
356	0	1
357	0	0
358	0	0
359	0	0
360	0	0
361	0	0
362	0	0
363	0	0
364	0	0
365	0	0

data<-read.csv("Data_and_Sims.csv")
ggplot(data = data,aes(x=DateNo))+
  geom_point(aes(y=cases,color="Cases"),size=2)+
  geom_line(aes(y=Sims,color="Sims"),size=2)

Fig_cases_sims

I substituted the real cases with this simulation in pomp and then run pfilter again. The loglik now can be calculated, with value -518.127 and diagnose figure: Fig_pfilter_diag_Sims

The formatted codes are as follows:


# parameters used
params = c(
  bite = 0.75,
  ph2v = 0.5,
  pv2h = 0.5,
  delta_h = 1 / 7,
  theta = 0.4,
  gamma_h = 1 / 5,
  mu_h = 1.0 / (365 * 75),
  delta_v = 1 / 10,
  mu_v = 1 / 25,
  rho = 0.1,
  erad = 5.0,
  nvdcr = 0.2
)

## reported and one simulated cases
data<-read.csv("Data_and_Sims.csv")
head(data)

# plot the reported and simulated cases
ggplot(data = data,aes(x=DateNo))+
  geom_point(aes(y=cases,color="Cases"),size=2)+
  geom_line(aes(y=Sims,color="Sims"),size=2)

# data.frame to build pomp
cases<-data[,1:2] # reported cases 

sims<-data[,c(1,3)] # one simulation 
colnames(sims)<-c("DateNo","cases") 
head(sims)

# build pomp with reported and one simulated cases
df <- pomp(
  data = sims,
  # data = cases,
  times = "DateNo",
  t0 = with(data, DateNo[1]),
  rinit = initlz,
  rprocess = euler(rproc, delta.t = 1),
  rmeasure = robs,
  dmeasure = dobs,
  obsnames = "cases",
  statenames = c("Sh", "Eh", "Ih", "Ah", "Rh", "Sv", "Ev", "Iv", "C"),
  accumvars = "C",
  paramnames = c(
    "bite",
    "ph2v",
    "pv2h",
    "delta_h",
    "theta",
    "gamma_h",
    "mu_h",
    "delta_v",
    "mu_v",
    "rho",
    "erad",
    "nvdcr"
  ),
  partrans = parameter_trans(
    log = c("bite", "delta_h", "gamma_h", "mu_h", "delta_v", "mu_v", "erad"),
    logit = c("theta", "ph2v", "pv2h", "rho", "nvdcr")
  )
)

# pfilter to calculate the log likelihood
df %>% pomp(params = params) %>%
pfilter(Np = 5000) -> pf

logLik(pf)
plot(pf)

These results prompted me that the reason of failure of pfilter with cases may include:

the variance of cases is so large that the Poisson process cannot catch it;
there are some characters that the model didn't catch.

I am not sure whether they are correct diagnosises. And how can I solve the issue next, substitued the Poission process with others, negative binomial process for example, or other solutions? Would it be possible to give me some valuable suggesions, please?

In the end, the figure shown previously was the 95% CI of 100 simulations, and I agree with you that it's more useful to plot the individual simulated paths, and I will plot simulated paths with your suggestion in future.

And once again I express my sincere thanks for your help.

kingaa commented 4 years ago

These results prompted me that the reason of failure of pfilter with cases may include:

the variance of cases is so large that the Poisson process cannot catch it;

there are some characters that the model didn't catch. I am not sure whether they are correct diagnoses. And how can I solve the issue next, substituted the Poisson process with others, negative binomial process for example, or other solutions? Would it be possible to give me some valuable suggestions, please?

I am not sure these are the correct diagnoses either, but they seem to be consistent with what we observe so far. I suggest you try replacing the Poisson measurement model with one that has some overdispersion. For example, a negative binomial measurement model, as for example was done here.

MathGIS commented 4 years ago

@kingaa Thank you very much! I'll try it.

kingaa commented 4 years ago

I'll close this issue now, but feel free to re-open it if more discussion is warranted.

kingaa / pomp

The simulations of my POMP are reasonable, but the pfilter warned filtering failures occurred. #113