@MathGIS: Thanks for doing such a nice job of presenting your issue. Having all the code and the plots was really useful to help me see the issue right away. I'm aware that it's a nontrivial amount of work to do so. You didn't provide the data (simulated data would've worked), so I wasn't able to reproduce the issue. I modified your codes to try and simulate the data myself, but you didn't provide parameter values, so I couldn't do that either. So, I'm taking something of a shot in the dark in the following.
I agree that it does appear your model has sufficient variability and, at the parameters you've used for the simulations, a reasonable trend. This suggests that perhaps the problem is a superficial one. Looking over the codes, only one thing jumps out at me this morning, midway through my first cup of coffee: the dmeasure code relies on the log of dpois(log=FALSE) being as accurate as dpois(log=TRUE) for the parameter ranges in question. Offhand, I don't know if that is true. On the other hand, whether that would matter for particle filtering depends on what version of pomp you're using. (One reason why FAQ 1.1 requests that you provide version numbers for pomp and R and any other packages that you're using.)
At any rate, you can check to see if this matters by substituting
dobs <- Csnippet("lik = dpois(nearbyint(cases),rho * C,give_log);")
for the dmeasure component.
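For what it's worth, here is a toy illustration of the kind of numerical issue I mean (made-up numbers, nothing to do with your model): when the Poisson mean is far from the observation, the density underflows to zero on the natural scale, so taking the log afterward gives -Inf, whereas computing on the log scale stays finite.

# toy example: log of an underflowed density vs. computing on the log scale
log(dpois(500, lambda = 1))          # -Inf: dpois() underflows to 0 before the log is taken
dpois(500, lambda = 1, log = TRUE)   # finite: a large negative number rather than -Inf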
Another diagnostic question: how much of the variability you show in the second plot is process noise and how much is measurement error? You can get at this by plotting rho*C for a number of simulations and comparing the result with both the data and the simulated data. (In these plots, it's generally more useful to plot the individual simulated paths in addition to the envelope you've shown.) My point is that the Poisson measurement error has a very small variance, which may account for the fact that almost all particles are inconsistent with the data.
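Here is a sketch of the kind of plot I have in mind, assuming the pomp object, parameter vector, and reported-case data frame are named df, params, and cases as in your code (adjust to whatever names you actually use):

library(pomp)
library(ggplot2)

# simulate a handful of trajectories and extract the Poisson mean, rho * C
sim_paths <- simulate(df, params = params, nsim = 20, format = "data.frame")
sim_paths$expected <- params["rho"] * sim_paths$C

# individual simulated paths of rho * C (lines) against the reported cases (points)
ggplot() +
  geom_line(data = sim_paths, aes(x = DateNo, y = expected, group = .id),
            colour = "grey60", alpha = 0.7) +
  geom_point(data = cases, aes(x = DateNo, y = cases), size = 1)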
Thank you very much for your reply and suggestions; in fact, I was surprised to receive your reply so quickly. Please allow me to continue our discussion. I produced the codes using R version 3.6.3 and pomp version 2.8.
First, I substituted the dmeasure component according to your suggestion, but the same issue remained.
Second, your point that the Poisson measurement error has a very small variance gave me an important clue. I ran some simulations with the following parameters (the same as in the simulations shown previously):
params = c(
bite = 0.75,
ph2v = 0.5,
pv2h = 0.5,
delta_h = 1 / 7,
theta = 0.4,
gamma_h = 1 / 5,
mu_h = 1.0 / (365 * 75),
delta_v = 1 / 10,
mu_v = 1 / 25,
rho = 0.1,
erad = 5.0,
nvdcr = 0.2
)
I then chose one simulation that seems to fit the reported cases well (saved in Data_and_Sims.csv):
DateNo | cases | Sims |
---|---|---|
162 | 1 | 0 |
163 | 0 | 0 |
164 | 0 | 0 |
165 | 0 | 0 |
166 | 0 | 0 |
167 | 1 | 0 |
168 | 1 | 0 |
169 | 0 | 0 |
170 | 0 | 0 |
171 | 1 | 0 |
172 | 0 | 0 |
173 | 0 | 0 |
174 | 0 | 0 |
175 | 0 | 0 |
176 | 1 | 0 |
177 | 0 | 0 |
178 | 3 | 0 |
179 | 2 | 0 |
180 | 1 | 0 |
181 | 3 | 0 |
182 | 4 | 0 |
183 | 4 | 0 |
184 | 3 | 0 |
185 | 0 | 0 |
186 | 1 | 0 |
187 | 1 | 0 |
188 | 1 | 0 |
189 | 2 | 1 |
190 | 3 | 2 |
191 | 0 | 0 |
192 | 4 | 0 |
193 | 5 | 1 |
194 | 5 | 0 |
195 | 4 | 0 |
196 | 9 | 0 |
197 | 13 | 1 |
198 | 12 | 0 |
199 | 5 | 1 |
200 | 14 | 0 |
201 | 15 | 0 |
202 | 9 | 1 |
203 | 6 | 2 |
204 | 9 | 0 |
205 | 7 | 0 |
206 | 10 | 3 |
207 | 7 | 2 |
208 | 17 | 1 |
209 | 15 | 1 |
210 | 11 | 0 |
211 | 22 | 1 |
212 | 21 | 3 |
213 | 23 | 1 |
214 | 22 | 1 |
215 | 20 | 4 |
216 | 19 | 3 |
217 | 23 | 5 |
218 | 20 | 8 |
219 | 23 | 9 |
220 | 24 | 10 |
221 | 28 | 6 |
222 | 28 | 8 |
223 | 35 | 11 |
224 | 30 | 7 |
225 | 34 | 11 |
226 | 39 | 16 |
227 | 51 | 14 |
228 | 52 | 11 |
229 | 36 | 11 |
230 | 53 | 13 |
231 | 50 | 17 |
232 | 60 | 25 |
233 | 80 | 33 |
234 | 59 | 28 |
235 | 62 | 31 |
236 | 72 | 26 |
237 | 71 | 54 |
238 | 85 | 42 |
239 | 94 | 30 |
240 | 107 | 53 |
241 | 132 | 51 |
242 | 129 | 69 |
243 | 132 | 68 |
244 | 184 | 73 |
245 | 176 | 109 |
246 | 180 | 100 |
247 | 181 | 118 |
248 | 198 | 121 |
249 | 185 | 135 |
250 | 206 | 148 |
251 | 254 | 138 |
252 | 239 | 205 |
253 | 266 | 193 |
254 | 277 | 224 |
255 | 328 | 266 |
256 | 323 | 299 |
257 | 404 | 319 |
258 | 457 | 335 |
259 | 485 | 414 |
260 | 497 | 428 |
261 | 626 | 509 |
262 | 680 | 507 |
263 | 693 | 596 |
264 | 670 | 679 |
265 | 792 | 684 |
266 | 763 | 809 |
267 | 736 | 877 |
268 | 863 | 1044 |
269 | 853 | 1087 |
270 | 837 | 1216 |
271 | 1097 | 1326 |
272 | 957 | 1305 |
273 | 1157 | 1392 |
274 | 1613 | 1434 |
275 | 1306 | 1389 |
276 | 1065 | 1337 |
277 | 926 | 1385 |
278 | 1061 | 1309 |
279 | 1043 | 1252 |
280 | 980 | 1293 |
281 | 1019 | 1188 |
282 | 936 | 1145 |
283 | 1023 | 1085 |
284 | 786 | 1050 |
285 | 729 | 964 |
286 | 709 | 949 |
287 | 644 | 851 |
288 | 594 | 837 |
289 | 499 | 791 |
290 | 524 | 726 |
291 | 388 | 637 |
292 | 310 | 626 |
293 | 339 | 618 |
294 | 264 | 526 |
295 | 217 | 461 |
296 | 228 | 405 |
297 | 260 | 366 |
298 | 197 | 330 |
299 | 213 | 317 |
300 | 195 | 288 |
301 | 194 | 246 |
302 | 164 | 221 |
303 | 128 | 243 |
304 | 116 | 205 |
305 | 187 | 188 |
306 | 111 | 156 |
307 | 98 | 136 |
308 | 84 | 103 |
309 | 87 | 108 |
310 | 81 | 73 |
311 | 72 | 80 |
312 | 60 | 71 |
313 | 58 | 55 |
314 | 60 | 51 |
315 | 31 | 43 |
316 | 26 | 47 |
317 | 22 | 29 |
318 | 24 | 37 |
319 | 14 | 21 |
320 | 18 | 27 |
321 | 13 | 21 |
322 | 22 | 15 |
323 | 12 | 15 |
324 | 13 | 12 |
325 | 15 | 12 |
326 | 13 | 12 |
327 | 10 | 9 |
328 | 15 | 4 |
329 | 11 | 13 |
330 | 4 | 5 |
331 | 12 | 7 |
332 | 11 | 6 |
333 | 6 | 4 |
334 | 7 | 7 |
335 | 7 | 5 |
336 | 0 | 4 |
337 | 7 | 3 |
338 | 5 | 1 |
339 | 6 | 1 |
340 | 3 | 0 |
341 | 2 | 1 |
342 | 3 | 2 |
343 | 1 | 1 |
344 | 1 | 0 |
345 | 1 | 0 |
346 | 2 | 0 |
347 | 1 | 1 |
348 | 0 | 1 |
349 | 1 | 1 |
350 | 1 | 0 |
351 | 1 | 0 |
352 | 0 | 0 |
353 | 1 | 0 |
354 | 0 | 0 |
355 | 0 | 0 |
356 | 0 | 1 |
357 | 0 | 0 |
358 | 0 | 0 |
359 | 0 | 0 |
360 | 0 | 0 |
361 | 0 | 0 |
362 | 0 | 0 |
363 | 0 | 0 |
364 | 0 | 0 |
365 | 0 | 0 |
data <- read.csv("Data_and_Sims.csv")
ggplot(data = data, aes(x = DateNo)) +
  geom_point(aes(y = cases, color = "Cases"), size = 2) +
  geom_line(aes(y = Sims, color = "Sims"), size = 2)
I substituted the real cases with this simulation in pomp and then ran pfilter again. The loglik can now be calculated, with a value of -518.127, and this diagnostic figure:
The formatted code is as follows:
library(pomp)
library(ggplot2)

# parameters used
params = c(
bite = 0.75,
ph2v = 0.5,
pv2h = 0.5,
delta_h = 1 / 7,
theta = 0.4,
gamma_h = 1 / 5,
mu_h = 1.0 / (365 * 75),
delta_v = 1 / 10,
mu_v = 1 / 25,
rho = 0.1,
erad = 5.0,
nvdcr = 0.2
)
# reported and one simulated cases
data <- read.csv("Data_and_Sims.csv")
head(data)

# plot the reported and simulated cases
ggplot(data = data, aes(x = DateNo)) +
  geom_point(aes(y = cases, color = "Cases"), size = 2) +
  geom_line(aes(y = Sims, color = "Sims"), size = 2)

# data frames to build the pomp object
cases <- data[, 1:2]      # reported cases
sims <- data[, c(1, 3)]   # one simulation
colnames(sims) <- c("DateNo", "cases")
head(sims)
# build pomp with reported and one simulated cases
df <- pomp(
data = sims,
# data = cases,
times = "DateNo",
t0 = with(data, DateNo[1]),
rinit = initlz,
rprocess = euler(rproc, delta.t = 1),
rmeasure = robs,
dmeasure = dobs,
obsnames = "cases",
statenames = c("Sh", "Eh", "Ih", "Ah", "Rh", "Sv", "Ev", "Iv", "C"),
accumvars = "C",
paramnames = c(
"bite",
"ph2v",
"pv2h",
"delta_h",
"theta",
"gamma_h",
"mu_h",
"delta_v",
"mu_v",
"rho",
"erad",
"nvdcr"
),
partrans = parameter_trans(
log = c("bite", "delta_h", "gamma_h", "mu_h", "delta_v", "mu_v", "erad"),
logit = c("theta", "ph2v", "pv2h", "rho", "nvdcr")
)
)
# pfilter to calculate the log likelihood
df %>% pomp(params = params) %>%
pfilter(Np = 5000) -> pf
logLik(pf)
plot(pf)
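An optional extra check, sketched here (it is not part of the codes above): replicating the particle filter and combining the replicates with pomp's logmeanexp gives a log-likelihood estimate together with its Monte Carlo standard error.

# optional: replicate the particle filter to gauge Monte Carlo variability
lls <- replicate(10, logLik(pfilter(df, params = params, Np = 5000)))
logmeanexp(lls, se = TRUE)   # log-likelihood estimate and its standard error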
These results suggest to me that the failure of pfilter with the real cases may be due to:
- the variance of the cases is so large that the Poisson process cannot capture it;
- there are some features of the data that the model does not capture.

I am not sure whether these are the correct diagnoses. How should I address the issue next: substitute the Poisson process with another, such as a negative binomial process, or is there some other solution? Would it be possible to give me some suggestions, please?
Finally, the figure shown previously was the 95% CI of 100 simulations. I agree with you that it is more useful to plot the individual simulated paths, and I will do so in future.
And once again I express my sincere thanks for your help.
I am not sure these are the correct diagnoses either, but they seem to be consistent with what we observe so far. I suggest you try replacing the Poisson measurement model with one that has some overdispersion, for example a negative binomial measurement model, as was done here.
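As a rough sketch of what the measurement-model Csnippets might look like with a negative binomial (the overdispersion parameter k is new here, so it would also need to be added to paramnames, given a value or estimated, and, say, log-transformed in partrans):

# sketch: overdispersed (negative binomial) measurement model with a new parameter k
dobs <- Csnippet("
  lik = dnbinom_mu(nearbyint(cases), k, rho * C, give_log);
")
robs <- Csnippet("
  cases = rnbinom_mu(k, rho * C);
")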
@kingaa Thank you very much! I'll try it.
I'll close this issue now, but feel free to re-open it if more discussion is warranted.
Recently, I built a POMP object to model the spread of a disease. The simulations seem reasonable with the assumed parameters. However, when I computed the loglik at the assumed parameters, hoping to fit the model to the data and obtain the MLE, I could only get -Inf. The diagnostic plot is as follows:
This confuses me: I think the assumed parameter values give reasonable outputs, and I therefore expected the loglik to be near its maximum. But the loglik from pfilter is -Inf, and I cannot proceed with the other calculations. I don't know why this is. Would you please give me some suggestions?
Here is the code I used.