filter.mean vs filter.traj (with added Qs on prediction and smoothing)

Sorry, I have yet again another question. I can't quite understand the difference between the filter.mean and the filter.traj functions/arguments of pfilter.

I'm asking because an important part of movement state-space models are the values of the states. I realize that I can get all of the particle states if I set save.state = TRUE, but I have a feeling that the filter.mean or filter.traj is more adequate (or is it?). I must say I'm not 100% sure what a filtered state value is in the context of particle filters (don't you need the pdfs of the process to filter/predict/smooth?). Also, I'm not sure why we can get the pred.var but not the filter.var (or is filter.traj the variance?). Additionally, can we get the smoothed means and var?

Finally, the filter.traj vary quite a bit if you rerun the particle filter multiple times. If I am trying to get the best predictions for the states, should I rerun the particle filter and take the mean or median of the filter.traj? Anything nicer? Taking the median will give me some weird results because the state space of the state variables has large areas that are not available (fish don't go on land). So even if the filtered trajectories have no locations on land, taking the median will result on land walking fish ;)

Many thanks!

Marie

Very good question, @MarieAugerMethe!

The difference between filter.mean and filter.traj is related to the difference between filtering and smoothing. Recall that the filtering distribution is defined to be the distribution of latent states, X_t, given the data up through time t. That is, X_t | y^*_1:t, t=1,...,T. By contrast, the smoothing distribution is the distribution of X_t given all the data: X_t | y^*_1:T, t=1,...,T. The filtering distribution is a natural by-product of the particle filter and pomp summarizes it in the form of the "filter mean" when you set filter.mean=TRUE. [Note that, if you set save.states=TRUE, pfilter saves the full filtering distribution for you, as represented by the ensemble of particles. This can be useful in forecasting, for example.]

The smoothing distribution (which, I assume, is more interesting to you) is more difficult to come by. The reason for this is that one must keep track not only of the ensemble of particles "alive" at time t, but of the histories (the pedigrees, so to speak) of the individual particles as well. This involves more work and more storage, so pfilter does not do it by default. Setting filter.traj=TRUE in a call to pfilter causes some of this information to be stored in such a way that one can extract the smoothing distribution (or, rather, a sample of draws from it). In particular, one performs multiple independent particle filter operations, setting filter.traj=TRUE, and extracts a single trajectory from each one using the filter.traj operation. Together, this ensemble of trajectories represents the smoothing distribution of the latent state path.

I should point out that, while the above operation accounts for variability in the latent state process and in the measurement process, it ignores parameter uncertainty. One could factor in this uncertainty, for example, by running pmcmc instead of pfilter, again with filter.traj=TRUE, as illustrated at the end of the Getting Started vignette. We explore an alternative strategy (a kind of empirical Bayes approach), in our "avoidable errors" paper

You go on to ask about the best ways of summarizing the distribution. As you point out, the standard measures of central tendency (mean, median) leave something to be desired. This is, of course, another instance of the average failing to be typical. To be clear, I believe you should be seeing that each individual sample from the smoothing distribution obeys the constraints (i.e., fish staying wet), but that the distribution is multimodal. I won't hazard any general recommendations for summarizing multimodal data, but a variety of relatively straightforward options present themselves. For visual display, for example, would it be enough to simply overlay a reasonably large number of samples on the map (perhaps using semi-transparency to avoid overplotting)? Or one could construct kernel density estimates in space-time.... Or, if it's just a simple case where we have one island and two masses of trajectories (one in which the animals go eastabout, the other in which they go westabout, say), then perhaps it is not too hard to divide each trajectory into one of the categories and summarize each category separately....

Finally, you ask about why the prediction variance is made available (i.e., pred.var=TRUE), but not the filtering variance. This is really because we've found the prediction variance to be useful (for constructing prediction residuals and diagnosing filter performance, for instance), while the filtering variance hasn't been something we've routinely wanted to keep track of. Note, however, that you could compute it quite easily if you've saved the full set of samples from the filtering distribution (by setting save.states=TRUE).

Thank you @kingaa !

Extremely helpful!

Ok, I understand generally what you are saying, but I'm still confused by a few details.

Using a particle filter approach, if you wanted the:

predicted state: you would want the mean of sampled particles for time t based on the observation at time t-1 (i.e. the weight of the particles), hence Xt | y*1:t-1, t=1,...,T
filtered state: you would want the mean of the resampled particle values at time t based on the fit to the data at time t (i.e. the weight of the particles), hence Xt | y*1:t, t=1,...,T - not sure if this make sense, because I'm not sure whether you resample the particles based on the weight as a step before applying the sampling from process equation. If you do, I think this make sense. If you don't, they I don't quite understand.
smoothed state: you would want the mean of the particle values that are kept for the whole time series with backtracking the resampling based on the weights, hence Xt | y*1:T, t=1,...,T

With pomp, you can get:

the predicted state with pred.mean and the variance with pred.var
the filtered state with filter.mean and the variance by taking the variance from the saved states
a sample of the smoothing distribution with filter.traj. What I undestand is that you get the history of one of the particle that made it through the whole particle filter (?), maybe the one with the highest weight (?). Hence, to get the smoothing distribution you want to do multiple particle filters, and extract the filtered trajectory for each of these filter runs. To get the smoothed state (if we ignore the bimodality for now) you could take the mean of the values from these filtered trajectories for each time step. This explain why the filtered trajectory is so different across different particle filter runs. But one question, why just one trajectory? Why not all of the trajectories? And how does pomp choose the trajectory returned?

Re bimodality, yes I was already doing the visualissation trick, but the variability in the data was so overwhelming that I was trying to find another solution. I think now that I understand better what the different function returns, it will be easier for me to find a solution. I was under the assumption that filter.traj was a mean of some sort already and I think this threw me off.

Aye aye aye, yes good point regarding the parameter uncertainty. But I think this is something for another day.

Suggestion: in the help file of pfilter, there is the definition of filter.mean under the methods heading but not filter.traj. It would be good to add the definition.

Thank you!!!

Marie

@MarieAugerMethe, you ask:

What I undestand is that you get the history of one of the particle that made it through the whole particle filter (?), maybe the one with the highest weight (?). Hence, to get the smoothing distribution you want to do multiple particle filters, and extract the filtered trajectory for each of these filter runs. To get the smoothed state (if we ignore the bimodality for now) you could take the mean of the values from these filtered trajectories for each time step. This explain why the filtered trajectory is so different across different particle filter runs. But one question, why just one trajectory? Why not all of the trajectories? And how does pomp choose the trajectory returned?

First, let me clarify one thing about pomp's particle filter, pfilter. This is a very simple implementation of the particle filter (the "SIR" filter in the terminology of Arulampalam et al. (2002)). As such, at each stage, the particles should be considered to have equal weight. More complex algorithms track the weights of particles, renormalizing via resampling less frequently than does pfilter. As an aside, I point out that there is certainly scope within pomp for a variety of particle filter algorithms, and contributions are welcome!

Now, on to your question about why we use just a single draw from each particle filter calculation. On the face of it, this does appear wasteful: Each pfilter computation involves rather a lot of work; why not take better advantage of it? The difficulty is that the individual particles are not independent samples from the smoothing distribution. This is because of their shared ancestry. In particular, it will typically be the case that all of the particles alive at time T (the end of the time series) collectively descend from only a small number (perhaps just one) of the particles present at time t₀. However, if we run multiple independent filters, and take just one trajectory (i.e., a surviving particle and its pedigree), these are independent samples from the smoothing distribution. The one sample that filter.traj takes is (as it must be) purely random from among the surviving particles.

@MarieAugerMethe: In looking over the code to make sure I was giving you the correct answer, I discovered a bug. In particular, pfilter (and, by extension, pmcmc) were using a weighted sample to get the filter trajectory, where they should have been doing an unweighted sample. This is now fixed (as of version 1.18.7.1, due to be released later this morning).

Yet another benefit of your excellent questions!

Thanks @kingaa !

Ok, got it. Now I'll just have to figure out how to deal with the bimodality ;)

Thanks!

kingaa / pomp

filter.mean vs filter.traj (with added Qs on prediction and smoothing) #74