marksorel8 / Wenatchee-screw-traps


Multi-year screw trap models in Stan #1

Open ebuhle opened 5 years ago

ebuhle commented 5 years ago

Hey @marksorel8, I added a directory containing Stan code, example data (from the Tucannon) and a self-contained R script that serves as a demo of both the single- and multi-year models. The latter takes ~4 hrs on my PC.

The multi-year version will need some modifications to make it suitable for your needs. In no particular order:

Let me know if/when you have Qs!

marksorel8 commented 5 years ago

My laugh emoji was supposed to be a smiley face emoji. While you have a great sense of humor, Eric, I don't believe this is meant to be funny. I'm very happy to have this super useful information though. Thank you!

marksorel8 commented 5 years ago

I have now read through and run parts of the demo. Super awesome. Here are my thoughts:

Moving forward: It seems like it might not be that hard to turn your multi-year model into a series of independent AR(1) models. An efficient multi-year structure is already there. Does it make sense for me to take a hack at that?

Can you also let me know what you think of the finite mixture idea and how we might build that?

Happy to meet in person if you prefer.

Thanks!

ebuhle commented 5 years ago

The multi-year model was harder for me to follow (the Cholesky stuff), so I'm happy to revert to independent AR(1) or random walks. Preference? These would still share information on the p-model (e.g., flow coefficient), though, I think. I think your p-model in the multi-year version is great.

Yeah, like I said, I think independent within-year daily process errors make more sense given your ultimate goal of developing a "process-agnostic" prior for total annual smolt production. I'd suggest playing around with AR vs. RW to get a feel for the differences. The main drawback of AR is that the daily outmigration process really is not stationary, and the mean-reverting tendency could introduce bias when interpolating over long data gaps (like winter, which you shouldn't / won't be doing anyway) or gaps near the tails of the run (which you can see a little in the Tucannon example). OTOH, the RW interpolation will be noisier, so it's a bias vs. variance tradeoff.

As for the Cholesky stuff, see the "multivariate Matt trick" section in the Stan manual; that should help. You can keep much of the data and parameter structure for the MAR(1) version the same, i.e. vectorizing the daily process errors across years. You'll just be multiplying the vector of z-scored process errors elementwise by the vector of annual SDs without the Cholesky-factored correlation matrix being involved.
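To sketch what I mean (variable names here are just illustrative, not necessarily the ones in the actual Stan code, and I'm ignoring the ragged year lengths), the independent-errors version boils down to something like:

```stan
data {
  int<lower=1> max_days;   // days per year
  int<lower=1> n_years;
}
parameters {
  matrix[max_days, n_years] z;          // z-scored daily process errors
  vector<lower=0>[n_years] sigma_proc;  // annual process-error SDs
}
transformed parameters {
  // scale each year's column of z by that year's SD; no Cholesky factor
  // of an among-year correlation matrix is involved
  matrix[max_days, n_years] proc_err = diag_post_multiply(z, sigma_proc);
}
model {
  to_vector(z) ~ std_normal();
  sigma_proc ~ normal(0, 1);  // half-normal, given the lower bound
}
```

The MAR(1) version just folds the Cholesky factor of the among-year correlation matrix into that scaling step.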

The one thing I've thought of doing with phenology is to fit a finite mixture of 3 distributions that would characterize the daily expected level. The advantage of this is that I could then use the contribution of each mixture component on each day to discretize the fish into life histories. Maybe each year would get an independent multiplier on this mixture (or each component?!) to reflect different annual run sizes (and relative contributions of different life histories!?). Thoughts?

This is actually where I started when Morgan first asked me about the Tucannon data. I quickly abandoned it, though, after running into severe identifiability issues that were a clear example of Gelman's folk theorem. The basic problem is that the phenology is waaay too complex and idiosyncratic for any simple parametric model to capture adequately. Even if you could get a sensible fit, you'd be gaining some dubious inference (proportions of LH stages) at the expense of a massive degradation in the quality of your abundance estimate.

I can imagine some sort of smooth function helping to capture the phenology, but only if it had AR(1) process errors added on top of it. On the upside, this could get around the silliness of treating the raw phenology as a stationary process. Even then, though, I would lean toward a Gaussian process or GAM rather than the finite-mixture idea.

marksorel8 commented 5 years ago

OTOH, the RW interpolation will be noisier, so it's a bias vs. variance tradeoff.

I would rather have less bias and more variance in this case I think, because this will just be one source of information on population dynamics. If I don't have enough information to explore ecological hypotheses about juvenile life-history expression though, I would be sad.

I can imagine some sort of smooth function helping to capture the phenology, but only if it had AR(1) process errors added on top of it. On the upside, this could get around the silliness of treating the raw phenology as a stationary process. Even then, though, I would lean toward a Gaussian process or GAM rather than the finite-mixture idea.

I like this idea. Without the mixture I might need another way to characterize timing/life-histories (e.g. hard break points, median migration date, post-hoc mixture) to tell an ecological story, but that is not a problem. BTW, it was interesting that the proportion of fall migrants was negatively correlated with the total number of migrants in the Tucannon. How do you explain that? The motivation for using the timing/life-history information is two-fold: 1) there are interesting ecological stories about how it relates to habitat characteristics and density, and 2) it will affect how subsequent survival is modeled (i.e. smolt trap->adult survival ~ life history (discrete) or emigration date (continuous)).

Would we fit a unique spline / GP for each year? Or one for all years that is multiplied by an annual scalar? I'm not sure the effort would be worth the benefit of fitting a unique spline/GP for each year as opposed to a RW. That is essentially the BTSPAS approach, but it uses independent process errors. Maybe with AR(1) errors the spline/GP could be much smoother?

Thoughts: The information that could logically be shared across years is the effect of photoperiod (i.e. DOY) on a fish's propensity to move. This is undeniably a factor affecting movement probability and is essentially constant across years.

But, if DOY/photoperiod interacts with density, flow, and other year-varying factors, as I would expect, that makes it more difficult to leverage that information across years and avoid the degradation of our estimates. My idea with the finite mixture was that the mean and SD of each mixture component would be the only information shared across years. A multiplier would be fit for each mixture component and year, and then as you suggest there would be AR(1) process errors around that. But I am in no way wedded to this; I just want the least biased, most precise estimates possible.

mdscheuerell commented 5 years ago

@marksorel8: Can you add some pdfs of plots to the repo that include something like the daily detections for a few representative years? That would help me think about a good option for the process model.

From @ebuhle:

I can imagine some sort of smooth function helping to capture the phenology, but only if it had AR(1) process errors added on top of it.

I would start simple and use a RW or AR(1) model here. An AR(1) model with phi = 0.9 would still have a lot of flexibility to fit what would appear to be a largely non-stationary process.

ebuhle commented 5 years ago

I would start simple and use a RW or AR(1) model here. An AR(1) model with phi = 0.9 would still have a lot of flexibility to fit what would appear to be a largely non-stationary process.

I agree with this, although it's worth noting that the annual phi estimates for the Tucannon data are mostly in [0.9, 0.95] and you still see a bit of undesirable mean-reversion in the tails of the run in some years:

image

Regardless, though, "start simple" is good advice.

If you did eventually want to include some sort of smooth function for the mean, I agree that it would make sense to partially pool information about the phenology across years. The key would be to parameterize it such that the absolute level (i.e., total annual outmigrants) is not shared across years.

I suspect the discrete mixture approach (even two components, never mind three) is doomed computationally, in any event.

mdscheuerell commented 5 years ago

Thx for the figures, @ebuhle. I know Gavin Simpson pretty well and I'm sure he'd be happy to offer some advice on fitting GAMs or RMFs to these data. For example, there might be some meaningful way(s) to set the number of knots based on the seasonal aspects.

marksorel8 commented 5 years ago

Here are some for the Chiwawa, @mdscheuerell .

chiwawa years

I would be interested in thinking more about fitting different kinds of seasonal models. One of the things I am potentially interested in exploring is how the proportions of the different life-histories change across years (with different densities). It seems like fitting one average seasonal smoother that gets scaled differently each year would subtly suggest that the life-history proportions are equal across years. If you have ideas about how to leverage information across years while still allowing for the flexibility to get unbiased estimates of abundance and timing, I am definitely interested. Maybe the knot number and placement is the shared information across years? But the coefficients on the basis functions vary? Plus AR(1) process error? IHNI.

mdscheuerell commented 5 years ago

Looking at these plots suggests to me that some form of stochastic volatility model might be worth a try, although I've never tried to fit one model to multiple time series. That said, it would seem that some of the environmental conditions that tend to synchronize year-to-year patterns in migration timing might be informative.

marksorel8 commented 5 years ago

From talking to Thomas Beuhrens, it sounds like there are a number of environmental factors that seem to make fish move or stay put. I didn't take good enough notes, but I remember he was saying that cloud cover seems to be a big factor in combination with discharge, and maybe temperature too. Trying to characterize these relationships might be pretty hard, but I think it could be interesting. I imagine that responses to environmental conditions differ seasonally too.

Here is daily average discharge (measured near the screw trap) overlaid on the catches. As Thomas pointed out, the catch data integrate across 24 hours or more, but fish may all move within a few hours, so if we knew that they moved entirely between sunset and sunrise we could just use discharge within that period. I'm fairly certain it is available at hourly resolution.

image

My interpretation of these data is that fry and smolts are moving on the leading edge of the spring freshet. Summer parr are leaving on the trailing edge as the hydrograph approaches summer low flows. And fall parr are leaving with the fall rains.

The stochastic volatility model sounds interesting. I'm not attached to the idea of fitting the process model across multiple years. It seems like there is enough data to fit each year independently. You wouldn't happen to have a reference that gives an overview of stochastic volatility models, would you? Also, is it because the variability fluctuates seasonally that you think this type of model might work?

mdscheuerell commented 5 years ago

FYI, here's a description of SV models as implemented in the stochvol package. There are other forms as well, none of which would be hard to code in Stan.
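For instance, the plain-vanilla version (similar in spirit to the stochastic volatility example in the Stan user's guide; the data and names below are placeholders, not tied to the screw-trap code) is just an AR(1) on the log-variance:

```stan
data {
  int<lower=2> T;
  vector[T] y;   // e.g., daily process-error innovations on the log scale
}
parameters {
  real mu;                      // mean log-volatility
  real<lower=-1, upper=1> phi;  // persistence of log-volatility
  real<lower=0> sigma;          // SD of log-volatility innovations
  vector[T] h;                  // latent log-volatility
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 1);
  h[1] ~ normal(mu, sigma / sqrt(1 - phi^2));
  h[2:T] ~ normal(mu + phi * (h[1:(T - 1)] - mu), sigma);
  y ~ normal(0, exp(h / 2));
}
```

Here the "returns" y would be the daily innovations, so it's the day-to-day process variance, not the level, that gets its own dynamics.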

ebuhle commented 5 years ago

A stochastic volatility model is an interesting idea. Considering one year at a time, yes, SV allows the process variance to evolve according to its own process model. In the multivariate context, things get more interesting; for example, you could allow the covariance (across years) of the errors to evolve through the season.

Last year I spent a day or two looking into multivariate stochastic volatility models on behalf of my office-mate, and frankly I found the literature fairly intimidating. There are lots of different "flavors" of such models, each with its own acronym (this goes for univariate SV too), and the math is somewhat intense. In many cases, the focus seemed to be on a factor-analytic dimension reduction for the covariance, but there was little or no attention given to modeling the level itself. Maybe this makes sense for the financial time series these models are designed for, IDK.

Stepping back, it seems like the question is whether the daily innovation variance (on the log scale, not the scale we're looking at in the plots above) actually changes enough to demand modeling, or whether it's more fruitful to focus on the level. Right?

mdscheuerell commented 5 years ago

I agree that SV models are a diverse and wacky bunch.

Stepping back, it seems like the question is whether the daily innovation variance (on the log scale, not the scale we're looking at in the plots above) actually changes enough to demand modeling, or whether it's more fruitful to focus on the level. Right?

Yes, I think this is the crux of the issue.

ebuhle commented 5 years ago

Yes, I think this is the crux of the issue.

Well, as far as the Tucannon goes, here's what it looks like. (These are posterior means of the normalized process error innovations.)

image

So I guess now the question is, how much change is "enough to demand modeling"? :-D

marksorel8 commented 5 years ago

I think that makes sense. Here are log(catch+1) for the Chiwawa.

image

Would the quality of the efficiency-trial data ever affect how we model the process, like if the capture efficiency was highly uncertain?

I guess one consideration is how much precision we gain by leveraging phenology information about the level across years, and whether it's worth it given how challenging it would seem. Maybe AR(1) or RW really are best. What about smoothed covariate effects (e.g. positive response to discharge (or diff of discharge) during the leading edge of the spring freshet and negative on the trailing edge)?

marksorel8 commented 5 years ago

@mdscheuerell @ebuhle Would the logical next step for me be to fit AR(1) and RW models to my data? Maybe that would help us identify if they are sufficient or if we need a different model?

ebuhle commented 5 years ago

I think that makes sense. Here are log(catch+1) for the Chiwawa.

Bear in mind, this is not an apples to apples comparison. Both plots I've posted show estimated states, with the second one showing the normalized process error innovations (log_M_hat_z). You're plotting data, and the plot just above doesn't speak directly to the homoscedasticity (or lack thereof) of process errors.

Would the quality of the efficiency-trial data ever affect how we model the process, like if the capture efficiency was highly uncertain?

Hmm, well, it would increase the uncertainty around the states. Sharing information about p as much as possible seems like a good idea.

Another thing to consider is that replacing the Poisson likelihood with a negative binomial would have the effect of attributing more noise to the observations and less to the process, which would smooth out the latter somewhat. You could ask (using LOO) which likelihood is more consistent w/ the data.

I guess one consideration is how much precision we gain by leveraging phenology information about the level across years, and whether it's worth it given how challenging it would seem. Maybe AR(1) or RW really are best.

That's basically my feeling. Looking at those time series, while there are some consistent "stylized facts", the details are super erratic. I don't think you'd gain that much in terms of prediction / interpolation skill by including a smooth function of date, and any inference about LH stages, etc., could be accomplished post hoc or by other means.

marksorel8 commented 5 years ago

Thanks! This is very helpful. I will fit AR(1) models and then post plots of process errors.

I think we have discussed this before, but for clarification, I will be able to use the estimates (posteriors) of juveniles out (as priors) in stock-recruit models that also use spawner data?

Thanks so much for all your help!!!!!

ebuhle commented 5 years ago

I think we have discussed this before, but for clarification, I will be able to use the estimates (posteriors) of juveniles out (as priors) in stock-recruit models that also use spawner data?

In principle, yes. It's a straightforward application of Bayes' Theorem.

In practice, there's some question about how to implement this correctly in situations where the posterior from Model 1 is used as a prior on a quantity in Model 2 that is not a primitive parameter (e.g., declared in the parameters block in Stan) but a transformed parameter. If the transformation is nonlinear, it could require a Jacobian adjustment (see the Stan manual section on transformation and change of variables), which could be a deal-breaker in complex cases.
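Just to give the flavor (a toy sketch, not your actual model): say the downstream model's primitive parameter is log_M, but the screw-trap posterior you want to use as a prior is summarized for M on the natural scale. Then you need the log absolute Jacobian of the transform:

```stan
data {
  real m_prior;            // e.g., posterior mean of M from the screw-trap model
  real<lower=0> sd_prior;  // posterior SD of M from the screw-trap model
}
parameters {
  real log_M;              // primitive parameter in the downstream model
}
model {
  // the prior is stated for M = exp(log_M), a nonlinear transform of log_M,
  // so add the log absolute Jacobian: log|dM/dlog_M| = log_M
  target += normal_lpdf(exp(log_M) | m_prior, sd_prior) + log_M;
}
```

With a linear transform the adjustment is a constant and can be dropped; the headache is when the Model 2 quantity is a messier function of many primitives.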

If worst comes to worst, it wouldn't be that hard to build a spawner-to-smolt regression model around the screw-trap observation model. The closed-loop IPM framework is where a fully integrated approach, with the RST observation component explicit, starts to get computationally daunting.

marksorel8 commented 5 years ago

Here are some process errors of the normalized log(expected migrants) from models fit to time series of subyearling catches in the Chiwawa. This is only half of the years. These are from maximum likelihood estimates in TMB.

image

Everything <100 is in the "fry" mode, which seems to have the most variance. Andrew Murdoch actually advised me to drop fry from the analysis, at least to start.

Do these plots suggest that the AR(1) model is insufficient?

mdscheuerell commented 5 years ago

I would have to see ACFs and QQ plots.

marksorel8 commented 5 years ago

Gotcha. Just to check, the process errors of an AR(1) with level=0 are the first difference of the latent-state vector multiplied by the autocorrelation coefficient?

The plots above were actually just the first difference of the latent state vectors.

marksorel8 commented 5 years ago

Hey @mdscheuerell, I just uploaded a pdf of diagnostic plots for the Chiwawa subyearling models that I fit. It is in src\TMB.

I hope I am calculating the process error innovations correctly, as the first difference of the latent state estimates multiplied by the autocorrelation coefficient.

ebuhle commented 5 years ago

I hope I am calculating the process error innovations correctly, as the first difference of the latent state estimates multiplied by the autocorrelation coefficient.

It's both different and more complicated than that. If your AR(1) process model is

x[t] = mu[t] + phi * (x[t-1] - mu[t-1]) + v[t]

then solving algebraically for the innovation gives

v[t] = x[t] - mu[t] + phi * (mu[t-1] - x[t-1])

But plugging the MLEs of phi, mu and x into this expression recursively does not give you the MLE of v, b/c of correlations in the likelihood.

If your TMB code uses the noncentered parameterization where the errors themselves are a primitive parameter, this should be easy. That's how the Stan code is set up; I just pulled out log_M_hat_z.

EDIT: I just noticed you said mu[t] = 0. So then it simplifies to

v[t] = x[t] - phi * x[t-1]

Still not the first difference of x multiplied by phi, though.
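To put it another way, in the non-centered setup the standardized innovations are primitive parameters (names here are illustrative, process mean zero as in your case):

```stan
data {
  int<lower=2> T;
}
parameters {
  real<lower=-1, upper=1> phi;
  real<lower=0> sigma;
  vector[T] z;                // standardized innovations, declared as parameters
}
transformed parameters {
  vector[T] v = sigma * z;    // v[2:T] are the process-error innovations
  vector[T] x;                // latent states (mean zero)
  x[1] = v[1] / sqrt(1 - phi^2);   // stationary initial condition
  for (t in 2:T)
    x[t] = phi * x[t - 1] + v[t];
}
model {
  z ~ std_normal();
  sigma ~ normal(0, 1);
}
```

so you just read v (or z) out of the fitted object instead of reconstructing it from the states.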

marksorel8 commented 5 years ago

Thanks, @ebuhle! Can I plug the MLEs into that formula to get the process errors, or does the issue due to correlations in the likelihood still apply?

I discovered that TMB has a canned AR(1) distribution which seems to be super efficient. But I don't get the process errors out of it.

I just fit a model that was set up like the one in Stan, and added a new plot of process errors to the repository.

ebuhle commented 5 years ago

There are some pretty strange patterns in there. Things like this:

image

Are those extremely smooth sections ones where there are no observations? Or...?

Have you tried fitting the multi-year model in Stan?

marksorel8 commented 5 years ago

Yeah, I agree those look weird. The long, smooth stretches appear to be stretches of zero catch, not missing observations. I'm not sure what is happening in 2002.

I'll try to fit these in Stan. Should I try to get rid of the across-year dependencies in the Stan model first? Or just run as is?

ebuhle commented 5 years ago

I'll try to fit these in Stan. Should I try to get rid of the across-year dependencies in the Stan model first? Or just run as is?

Try it as-is. That's what my plots above are based on.

marksorel8 commented 5 years ago

Good idea. Will do.

I added a plot of the Chiwawa subyearling catch data for all years. The weird 2002 time series was already on my radar actually, because it represents an anomalously large outmigration of fry and summer parr coming from the year with the highest redd counts (for which we have corresponding smolt-trap data). I suspect that the managers flooded the spawning grounds with hatchery fish that year, and that they spawned low (near the acclimation site) and their juveniles emigrated early. How to treat these specific data points is actually an important question, because they could have a really strong influence on the spawner-migrant relationships.

mdscheuerell commented 5 years ago

Hey @ebuhle, have you by chance tried a negative binomial in lieu of the Poisson likelihood for total catch here?

ebuhle commented 5 years ago

Hey @ebuhle, have you by chance tried a negative binomial in lieu of the Poisson likelihood for total catch here?

As far as I can remember, no. But I did suggest that earlier in this thread as a possibility worth exploring:

Another thing to consider is that replacing the Poisson likelihood with a negative binomial would have the effect of attributing more noise to the observations and less to the process, which would smooth out the latter somewhat. You could ask (using LOO) which likelihood is more consistent w/ the data.

mdscheuerell commented 5 years ago

Jeez, you'd think I'd be capable of reading the entire thread...

marksorel8 commented 5 years ago

Haha, I mean there are only 32 posts, Mark! At what point do I close this issue and start a new one?

It sounds like the negative binomial is definitely worth trying. I'll experiment with it once I finish running the original. Then we can compare the results (in a new thread!). I also invite @ebuhle to do this and do the comparison with LOO of which you speak ;)

ebuhle commented 5 years ago

I also invite @ebuhle to do this and do the comparison with LOO of which you speak ;)

I leave the exercise to the reader...errr, dissertator.

The tricky part will be making sure you're using a parameterization of the NB that is closed under addition, so that this still holds:

https://github.com/marksorel8/Wenatchee-screw-traps/blob/14e67d4208d652a46a814142b9fe7004d200f711/src/Stan_demo/juv_trap_multiyear.stan#L75

As you probably know, there are several parameterizations in common use, and even the "standard" one in Stan differs from the one on Wikipedia, for example. Further, the most common and useful parameterization in ecology (mean and overdispersion) is yet another variant. The catch is that closure under addition requires that all the NB RVs being summed have the same p (in the standard parameterization), so you can't simply sum the expectations like we do with the Poisson.
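(For the record, and worth double-checking: in that standard parameterization, if X[i] ~ NB(r[i], p) independently with a common p, then sum(X) ~ NB(sum(r), p). In Stan's neg_binomial_2(mu, phi) terms, r = phi and p = phi / (phi + mu), so "common p" amounts to requiring phi[t] proportional to mu[t] across the days being summed.)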

Then it will also be slightly tricky to figure out the binomial thinning:

https://github.com/marksorel8/Wenatchee-screw-traps/blob/14e67d4208d652a46a814142b9fe7004d200f711/src/Stan_demo/juv_trap_multiyear.stan#L76

since thinning the NB (like the Poisson) scales the expectation. So you'll have to switch between parameterizations (i.e., solve one set of parameters for the other) at least once.
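(My recollection, worth verifying: viewing the NB as a Poisson-Gamma mixture, binomial thinning with probability q leaves the shape parameter alone and scales the mean, i.e. X ~ neg_binomial_2(mu, phi) thinned gives neg_binomial_2(q * mu, phi). The implied p changes, which is where the switching between parameterizations comes in.)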

I would have to sit down with pencil and paper to figure this out. Sounds like a good grad student project, amirite??

marksorel8 commented 4 years ago

Hi @mdscheuerell @conversesj and @ebuhle, I've been working on this model and I've incorporated a number of the things that we had talked about. I'd love it if you would take a look and let me know what you think. I've written a short description of the current model (src/Methods/screw trap model.Rmd and pdf). The estimation model itself is written in TMB (src/TMB/screw_trap_LP_3.cpp).

Thanks! Mark