marksorel8 / Wenatchee-screw-traps


Replace Poisson likelihood for catch with negative binomial #3

marksorel8 opened this issue 5 years ago

marksorel8 commented 5 years ago

I also invite @ebuhle to do this and do the comparison with LOO of which you speak ;)

I leave the exercise to the reader...errr, dissertator.

The tricky part will be making sure you're using a parameterization of the NB that is closed under addition, so that this still holds:

https://github.com/marksorel8/Wenatchee-screw-traps/blob/14e67d4208d652a46a814142b9fe7004d200f711/src/Stan_demo/juv_trap_multiyear.stan#L75

As you probably know, there are several parameterizations in common use, and even the "standard" one in Stan differs from the one on Wikipedia, for example. Further, the most common and useful parameterization in ecology (mean and overdispersion) is yet another variant. The catch is that closure under addition requires that all the NB RVs being summed have the same p (in the standard parameterization), so you can't simply sum the expectations like we do with the Poisson.
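
For example, here's a quick R sanity check of the closure property (made-up numbers; rnbinom's size and prob play the roles of r and p in the standard parameterization):

# hypothetical check: NB(r1, p) + NB(r2, p) should match NB(r1 + r2, p)
set.seed(1)
n <- 1e5
r1 <- 3; r2 <- 5; p <- 0.4   # made-up example values
y_sum <- rnbinom(n, size = r1, prob = p) + rnbinom(n, size = r2, prob = p)
y_ref <- rnbinom(n, size = r1 + r2, prob = p)
c(mean(y_sum), mean(y_ref))  # means agree
c(var(y_sum), var(y_ref))    # variances agree too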

Then it will also be slightly tricky to figure out the binomial thinning:

https://github.com/marksorel8/Wenatchee-screw-traps/blob/14e67d4208d652a46a814142b9fe7004d200f711/src/Stan_demo/juv_trap_multiyear.stan#L76

since thinning the NB (like the Poisson) scales the expectation. So you'll have to switch between parameterizations (i.e., solve one set of parameters for the other) at least once.
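
And a similarly hedged R sketch of the thinning point (theta and the other values are made up): thinning scales the NB mean by theta, and the result is again NB but with a modified p (a standard thinning identity, not something from the repo), which is why the parameterizations have to be reconciled.

# hypothetical check: binomial thinning of NB(r, p) scales the mean by theta
set.seed(2)
n <- 1e5
r <- 4; p <- 0.3; theta <- 0.25              # made-up example values
y <- rnbinom(n, size = r, prob = p)
y_thin <- rbinom(n, size = y, prob = theta)  # keep each fish with probability theta
mean(y_thin) / mean(y)                       # ~theta: the expectation scales
p_new <- p / (p + theta * (1 - p))           # thinned count is NB(r, p_new)
c(mean(y_thin), r * (1 - p_new) / p_new)     # simulated vs. analytical thinned mean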

I would have to sit down with pencil and paper to figure this out. Sounds like a good grad student project, amirite??

Originally posted by @ebuhle in https://github.com/marksorel8/Wenatchee-screw-traps/issues/1#issuecomment-523688629

ebuhle commented 5 years ago

My contribution is to make the issue title informative.

mdscheuerell commented 5 years ago

I agree that it would be good for @marksorel8 to take a crack at the NB formulation.

mdscheuerell commented 5 years ago

My contribution is to make the issue title informative.

And I just added an informative label to the issue!

ebuhle commented 5 years ago

Good job, team!

marksorel8 commented 5 years ago

Alright, here goes.

closure under addition requires that all the NB RVs being summed have the same p (in the standard parameterization)

The Stan function neg_binomial_lpmf(y | alpha, beta) is similar to the standard parameterization provided here, except the binomial coefficient part is different and beta/(beta+1) replaces p. The mean of the Stan distribution is alpha/beta, and the variance is [alpha/(beta^2)]*(beta+1).

Since the Stan parameterization is similar to the standard parameterization in the part with p, would it then be closed under addition as long as we fit a single beta (and therefore p) parameter for each year?

That makes alpha = expected value * beta. So, can we just replace

https://github.com/marksorel8/Wenatchee-screw-traps/blob/6d93b8738a73bf75edd34acf1e2462a2b5beda23/src/Stan_demo/juv_trap_multiyear.stan#L117

with C ~ neg_binomial(C_hat*beta, beta)?

mdscheuerell commented 5 years ago

If you want C ~ neg_binomial(alpha, beta) then I think you need

beta <- alpha / C_hat

marksorel8 commented 5 years ago

@mdscheuerell, would that make the distribution not closed under addition, because the betas are different?

The closure-under-addition issue isn't a problem in the Wenatchee, because they are legally obligated to check the traps daily, but it seems important to crack, especially since it will be necessary in other places like the Tucannon.

mdscheuerell commented 5 years ago

I don't understand the issue re: alpha and beta in the negative binomial complicating the estimation. The mean (and variance) for the Poisson (C_hat) is indeed a function of p[trap_year,trap_week], but that calculation is done prior to its inclusion in the likelihood. The negative binomial can also be parameterized in terms of the mean and variance, which means using the same estimate of C_hat.

ebuhle commented 5 years ago

There are two distinct steps in the likelihood calculation where going from the Poisson to the NB presents complications.

The one @marksorel8 is referring to is the summation of "true" migrants over multiple days, if the trap isn't emptied daily (though apparently that's a non-issue in the Wenatchee). This is implicitly a summation of discrete-valued latent states, but when they're Poisson-distributed it's trivial b/c you can just sum the expectations and it's still Poisson. With the NB, this doesn't work -- the sum of NBs is NB iff they have the same p, in which case you sum the r's (in the standard parameterization). So you'll have to (1) declare p (or a one-to-one transformation of it) as a primitive parameter, which is less than ideal for interpretation and for characterizing the overdispersion; (2) after calculating the daily expectation M_hat_t, use the fixed p to solve for r[t]; (3) sum the r[t]'s across days; and then (4) convert back to the expectation M_hat_cumsum for subsequent likelihood calcs.

The second issue is the binomial thinning in calculating C_hat. If we were using the (mean, overdispersion) parameterization, this would be simple: thinning scales the NB mean just like it does the Poisson. Here, though, we'll have to either solve for the overdispersion to use the mean directly, or solve for r again (after thinning) to use the standard parameterization.
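
To make the bookkeeping concrete, here is a hedged R sketch of the back-and-forth (the helper names and values are mine, not from the repo; I'm writing the NB mean as r*(1 - p)/p, with p the "success" probability as in R's rnbinom):

nb_r_from_mean <- function(mu, p) mu * p / (1 - p)  # step (2): solve for r given fixed p
nb_mean_from_r <- function(r, p) r * (1 - p) / p    # step (4): back to the expectation

M_hat <- c(10, 12, 8)  # made-up daily expectations M_hat_t
p_nb <- 0.4            # made-up fixed NB probability (not the trap efficiency!)
r_sum <- sum(nb_r_from_mean(M_hat, p_nb))  # step (3): closure under addition
nb_mean_from_r(r_sum, p_nb)                # M_hat_cumsum: equals sum(M_hat)

(Note the round trip just returns sum(M_hat), which foreshadows the simplification discussed further down: with p fixed, summing the expectations directly is valid.)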

mdscheuerell commented 5 years ago

I had made the assumption that we had a distributional form for the expected catch for each trap, which was initially Poisson. The expectation (and variance) for the Poisson comes from a Gaussian AR(1) model for log-migrants, which is then thinned at some interval based upon when the trap is sampled. I had thought p was indeed constant within an interval, and hence the NB should be OK because we could parameterize it in terms of a mean (fixed) and r (unknown).

ebuhle commented 5 years ago

I had made the assumption that we had a distributional form for the expected catch for each trap, which was initially Poisson.

What I had in mind in formulating the model was that it's actually the daily arrivals at the trap that are Poisson. The daily outmigrants available for capture are a Poisson-log AR(1) process, which are then possibly accumulated over multiple days and thinned by binomial sampling with trap efficiency p (uhh, the other p...that's confusing). In JAGS, you would model the daily arrivals explicitly:

M[t] ~ dpois(M_hat[t])

We can't do that in Stan, but luckily the Poisson is closed under addition and binomial thinning, so everything works out and we get to have our discrete-latent-state cake and eat it too (in generated quantities).
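
That property is easy to convince yourself of numerically (a throwaway R check with made-up rates; theta stands in for the trap efficiency):

set.seed(4)
n <- 1e5
lambda <- c(3, 7); theta <- 0.5                # made-up daily means and capture prob
y <- rpois(n, lambda[1]) + rpois(n, lambda[2]) # closed under addition
y_thin <- rbinom(n, size = y, prob = theta)    # binomial thinning
c(mean(y_thin), var(y_thin))                   # both ~ theta * sum(lambda): still Poisson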

Things aren't quite as elegant if the daily arrivals are NB. It's still doable as outlined above, but I suspect having to parameterize in terms of p (vs. overdispersion) induces a wonkier prior that's harder to interpret (since p is involved in both the mean and overdispersion).

mdscheuerell commented 5 years ago

I suspect having to parameterize in terms of p (vs. overdispersion) induces a wonkier prior that's harder to interpret (since p is involved in both the mean and overdispersion).

I agree with this.

mdscheuerell commented 5 years ago

FYI, I did something similar for the San Nicolas data wherein the expectation for the NB was based on the log-abundance from a MAR(1) model, but the observations weren't nearly as sparse (ie, NA's were very rare), so no summations were necessary and solving for r wasn't such a hassle.

ebuhle commented 5 years ago

@marksorel8, you could explore the difference b/w the two parameterizations with the Wenatchee data, since summation over missing days isn't an issue.

marksorel8 commented 4 years ago

Hi @mdscheuerell @ebuhle, I have started adding the ability to use the NB distribution for the observations in the single-year Stan model. I am using the mu and phi parameterization of the NB:

https://github.com/marksorel8/Wenatchee-screw-traps/blob/6d9a311a1805f294da35a471d4081134cc3e22ad/src/Stan_demo/juv_trap.stan#L90

Would you be able to check the changes that I made in juv_trap.stan in the last commit and let me know if I am on the right track? What would an appropriate prior on the phi parameter in the NB be?

I am not getting radically different results with the NB than with the Poisson at this point. Should I use LOO to compare them once I have a decent prior on phi? How would I go about that?

Finally, I found an e-book that might be of use in this project: Weisse- An Introduction to Discrete-Valued Time Series.pdf. The first chapter, and specifically Section 2.1.3 on pg. 22, seem relevant, and there is an included Rcodes.zip. The password to access the R codes is "DiscrValTS17". NB-INAR1.r might have some useful reparameterization algebra.

mdscheuerell commented 4 years ago

@marksorel8 The changes you made are not sufficient because you're using the primitive parameter phi_obs directly in the NB likelihood without using p to solve for r prior to summing r over t. Here's the pseudo-code from @ebuhle

  1. declare p as a primitive parameter (right now it's in transformed parameters);
  2. after calculating the daily expectation M_hat_t, use the fixed p to solve for r[t];
  3. sum r[t] across days; and
  4. convert back to the expectation M_hat_cumsum.

mdscheuerell commented 4 years ago

Re: models for discrete-valued data, you could opt to model the true, but unknown mean number of migrants as an AR(1) process for discrete values, but it's just as reasonable to assume the log-mean is actually real-valued (Gaussian), which is just fine for the Poisson or NB.

ebuhle commented 4 years ago

@mdscheuerell pretty much covered it. The key point is that you can't use the neg_binomial_2 parameterization as long as you need to sum expected migrants over multiple days,

https://github.com/marksorel8/Wenatchee-screw-traps/blob/6d9a311a1805f294da35a471d4081134cc3e22ad/src/Stan_demo/juv_trap.stan#L59-L60

because the NB is only closed under addition if all the terms are distributed with the same p (i.e., in Stan's neg_binomial version, the same beta) and the only way to enforce this is to declare p (i.e., beta) as a primitive parameter.

However, if you don't do the summation (which you can get away with in the Wenatchee case) then either parameterization is fine, and this would allow you to compare the two. As for a prior on the overdispersion, I'd suggest a half-normal or half-Cauchy, diffuse enough to cover plausible values (recalling that smaller phi means more overdispersion).
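
As a quick, hedged illustration of that last point (the half-normal scale and the mean are arbitrary made-up values), one can simulate the variance-to-mean ratio Var/mu = 1 + mu/phi implied by a candidate prior:

set.seed(3)
phi <- abs(rnorm(1e4, 0, 5))  # draws from an assumed half-normal(0, 5) prior
mu <- 100                     # an arbitrary example mean catch
vmr <- 1 + mu / phi           # NB2 variance-to-mean ratio; small phi => big vmr
quantile(vmr, c(0.1, 0.5, 0.9))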

ebuhle commented 4 years ago

Oh, and just to clarify:

1. declare `p` as a primitive parameter (right now it's in `transformed parameters`);

That's a different p -- the weekly capture probability of the trap, not the binomial probability underlying the "number of failures before first success" interpretation of the NB -- and it's fine as is. Not confusing at all...

marksorel8 commented 4 years ago

Thanks @mdscheuerell and @ebuhle . I added a prior on the dispersion parameter (actually, I put the prior on the inverse of the square root of the dispersion parameter) and played around with fitting a few different years of the Chiwawa data with the NB and the Poisson.

For one thing, I am getting some divergences with the NB.

Beyond that, the degree of overdispersion depends on the year. In some years it is negligible and in others it is more substantial. The year shown below is one with more overdispersion.

Negative Binomial image

Poisson image

marksorel8 commented 4 years ago

Here is an example where the NB seems much more appropriate. The overdispersion seems important for fry, which are logically observed with more error.

Poisson image

Negative Binomial image

mdscheuerell commented 4 years ago

Correct me if I'm wrong, @marksorel8, but it looks to me like you still haven't changed the Stan code to address the proper thinning of the NB per our discussion above (i.e., you are still summing over weeks with different p's).

marksorel8 commented 4 years ago

You are correct that I haven't changed the Stan code, @mdscheuerell, but I am not summing over weeks with different p's (in the Wenatchee example). There is no summing over weeks or days in the Wenatchee data because each catch observation represents a single day. So, the NB Stan model is currently appropriate for the Wenatchee data but not the Tucannon data (where the trap is fished for consecutive days without checking).

Addressing the proper thinning of the NB is the next step, but I wanted to compare the NB and the Poisson with the Wenatchee data first because it was simple. Looks like the NB is a big improvement for sure. So, good idea!

ebuhle commented 4 years ago

So, the NB Stan model is currently appropriate for the Wenatchee data but not the Tucannon data (where the trap is fished for consecutive days without checking).

Except the code still includes the summation, which the Wenatchee data doesn't need (b/c elapsed_time[i] is always equal to 1). In general, it's not advisable to write code that produces an invalid model unless the data meet certain conditions, without checking or warning about violations of those conditions -- i.e., it fails silently.

The simplest solution in this case would be to make a separate version of juv_trap.stan that ditches the summation and uses the neg_binomial_2 parameterization. That would also facilitate model comparison.

marksorel8 commented 4 years ago

Here I tried to implement the first parameterization of the NB in Stan (y ~ neg_binomial(alpha, beta)), which would theoretically be closed under addition, but I'm getting very different answers than when I used the other (mu and phi) parameterization, as well as some menacing warning messages. Do you see the problem(s)? I'm stumped.

I'll work on reverting the original single-year Poisson model and adding a separate neg-binomial_2 version without the summation. What is the easiest way to revert to an earlier commit in GitHub?

mdscheuerell commented 4 years ago

It sounds like you want git reset --mixed BLAH where BLAH should be replaced by the SHA of the commit you'd like to return everything to. If you are unsure about this, I'd suggest making a copy of the file(s) and placing them elsewhere before resetting.

marksorel8 commented 4 years ago

I just realized that with the alpha-beta parameterization neg_binomial it is actually a different model, so I shouldn't expect to get the same results as with the mu-phi parameterization neg_binomial_2.

The menacing warnings are troubling though. I guess the question is if I implemented the NB with the summing correctly?

ebuhle commented 4 years ago

@marksorel8, I'm planning to take a look at this, but I may not get to it until later this week/end.

marksorel8 commented 4 years ago

Sounds good @ebuhle , I'll try to fix things up in the repo in the next couple days. Currently in Friday Harbor for 558 workshop.

ebuhle commented 4 years ago

OK, so I ran juv_trap_NB.stan for the 2013 Wenatchee data as loaded in juv_trap_Stan_script_Wenatchee.R. (It seems like the data-munging code could probably be cleaner, but I left it as-is.) I got the following warnings:

Warning messages:
1: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low
2: Examine the pairs() plot to diagnose sampling problems
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#bulk-ess
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess

The website indicated explains what these warnings mean and gives links to references for further information. Basically, the sampler is not fully exploring the posterior (hence the warning about BFMI, also reflected in the Energy diagnostic tab in Shinystan) and the "new" ESS diagnostics cast doubt on the precision of both the location and tail-quantile estimates (consistent with the low ESS for specific parameters in the summary output).

Looking at the Stan code, the main problem I see is that the prior on beta_NB does not make sense:

https://github.com/marksorel8/Wenatchee-screw-traps/blob/f49657662ce34c45991499a99394553a9e1679fc/src/Stan_demo/juv_trap_NB.stan#L80

To explain why, let's establish some less-confusing notation. Denote the "standard" NB as

Y ~ NB(a, q)
P(Y = y) = f(y | a, q) = choose(y + a - 1, a - 1) (1 - q)^a q^y

and Stan's neg_binomial parameterization as

Y ~ NB(a, b)
P(Y = y) = g(y | a, b) = choose(y + a - 1, a - 1) (b/(b + 1))^a (1/(b + 1))^y

So b = (1 - q)/q and log(b) = logit(1 - q) = -logit(q). That is, b is the odds of "failure" in the standard interpretation of the NB. As such, a prior on b should put non-negligible mass on values up to ~exp(4) if it is to allow values of q approaching 0. Comparing the prior (gray line) and posterior (histogram) of beta_NB aka b suggests that the prior might be constraining values to the right of the mode somewhat:

image
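
One can check this correspondence numerically in R (a throwaway sketch; in R's dnbinom the prob argument is the "success" probability, which here is b/(b + 1) = 1 - q):

# hypothetical check that Stan's g(y | a, b) matches the standard f(y | a, q)
y <- 0:10; a <- 3; q <- 0.7                       # made-up values
b <- (1 - q) / q
f <- choose(y + a - 1, a - 1) * (1 - q)^a * q^y   # "standard" NB pmf from above
g <- dnbinom(y, size = a, prob = b / (b + 1))     # Stan's neg_binomial in R terms
all.equal(f, g)                                   # TRUE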

More generally, though, using b as a primitive parameter is problematic because b goes to Inf as q approaches 0. An obvious and potentially better-behaved alternative would be to declare q as a primitive parameter, bounded on (0,1) to imply a uniform prior, and then transform to b to use Stan's neg_binomial. You could do this directly in the sampling statement:

C ~ neg_binomial(C_hat*(1 - q)/q, (1 - q)/q);

although theoretically it would be a teeny bit more efficient to declare b in the model block:

model {
  real b = (1 - q)/q;  // q is the primitive probability parameter; b is Stan's beta
...
  C ~ neg_binomial(C_hat*b, b);  // alpha = C_hat*b, so the mean alpha/beta = C_hat
...
}

One additional point of clarification: I realized I made things unnecessarily complicated when I wrote upthread (parameter names have been changed to match the notation above)

(2) after calculating the daily expectation M_hat_t, use the fixed q to solve for a[t]; (3) sum the a[t]'s across days; and then (4) convert back to the expectation M_hat_cumsum for subsequent likelihood calcs.

If Y1 ~ NB(a1,q) and Y2 ~ NB(a2,q), then

Y = Y1 + Y2 ~ NB(a1 + a2, q)
E(Y) = (a1 + a2) q/(1 - q) = E(Y1) + E(Y2)

In other words, it's fine to sum the expectations (and thin) and then solve for a and b, so this part is still valid as-is IFF q is fixed:

https://github.com/marksorel8/Wenatchee-screw-traps/blob/4967ac0a5e7b28c2b8e9d23eeabc3ea8bb5d159e/src/Stan_demo/juv_trap_NB.stan#L59-L61

One last thought for now: I'd suggest making separate .stan files for these alternative likelihood formulations (Poisson, NB1, NB2) rather than passing in a flag like Use_NB. Or at the very least, I'd suggest not using that approach to switch between Poisson and NB2. Recall that the only purpose of the NB2 formulation was to allow a comparison with NB1 (the one discussed above) in the restricted case where the trap is sampled daily, so no summation is needed. That is, this was the point:

with the alpha-beta parameterization neg_binomial it is actually a different model, so I shouldn't expect to get the same results as with the mu-phi parameterization neg_binomial_2.

But we can't use NB2 for the more general case, b/c q will vary across days. Hence my earlier comment:

Except the code still includes the summation, which the Wenatchee data doesn't need (b/c elapsed_time[i] is always equal to 1). In general, it's not advisable to write code that produces an invalid model unless the data meet certain conditions, without checking or warning about violations of those conditions -- i.e., it fails silently.

The simplest solution in this case would be to make a separate version of juv_trap.stan that ditches the summation and uses the neg_binomial_2 parameterization. That would also facilitate model comparison.

marksorel8 commented 4 years ago

This is great @ebuhle, thank you! I agree that there should be different .stan models for each distribution, and will work on that after AFS. I will edit the neg_binomial model per your suggestions too.

marksorel8 commented 4 years ago

We now have three different single-year Stan models using the three different observation likelihoods: Poisson, Negative Binomial (alpha, beta), and Negative Binomial (mu, phi). I have lots of questions moving forward :)

  1. How do we want to do model comparison/selection?
  2. Should we revive the discussion of what process model(s) to consider (in a new thread)?
  3. Would we ever consider using different observation models for different years?
  4. Does this model seem like it is worth a stand-alone publication in a journal like CJFAS, or is it better as a component of a larger paper integrating it with other data (e.g. redd counts) and telling more of an ecological story?

Fits of the different models to an example Chiwawa year: image

ebuhle commented 4 years ago

A few quick thoughts:

1. How do we want to do model comparison/selection?

LOO!

2. Should we revive the discussion of what process model(s) to consider (in a new thread)?

Sure, pending answer to (1).

3. Would we ever consider using different observation models for different years?

My vote is no, especially if we're still contemplating sharing some observation-related parameters among years.

4. Does this model seem like it is worth a stand-alone publication in a journal like CJFAS, or is it better as a component of a larger paper integrating it with other data (e.g. redd counts) and telling more of an ecological story?

Seems like the main thrust of a methods-oriented paper would be to compare this to existing approaches, in particular BTSPAS. Whether that's worth doing depends largely on your level of interest, but also I don't think our model is quite up to the task yet; BTSPAS does several things (esp. related to the efficiency trials) that ours doesn't.

marksorel8 commented 4 years ago

LOO!

sounds good...but looks hard. Are you available to sit down and talk about how to implement this? Or do you have an example?

Would we do LOO for the catch data only?

ebuhle commented 4 years ago

sounds good...but looks hard.

Really, even after reading the vignettes?

The theory is subtle (though not that much worse than AIC / DIC / *IC, except for the computational part) but the loo package makes it dead simple in practice. The only change needed to the Stan code would be a few lines in generated quantities to calculate the pointwise log-likelihoods for catch and recaptures, so that you can monitor them and pass the result to loo(). If you get stuck on this, I can help out.

Would we do LOO for the catch data only?

While that's not necessarily wrong (there are situations where you're only interested in the predictive skill of one part of a model, or for one type of data), here I'd recommend using the total likelihood.

marksorel8 commented 4 years ago

Ok, I'll give it a try, since it sounds like you think I can do it. I got thrown off by some of the more complicated examples in the vignette.

a few lines in generated quantities to calculate the pointwise log-likelihoods for catch and recaptures

This sounds like a log_lik vector with a pointwise log-likelihood value for each data point (so length of data set #1 + length of data set # 2). Let me know if I'm on the wrong track here. I'll give this a shot though :)

ebuhle commented 4 years ago

This sounds like a log_lik vector with a pointwise log-likelihood value for each data point (so length of data set #1 + length of data set # 2). Let me know if I'm on the wrong track here. I'll give this a shot though :)

Ultimately, yeah. For readability's sake I'd probably declare two vectors,

generated quantities {
  vector[N_MR] LL_MR;
  vector[N_trap] LL_trap;

  for(i in 1:N_MR) {
    // evaluate mark-recap log-likelihood for obs i
  }

  for(i in 1:N_trap) {
    // evaluate catch log-likelihood for obs i
  }
}

Then you could either concatenate them into one long vector, or monitor them both and cbind() them together. The latter might make it easier to keep track of which likelihood component is which.
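
Downstream in R, the loo workflow might then look something like this (a sketch under assumed names: 'fit' is the stanfit, with 4 chains stacked in order by as.matrix):

library(rstan)
library(loo)
LL <- cbind(as.matrix(fit, pars = "LL_MR"),    # iterations x mark-recap points
            as.matrix(fit, pars = "LL_trap"))  # iterations x catch points
r_eff <- relative_eff(exp(LL), chain_id = rep(1:4, each = nrow(LL) / 4))
loo_fit <- loo(LL, r_eff = r_eff)
print(loo_fit)
# loo_compare(loo_pois, loo_NB)  # once both models have loo objects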

marksorel8 commented 4 years ago

I attempted to compare the Poisson and NB for the 2001 Tucannon data. I got the following warnings after running the NB model.

Warning messages:
1: In validityMethod(object) : The following variables have undefined values: M[1], M[2], M[3], M[4], M[5], M[6], M[7], M[8], M[9], M[10], M[11], M[12], M[13], M[14], M[15], M[16], M[17], M[18], ... [... truncated]
2: There were 7 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
3: Examine the pairs() plot to diagnose sampling problems
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#bulk-ess

Not sure what the deal with the undefined values is. The low ESS's were for the process error sigma and the NB overdispersion p, which would logically be negatively correlated. R-hats were still 1.01 for those parameters after 1500 iterations though.

When I ran loo for the Poisson model I got the following warnings:

1: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

2: In log(z) : NaNs produced
3: In log(z) : NaNs produced

and this output:


         Estimate   SE
elpd_loo   -591.6 23.9
p_loo       136.9  7.3
looic      1183.2 47.7
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     47    21.0%   325       
 (0.5, 0.7]   (ok)       66    29.5%   88        
   (0.7, 1]   (bad)      93    41.5%   13        
   (1, Inf)   (very bad) 18     8.0%   5         
See help('pareto-k-diagnostic') for details.

They were better for the NB model, but still not great.

> print(loo_NB)

Computed from 3000 by 224 log-likelihood matrix

         Estimate   SE
elpd_loo   -664.9 28.2
p_loo        96.9  6.8
looic      1329.8 56.4
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     103   46.0%   132       
 (0.5, 0.7]   (ok)        71   31.7%   17        
   (0.7, 1]   (bad)       45   20.1%   12        
   (1, Inf)   (very bad)   5    2.2%   3         
See help('pareto-k-diagnostic') for details.

mdscheuerell commented 4 years ago

I took a quick glance at your code and you're not asking for the pointwise likelihoods of the data (catch), but rather an estimated state (migrants), so these comparisons via loo() aren't really germane anyway.

marksorel8 commented 4 years ago

Hi @mdscheuerell , I forgot to push the changes that I made to the code last night. Doh! sorry. I am asking for the pointwise log_likelihood of the data with the following code, right?

  vector[N_MR] LL_MR;     // pointwise log-likelihood values for mark-recapture data
  vector[N_trap] LL_trap; // pointwise log-likelihood values for trap catch data

  for(i in 1:N_MR) {
    LL_MR[i] = binomial_lpmf(recap[i] | mark[i], p[MR_week[i]]); // evaluate mark-recap log-likelihood for obs i
  }

  for(i in 1:N_trap) {
    LL_trap[i] = neg_binomial_lpmf(C[i] | C_hat[i], phi_obs); // evaluate catch log-likelihood for obs i
  }

apologies for not pushing the code I was referencing last night.

marksorel8 commented 4 years ago

Might we need to do some of the fancy stuff described here for the catch time series data, @ebuhle ?

ebuhle commented 4 years ago

Might we need to do some of the fancy stuff described here for the catch time series data, @ebuhle ?

Good Q, but that is only necessary if you're trying to do step-ahead forecasting, which is not our scenario.

I still need to dig into the Tucannon results you posted above and understand where those numerical errors in the generated quantities M[t] are coming from. Presumably the issue is under- / overflow either in (1) calculating b (per my proposed notation), which you're still calling beta_NB, or (2) evaluating the PMF for the neg_binomial_rng function. When I tried it, I didn't get those errors, so it depends on what regions of parameter space the chains visit.

I'm also not clear on why you were using Tucannon data for the model comparison, since that prohibits the use of the NB2 model. I thought the idea was to investigate the differences b/w the NB (awkward and nonintuitive, but usable with gappy data) and NB2 (more appealing, but unusable when gaps are present) parameterizations using data that can accommodate both.

marksorel8 commented 4 years ago

Yep, not sure which of these two reasons caused the errors, but I agree those seem like the likely culprits. I see that for neg_binomial_rng the expected value must be less than 2^29, so maybe the sampler occasionally visits values that high? I also only get those errors sometimes, so maybe it's not a big deal?

Also, good Q why I fit the Tucannon data. I guess I thought it might be interesting because I assumed it had MR data for most weeks. I've been playing around with the Chiwawa data too. Lots of bad Pareto k diagnostic values, but if loo_compare is to be believed, it appears to prefer the Poisson to either NB for 2013. I need to read up and try to better understand what LOO is doing so I can assess how serious those bad diagnostics are.

marksorel8 commented 4 years ago

Perhaps we would have better luck with K-fold than leave one out?

ebuhle commented 4 years ago

I need to read up and try to better understand what LOO is doing so I can assess how serious those bad diagnostics are.

Good idea. You will learn that those high Pareto-k values are (very) bad. Basically they mean that the corresponding pointwise likelihoods are so long-tailed that their importance sampling distribution cannot be fit by the Pareto smoothing distribution -- they may not have finite variance or even finite mean -- and therefore the approximation to the leave-one-out predictive density underlying LOO is invalid. This isn't too surprising, TBH. I expect @mdscheuerell would agree that ecological models in the wild have bad (or very bad) Pareto ks more often than not. The loo package provides some nice diagnostic plots that can help you understand which observations or data types (e.g., catch vs. recaptures) have these extremely skewed likelihoods.
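
For example (a hedged sketch, reusing the hypothetical loo_fit object from the sketch upthread):

library(loo)
pareto_k_table(loo_fit)                        # counts of observations by k threshold
bad <- pareto_k_ids(loo_fit, threshold = 0.7)  # indices of the problematic points
plot(loo_fit, label_points = TRUE)             # Pareto k for every data point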

In principle, the solution is to do brute-force cross-validation for the problematic observations. In practice, and depending on how much you care about the answer, re-fitting the model that many times may or may not be worth it. An alternative, as you suggest, is to abandon LOO altogether and do K-fold CV, which is still a heckuva lot more work than LOO. (Just to be clear, brute-force leave-one-out is a special case of brute-force K-fold CV with K = N. What you've been doing so far is neither of these: LOO is an approximation to the leave-one-out posterior predictive density. In that narrow sense it is analogous to AIC, DIC, WAIC, etc.)
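
If we went that route, the loo package has a helper for building the folds (a sketch; K = 10 is arbitrary and N = 224 is just the example dimension from the NB log-likelihood matrix above):

library(loo)
folds <- kfold_split_random(K = 10, N = 224)  # random fold assignment
table(folds)                                  # roughly equal fold sizes
# each fold is then held out in turn, the model refit, and the held-out
# pointwise log-likelihoods evaluated on the refit posterior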

In any event, I think we have to root out what is giving those numerical errors when converting from (mu,q) to (a,b) (again, using my proposed notation). If the NB RNG is throwing errors, presumably the likelihood is unstable as well, which could certainly have something to do w/ the extreme skewness.

I'm also not crazy about the prior on the inverse square-root of the overdispersion parameter in the NB2 formulation, but that's another post.

marksorel8 commented 4 years ago

we have to root out what is giving those numerical errors when converting from (mu,q) to (a,b) (again, using my proposed notation). If the NB RNG is throwing errors, presumably the likelihood is unstable as well, which could certainly have something to do w/ the extreme skewness.

From the info on neg_binomial_rng: "alpha / beta must be less than 2^29". Is it possible to extract all of the alphas and betas that the chain visited and see if alpha/beta exceeded this threshold in any iteration?

What do we do if the likelihood is unstable? Can we constrain it with priors? Or is there something about the model itself that would need to be changed?

Thanks!

ebuhle commented 4 years ago

https://github.com/marksorel8/Wenatchee-screw-traps/blob/63081dfee42927abb85c7339a6a0af5a3434e8d4/src/Stan_demo/juv_trap_NB.stan#L101-L105

Assuming you're already monitoring beta_NB (that is, b in my notation) and/or p_NB (q), you would just need to also monitor M_hat to figure out what's going on. Depending on what it is, it's possible that more constraining priors might help, yes. Although hopefully not on q, because the simplicity of putting a uniform prior on the probability is one of the nice things about this parameterization.
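
Something like this R sketch would do it (assuming 'fit' is the stanfit object and the parameter names match the repo):

M_hat_draws <- as.matrix(fit, pars = "M_hat")  # posterior draws of the NB mean (= alpha/beta)
b_draws <- as.matrix(fit, pars = "beta_NB")    # posterior draws of b
max(M_hat_draws)                               # neg_binomial_rng needs alpha/beta < 2^29
range(b_draws)                                 # extreme b values suggest under-/overflow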

Like I mentioned yesterday, if you're going to experiment with this I'd suggest saving the random seed so we can reproduce the errors.

marksorel8 commented 4 years ago

Ok, just added a reproducible example, with the seed set for the Stan models. I also added a model using the neg_binomial_2_log distribution, which uses the log(mean), thinking it might be less susceptible to over- or underflow. I think it was probably underflow causing the warnings. This parameterization does seem to eliminate those warnings (undefined values of M[x]).

Are there tests that we should conduct to evaluate the stability of the likelihood (e.g. evaluating the effect of seed on result)?

For comparing observation models (Poisson vs. NB1 vs. NB2), it occurs to me that we should probably use the full multi-year model. I envision this model having independent M_hat trajectories but some shared parameters in the model of trap capture efficiency. Does that make sense?

mdscheuerell commented 4 years ago

@marksorel8 Can you produce some violin plots of the pointwise likelihoods? IME, they will give you an indication of where, exactly, the long tails are arising and producing the lousy Pareto-k estimates. For example, here is a case from our Skagit steelhead model where tmp_lp is a num_mcmc_samples by num_likelihood_elements matrix.

library(ggplot2)
## reshape the pointwise likelihoods into long format
tidy_lp <- tidyr::gather(data.frame(tmp_lp), key = time, value = lp)
## base plot
pp <- ggplot(tidy_lp, aes(x = time, y = lp)) +
  labs(title = "", x = "Data element", y = "Pointwise likelihood") + 
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
## violin plot
p1 <- pp + geom_violin()
p1
image

It's pretty obvious that there are 2 different contributions to the overall likelihood, and that the first (age composition) is behaving pretty badly compared to the second (escapement).

marksorel8 commented 4 years ago

Hi @mdscheuerell , the first plot below is of the pointwise likelihoods of the 15 mark-recapture efficiency trials for the Chiwawa in 2013 and the second plot is for 50 of the 252 catch observations. Looks like there are long tails in the pointwise likelihoods for all the data points.

image

image

For the actual analysis I will use a multi-year model, which combines information from mark-recapture trials across years. It would be great if the multi-year model behaved better, but that may be wishful thinking?