Open · marksorel8 opened 5 years ago
FYI, @marksorel8, I'm currently investigating your reprex on the numerical errors in `M` under the `juv_trap_NB.stan` model. Diagnosing the problem is challenging b/c when the errors occur during sampling (as opposed to warmup) and `M` and/or `M_tot` is being monitored, the summary stats throw an error and no result is returned. Further complicating things, seemingly irrelevant changes to the model in an effort to circumvent this catch-22 evidently change the state of the RNG so that the errors go away. (Not entirely sure what's up w/ that, but I suspect it has to do with floating-point weirdness, which I first learned of thanks to a frustrating Stan situation a couple of years ago.) Even more baffling, simply removing `M` and `M_tot` from the list of parameters to be monitored causes the errors to go away. No idea what's going on there -- this seems to defy the basic programmatic logic of how Stan works.
From the info on `neg_binomial_rng`: "alpha / beta must be less than 2^29". Is it possible to extract all of the alphas and betas that the chain visited and see if alpha/beta exceeded this threshold in any iteration?
The actual errors printed in the Viewer pane are like this:

```
Exception: neg_binomial_rng: Random number that came from gamma distribution is 1.72119e+015, but must be less than 1.07374e+009 (in 'model2cf869f0741_juv_trap_NB' at line 103)
```
This refers to the method of generating pseudo-random NB variates using the gamma-Poisson mixture. AFAICT, either of the NB parameters (`M_hat * beta_NB` or `beta_NB`) could be responsible for these huge gamma RVs. We need a way to monitor these parameters when the errors are occurring to see if a pattern emerges in the "good" vs. "bad" iterations, and that in turn requires tricking the sampler as described above.
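To answer the extraction question above, here is a hypothetical sketch (nothing like this exists in the repo; the function name and array layout are my own). If the posterior draws of `M_hat` were pulled out of the fit (e.g. with `rstan::extract`), the documented threshold could be checked directly, because for `neg_binomial_rng(M_hat * beta_NB, beta_NB)` the ratio alpha / beta simplifies to `M_hat` itself:

```python
import numpy as np

STAN_LIMIT = 2.0**29  # documented bound on alpha / beta for neg_binomial_rng

def flag_bad_iterations(M_hat_draws):
    """Return indices of iterations where any M_hat exceeds Stan's bound.

    For neg_binomial_rng(M_hat * beta_NB, beta_NB), the ratio
    alpha / beta = (M_hat * beta_NB) / beta_NB simplifies to M_hat,
    so checking the documented threshold only requires the M_hat draws.
    """
    M = np.atleast_2d(M_hat_draws)  # rows = iterations, cols = trap obs
    return np.flatnonzero((M >= STAN_LIMIT).any(axis=1))
```

For example, `flag_bad_iterations([[1058.57, 1316.0], [1.31254e14, 1.31131e14]])` flags only the second iteration.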
Stay tuned...
Aha! I think at least part of the reason I'm having such a hard time reproducing your reprex is because you didn't set the RNG seed for R, thus the inits will be different from one call to the next. I must've gotten "lucky" the first time or two yesterday. Will just have to continue with trial-and-error (as in, keep trying until I get an error).
Oh man, I'm sorry @ebuhle. At first I only set the RNG in R and that didn't work, so I switched to setting it in Stan only, which I thought was enough. SMH. I think my R seed may have been 10403.
See https://github.com/marksorel8/Wenatchee-screw-traps/issues/3#issuecomment-548129741; I found one that "works". Better still, these seeds give two chains that are fine and one (chain 3) that appears to throw the error at the initial values and every iteration thereafter. Nothing obvious pops out from comparing the inits, though. Will have to mess around with a `print` statement in `generated quantities`.
An aside, but perhaps worth mentioning. Once upon a time, I had a JAGS model wherein a particular seed had the rather undesirable effect of allowing the MCMC chains to get X steps along (like 75%) before barfing every time at the same iteration. No reboot, update, downdate(?), etc. made a difference.
In this particular case, however, the behavior is much more pathological in that there are multiple pathways to the problem. Don't ask me how I know this, but setting really bad bounds on priors (eg, setting the lower bound on an SD prior to be much greater than the true value for the simulated data) can do wonders for the actual fitting process (eg, eliminates all divergent transitions, rapidly decreases run times), even if the answers are obviously wrong.
So, perhaps it's worth changing the prior bound(s) for `p_NB` (ie, the scale param for the NB) away from one or both of the current bounds to see if we can eliminate the problem? Right now the range on `p_NB` is [0,1]. Perhaps we should try [0.2,0.8] (or something else that's biased and more precise than the current range)?
> Once upon a time, I had a JAGS model wherein a particular seed had the rather undesirable effect of allowing the MCMC chains to get X steps along (like 75%) before barfing every time at the same iteration. No reboot, update, downdate(?), etc. made a difference.

Haha, even I remember this; that's how annoying it was!

> So, perhaps it's worth changing the prior bound(s) for `p_NB` (ie, the scale param for the NB) away from one or both of the current bounds to see if we can eliminate the problem? Right now the range on `p_NB` is [0,1].
Not a bad idea, although I'd suggest something without the un-Gelmanian hard endpoints inside the feasible domain, like Beta(2,2). I do have the sneaking suspicion that `p_NB` is the culprit, but I was hoping to get some more decisive evidence. I didn't think it would be this difficult, though.
OK, so a bit of `print`-statement sleuthing suggests that the problem may be in both `M_hat` and `beta_NB`.
Typical output in cases that produce the error:

```
Chain 3: M_hat[1] = 1.31254e+014  beta_NB = 9.99201e-016  M_hat[2] = 1.31131e+014  beta_NB = 9.99201e-016
Chain 3: Exception: neg_binomial_rng: Random number that came from gamma distribution is 3.83462e+011, but must be less than 1.07374e+009 (in 'model2a1471a922e3_juv_trap_NB' at line 102)
```
Compare that to, for example:

```
M_hat[53] = 1058.57  beta_NB = 0.311853  M_hat[54] = 1047.58  beta_NB = 0.311853  M_hat[55] = 1316  beta_NB = 0.311853  M_hat[56] = 312.761  beta_NB = 0.311853
```
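As a sanity check (my own arithmetic, using only the numbers printed above), plugging those logged values into the gamma-Poisson recipe shows why chain 3 trips the bound: the mixing gamma variate has mean alpha / beta = `M_hat`, which for the bad iterations sits roughly five orders of magnitude above the ~1.07e9 cutoff quoted in the exception:

```python
CUTOFF = 1.07374e9  # bound quoted in the neg_binomial_rng exception message

# (M_hat, beta_NB) pairs copied from the print output above
for label, M_hat, beta_NB in [("bad", 1.31254e14, 9.99201e-16),
                              ("good", 1058.57, 0.311853)]:
    alpha = M_hat * beta_NB        # shape of the mixing gamma
    gamma_mean = alpha / beta_NB   # alpha / beta simplifies to M_hat
    print(label, gamma_mean, gamma_mean > CUTOFF)
```

The "bad" row evaluates to True (mean far beyond the cutoff, so the gamma draws get rejected) and the "good" row to False.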
It's too bad the "canonical" priors that would regularize `p_NB` away from 0 and 1 (like the aforementioned Beta(a,b) where a and b are slightly greater than 1) are also fairly informative about the bulk. We could use something like the Subbotin distribution, as we do in salmonIPM, but it's hacky.
It's less obvious how to prevent `M_hat` from blowing up, but maybe tighter priors on the log-scale regression coefs could help.
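A back-of-envelope illustration of why that might work (the prior SDs below are made up, not the model's): since `M_hat` is the exponential of a log-scale linear predictor, a coefficient drawn a few SDs out under a wide prior already yields astronomical means, while a tight prior keeps them in a plausible range:

```python
import math

# If log(M_hat) includes a coefficient with prior normal(0, sd), a draw
# 3 SDs out gives a multiplicative factor of exp(3 * sd) on M_hat.
for sd in (1.0, 5.0, 10.0):
    print(sd, math.exp(3 * sd))  # ~20 at sd = 1, ~1e13 at sd = 10
```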
Anyway, I'm just leaving this update here for now; I'm out of time to spend on this today.
Thank you so so much for looking into this @ebuhle. I will try some different priors and report back!
I leave the exercise to the reader...errr, dissertator.
The tricky part will be making sure you're using a parameterization of the NB that is closed under addition, so that this still holds:
https://github.com/marksorel8/Wenatchee-screw-traps/blob/14e67d4208d652a46a814142b9fe7004d200f711/src/Stan_demo/juv_trap_multiyear.stan#L75
As you probably know, there are several parameterizations in common use, and even the "standard" one in Stan differs from the one on Wikipedia, for example. Further, the most common and useful parameterization in ecology (mean and overdispersion) is yet another variant. The catch is that closure under addition requires that all the NB RVs being summed have the same p (in the standard parameterization), so you can't simply sum the expectations like we do with the Poisson.
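A sketch of the bookkeeping this would involve (the function names and the NB2 convention var = mu + mu^2/phi are my assumptions; the closure fact itself, that independent NB(r_i, p) variates with a common p sum to NB(sum r_i, p), is standard): convert mean/overdispersion to the standard (r, p) form, sum only when p is shared, and convert back:

```python
def mu_phi_to_r_p(mu, phi):
    # Mean/overdispersion (NB2: var = mu + mu^2 / phi) -> standard (r, p):
    # r = phi, p = phi / (mu + phi).
    return phi, phi / (mu + phi)

def r_p_to_mu(r, p):
    # Standard (r, p) back to the mean: mu = r * (1 - p) / p.
    return r * (1.0 - p) / p

def sum_nb(params, tol=1e-12):
    # Closure under addition holds only when every summand shares the same p:
    # then sum_i NB(r_i, p) ~ NB(sum_i r_i, p). Otherwise the total isn't NB,
    # which is why you can't just sum expectations as with the Poisson.
    rs, ps = zip(*params)
    if max(ps) - min(ps) > tol:
        raise ValueError("summands must share p for the total to be NB")
    return sum(rs), ps[0]

# A shared p requires phi_i proportional to mu_i: e.g. (mu, phi) = (10, 5)
# and (20, 10) both give p = 1/3, and the total mean recovers 10 + 20 = 30.
total = sum_nb([mu_phi_to_r_p(10.0, 5.0), mu_phi_to_r_p(20.0, 10.0)])
print(total, r_p_to_mu(*total))
```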
Then it will also be slightly tricky to figure out the binomial thinning:
https://github.com/marksorel8/Wenatchee-screw-traps/blob/14e67d4208d652a46a814142b9fe7004d200f711/src/Stan_demo/juv_trap_multiyear.stan#L76
since thinning the NB (like the Poisson) scales the expectation. So you'll have to switch between parameterizations (i.e., solve one set of parameters for the other) at least once.
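The thinning arithmetic is friendliest in Stan's (alpha, beta) parameterization; a sketch under that assumption, derived from the gamma-Poisson mixture rather than anything in the repo:

```python
def thin_nb(alpha, beta, q):
    # Binomial thinning (each count retained with prob q) of NB(alpha, beta)
    # in Stan's parameterization: the mixing lambda ~ Gamma(alpha, beta)
    # is scaled by q, i.e. q * lambda ~ Gamma(alpha, beta / q), so the
    # thinned count is NB(alpha, beta / q) and the mean scales from
    # alpha / beta to q * alpha / beta, as stated above.
    return alpha, beta / q

a, b = thin_nb(10.0, 2.0, 0.5)
print(a / b)  # mean after thinning: 2.5 == 0.5 * (10 / 2)
```

Note that alpha is unchanged but beta is not, so moving between this form, the standard (r, p), and mean/overdispersion forms is where the parameter-solving mentioned above comes in.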
I would have to sit down with pencil and paper to figure this out. Sounds like a good grad student project, amirite??
Originally posted by @ebuhle in https://github.com/marksorel8/Wenatchee-screw-traps/issues/1#issuecomment-523688629