specifying appropriate weakly-informative priors

[Figure: rmsa_glmer_zinb2i_PriorPosterior_plots]

Above are histograms of posterior probability distributions overlain by curves of the corresponding prior distributions for parameters included in a model of seedling counts regressed on photosynthetically active radiation, patch and their interaction. Clearly, my thought process about specifying priors needs some tuning. I'm going to try to articulate what I think is wrong with each. Your corrections, feedback and instruction are most welcome!

b_intercept: the prior is mis-specified, but appears not to have strongly influenced the posterior? I more or less assigned priors based on what I was seeing in posts on the Stan forum regarding weakly informative priors. Here's a little more thinking about my expectations for seedling counts. I'm trying to imagine what a reasonable expected mean and standard deviation would be for the pooled data (though I'm aware of trends by patch, e.g., really low counts [0s and 1s] in the bog patch, much higher and more variable counts [up to 200] in the forest patch, and intermediate counts in the bog forest patch). Thus, perhaps my expected mean is 10 seedlings, with a higher sd, say 15. But I need to be working on the log scale, so ln(10) = 2.3, ln(15) = 2.7, so my prior could be N(2.3, 2.7). Does this make sense? When I plot it, it's very wide, but not totally flat and it overlaps with the range of the posterior.
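A minimal sketch of what a prior like N(2.3, 2.7) implies once it is back-transformed to the count scale, assuming the count intercept sits on the log scale (the usual log link for a negative binomial). It also shows one possible way, by moment-matching a lognormal, to turn a count-scale mean of 10 and sd of 15 into log-scale values; taking ln(15) directly is not the same thing. The function names are hypothetical, and treating the intercept prior as a lognormal on expected counts is a simplifying assumption, not a rule.

```python
import math
from statistics import NormalDist


def implied_count_interval(mu, sigma, level=0.95):
    """Median and central interval on the count scale for a
    Normal(mu, sigma) prior placed on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(mu), math.exp(mu - z * sigma), math.exp(mu + z * sigma)


def lognormal_from_mean_sd(mean, sd):
    """Log-scale (mu, sigma) of the lognormal whose count-scale
    mean and sd match the inputs (moment matching)."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


# Prior proposed in the post: Normal(2.3, 2.7) on the log scale.
median, lo, hi = implied_count_interval(2.3, 2.7)
print(f"N(2.3, 2.7): median {median:.1f}, 95% interval {lo:.2f} to {hi:.0f}")

# Assumption: "mean 10, sd 15" describes expected counts on the count scale.
mu, sigma = lognormal_from_mean_sd(10, 15)
print(f"mean 10, sd 15 on the count scale -> mu = {mu:.2f}, sigma = {sigma:.2f}")
```

Comparing the two printed lines gives a quick sense of how much wider the log-scale sd of 2.7 is than the sd implied by the stated count-scale expectation.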

b_zi_intercept: I don’t think that I truly understand how to interpret this parameter. I know that ZINB models include a negative binomial count component (which allows for counts of zero, unlike hurdle models, which are zero-truncated), but also a binomial component (the zi component) that models just the probability of excess [false?] counts of zero. I’m confused by the sign of the posterior parameter estimate for the binomial intercept (b_zi_intercept) and how to relate it to the estimate for the count intercept (b_Intercept). Do I exponentiate the estimate for b_zi_intercept? If so, exp(-6.77) = 0.00115. Is this the median probability of excess/false zeros in the population-level seedling counts? Regarding the prior: as specified, it looks like it cuts off much of the posterior, but I’m not sure whether/how to better specify this prior.
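A minimal sketch of the back-transformation, assuming the zi component uses a logit link (worth confirming in the model summary). Under that assumption the estimate is a log-odds, so the inverse logit rather than exp() gives the probability; for values this negative the two nearly coincide. The count-intercept value below is purely illustrative, not an estimate from this model, and the function names are hypothetical.

```python
import math


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def zinb_mean(count_intercept, zi_intercept):
    """Overall expected count of a zero-inflated model at the reference level:
    (1 - P(excess zero)) * mean of the count component (log link assumed)."""
    return (1.0 - inv_logit(zi_intercept)) * math.exp(count_intercept)


zi_intercept = -6.77                # posterior estimate quoted in the post
p_logit = inv_logit(zi_intercept)   # probability of an excess zero (logit link)
p_exp = math.exp(zi_intercept)      # odds of an excess zero, not a probability

print(f"inverse logit: {p_logit:.5f}")
print(f"exp:           {p_exp:.5f}")

# Illustrative count intercept only (hypothetical value).
print(f"overall mean:  {zinb_mean(2.3, zi_intercept):.2f}")
```

A strongly negative zi intercept therefore means that almost none of the zeros at the reference level are attributed to the zero-inflation component; most are left to the negative binomial part.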

b_log_PAR1.3.rs: this prior looks ok – it’s wide, but not flat and doesn’t cut off the tails of the posterior.

b_patchBogForest and b_patchBog: the prior on these parameters seems too informative and could/should be widened to N(0, 1)? Can I assign a different prior for each level of patch? Would I want to, knowing that there are way more counts of zero in the Bog patch? It seems the posterior reflected that even with the tighter priors.

b_log_PAR1.3.rs:patchBogForest and b_log_PAR1.3.rs:patchBog: I have the same concerns and questions about these as for the patch parameters. Should the priors be wider?

b_zi_patchBogForest and b_zi_patchBog: help!

shape: the posterior distribution of shape looks relatively normal/bell-shaped. Is that because the exponential prior, as specified, is having little effect on the posterior? If I were to specify a prior of gamma(1, 0.5), then the probability that shape is 0 increases, which would represent more overdispersion, which I want to limit. So perhaps the relatively flat prior gamma(1, 0.1) is what I want?
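A minimal sketch comparing the two candidate priors, assuming gamma(shape, rate) is parameterized as in Stan, so that gamma(1, r) is an exponential distribution with rate r and mean 1/r. It also assumes the negative binomial variance is mu + mu^2 / shape, so smaller shape values mean more overdispersion. The function names are hypothetical.

```python
import math


def gamma1_cdf(x, rate):
    """P(shape <= x) under gamma(1, rate), i.e. an exponential with this rate."""
    return 1.0 - math.exp(-rate * x)


def nb_variance(mu, shape):
    """Negative binomial variance under the mean/shape parameterization (assumed)."""
    return mu + mu ** 2 / shape


for rate in (0.1, 0.5):
    print(
        f"gamma(1, {rate}): prior mean {1 / rate:.0f}, "
        f"P(shape < 1) = {gamma1_cdf(1, rate):.3f}, "
        f"P(shape < 10) = {gamma1_cdf(10, rate):.3f}"
    )

# How strongly a small shape inflates the variance at a mean of 10 seedlings.
for shape in (0.5, 1, 5, 20):
    print(f"shape {shape}: variance at mu = 10 is {nb_variance(10, shape):.0f}")
```

The cumulative probabilities show how much more prior mass gamma(1, 0.5) puts on small shape values than gamma(1, 0.1) does.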

What would you be worried about seeing?

Hm, I would look for any systematic patterns in the residuals around the marginal effect curve within each patch that might suggest the relationship is being unduly influenced by or extrapolated from other sites. But then, the interaction allows for patch-specific slopes so I guess this is actually more important in a model without the interaction.

Oh good[y goody gumdrops].

This came in the mail today. The back cover is hilarious.

Also, if you really expect a unimodal Hutchinsonian relationship, with the mode somewhere around DWT = 0, then that gets tricky because pretty much the only observations that could inform the "left" limb of the curve (whether specified as a quadratic or a spline, etc.) are in Bog.

I went to bed and woke up angsting again/more about whether/how to investigate the relationships between DWT and seedling counts. My thoughts seem to be circular, not getting me anywhere.

I'm going to try to step back and articulate what's been driving this project. I started by observing that burned sites once dominated by P. uviferum trees are now dominated by sphagnum mosses and seem to have very few seedlings (especially those on the shorter/younger side; e.g., under 30 cm) even though seed-bearing trees may be present surrounding the site or scattered on the site. These post-fire patches resemble non-forested, "unaltered" peatlands dominated by sphagnum. Unaltered peatlands sometimes occur adjacent to old-growth P. uviferum forests and there are sharp spatial boundaries that separate them. I want to know whether similar factors (e.g., light and water levels, the percent cover of different substrates that may serve as seed beds or growth mediums) may distinguish tree vs. sphagnum dominated patches at the burned site and control site and whether/how those factors are associated with P. uviferum seedlings (since P. uviferum is the keystone forest species).

Before looking at the data, wandering around the Burned Forest patch, in particular, I would have asked: Am I seeing so few seedlings here because seedlings will desiccate on seedbed/growth substrates that are some threshold of "far" from the water table (and this patch is characterized by a relatively tall lawn of sphagnum)? It seems that desiccation with increasing distance from the water table should hold across patches, except perhaps in the Forest where the canopy cover and density of tall shrubs could maintain a moister microclimate (including the saturation of moss cushions). . . . I don't know whether I should be examining the relationship between P. uviferum seedlings and DWT using pooled data or by patch. Patches are characterized by different ranges of DWT and different abundances of seedlings. . . . Does any of the above or the data below alter your suggestions about what I should do with DWT?

A few half-baked thoughts from someone who knows basically nothing about this system:

If forest and bog represent alternative stable states, with fire switching from the former to the latter, then there must be feedback processes that maintain suitable conditions for tree regeneration under forest canopy as well as feedbacks that maintain sphagnum and keep seedlings out. You're hypothesizing desiccation as a bog-maintaining feedback, driven by the height of the moss layer above the water table and exacerbated by the exposed microclimate. That seems to imply the effect of DWT depends on the microclimatic context (e.g. temperature, wind speed, humidity) so it would be expected to vary across patches of different vegetation structure. Do you have direct evidence of seedling mortality caused by desiccation? (I suspect the answer is yes, based on the plantation study, but TBH I don't remember.)

On the other side, what feedbacks maintain forest canopy and promote seedling establishment? Is sphagnum absent / less abundant / lower in stature in forest stands? If so, is that because it gets shaded out, or what? This is probably a dumb question, but can the vegetation itself influence the water table depth, e.g. through evapotranspiration?

Alternatively (no pun intended), maybe forest and bog don't represent alternative stable states. Maybe Burned Forest is not a climax but a seral stage that is just taking a loooong time to be invaded by these slow-growing, classically K-selected trees. Or maybe environmental conditions have shifted in some way that disfavors seedling recruitment, so the pre-fire forest and the current bog aren't actually occupying "the same habitat". No doubt you've considered these alternative hypotheses, and of course the last one may be impossible to rule out without long-term (pre-fire) monitoring data.

As for those marginal effects plots of DWT ... boy, hard to know what to make of that. Lots of noise, some of which may be explained by other known covariates. If I squint at the middle plots (without the linear fits) I could convince myself that they're consistent with your hypothesis of an optimal DWT:

One of the driving ideas was that young/small seedlings may desiccate, even in the wet bog patch, during the drought season because their roots are too far from the water table given the height/thickness of the Sphagnum lawns. Yet, young seedlings are also likely to die from lengthy surface inundation.

Except the unimodal pattern is, if anything, clearer in Forest, Bog Forest, and Transition than in Bog, and the "optimum" seems to occur well away from zero. (BTW, I've been wondering why Transition and Burned Forest aren't included in the models we've been discussing.) But these are snapshot measurements of DWT, right? Could a plot with a value of, say, 10 cm be fully inundated during the wet season?

A more rigorous way to check for departures from linearity would be to statistically control for other sources of variation; that is, plot the residuals from some larger model rather than the raw data as shown here. But then that gets tricky because Poisson residuals are not well-defined / well-behaved.
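One option sometimes used for count models is randomized quantile residuals, which are approximately standard normal when the model is right. The sketch below is a swapped-in illustration for a plain Poisson with simulated data, not the model discussed in this thread; all names and the simulated means are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)


def randomized_quantile_residual(y, mu, rng):
    """Draw u uniformly between F(y - 1) and F(y), then map it to a normal quantile."""
    u = rng.uniform(poisson_cdf(y - 1, mu), poisson_cdf(y, mu))
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)


def draw_poisson(mu, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(1)
fitted = [rng.uniform(0.5, 8.0) for _ in range(2000)]  # simulated fitted means
counts = [draw_poisson(m, rng) for m in fitted]        # simulated observations
residuals = [randomized_quantile_residual(y, m, rng) for y, m in zip(counts, fitted)]

print(f"mean {mean(residuals):.2f}, sd {stdev(residuals):.2f}")
```

Plotted against a covariate such as DWT, residuals of this kind should show no trend or curvature if the fitted relationship is adequate; a hump or U shape would hint at a missing nonlinear term.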

Not sure if any of that helps, but I did warn you.

kzaret / PIUV_seedling_abundance

specifying appropriate weakly-informative priors #1