kzaret / PIUV_seedling_abundance

Bayesian GLMMs of P. uviferum seedling abundance at a burned and a control site, respectively.

specifying appropriate weakly-informative priors #1

Open kzaret opened 1 year ago

kzaret commented 1 year ago

[Figure: rmsa_glmer_zinb2i_PriorPosterior_plots]

Above are histograms of posterior probability distributions overlain by curves of the corresponding prior distributions for parameters included in a model of seedling counts regressed on photosynthetically active radiation, patch and their interaction. Clearly, my thought process about specifying priors needs some tuning. I'm going to try to articulate what I think is wrong with each. Your corrections, feedback and instruction are most welcome!

b_intercept: the prior is mis-specified, but appears not to have strongly influenced the posterior? I more or less assigned priors based on what I was seeing in posts on the stan forum regarding weakly informative priors. Here's a little more thinking about my expectations for seedling counts. I'm trying to imagine what a reasonable expected mean and standard deviation would be for the pooled data (though I'm aware of trends by patch, e.g., really low counts [0s and 1s] in the bog patch, much higher and more variable counts [up to 200] in the forest patch, and intermediary counts in the bog forest patch). Thus, perhaps my expected mean is 10 seedlings, with a higher sd, so like 15. But, I need to be working on the log scale, so ln(10) = 2.3, ln(15) = 2.7, so my prior could be N (2.3, 2.7). Does this make sense? When I plot it, it's very wide, but not totally flat and it overlaps with the range of the posterior.

b_zi_intercept: I don’t think that I truly understand how to interpret this parameter. I know that ZINB models include a negative binomial count component (which allows for counts of zero, unlike hurdle models which are zero-truncated), but also a binomial component -- the zi component -- that models just the probability of excess [false?] counts of zero. I’m confused by the sign of the posterior parameter estimate for the binomial intercept (b_zi_intercept) and how to relate it to the estimate for the count intercept (b_Intercept). Do I exponentiate the estimate for b_zi_intercept? If so, exp(-6.77) = 0.00115. Is this the median probability of excess/false zeros in the population-level seedling counts? Regarding the prior: as specified, it looks like it cuts off much of the posterior, but I’m not sure whether/how to better specify this prior.

b_log_PAR1.3.rs: this prior looks ok – it’s wide, but not flat and doesn’t cut off the tails of the posterior.

b_patchBogForest and b_patchBog: the prior on these parameters seems too informative and could/should be widened to N(0, 1)? Can I assign a different prior for each level of patch? Would I want to, knowing that there are way more counts of zero in the Bog patch? It seems the posterior reflected that even with the tighter priors.

b_log_PAR1.3.rs:patchBogForest and b_log_PAR1.3.rs:patchBog: I have the same concerns and questions about these as for the patch parameters. Should the priors be wider?

b_zi_patchBogForest and b_zi_patchBog: help!

shape: the posterior distribution of shape looks relatively normal/bell-shaped. Is that because the exponential prior, as specified, is having little effect on the posterior? If I were to specify a prior of gamma(1, 0.5), then the probability that shape is 0 increases, which would represent more overdispersion, which I want to limit. So perhaps the relatively flat prior gamma(1, 0.1) is what I want?

ebuhle commented 1 year ago

Hola @kzaret. Thanks for laying all this out. As you're discovering, specifying priors in brms requires a little more TLC than doing so in rstanarm, because the latter makes an effort to automatically adjust the scales of default priors so they roughly make sense given the response and/or predictor data. The only thing brms does is to internally center the predictors, so the prior on the intercept applies when all predictors (in this case including the dummy factor indicators, which is somewhat meaningless) are at their sample means, not at zero. You could try to emulate the rstanarm autoscale = TRUE approach manually, but that might obfuscate more than it clarifies because of your factor predictors, count response, and especially the zero-inflation term. It does help that you've standardized your continuous input variables.
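
For concreteness, a minimal sketch of passing user priors to brm() for a model like this one (the prior values are placeholders to be tuned per the discussion below; the formula, response, and data names are taken from the model summaries later in this thread):

library(brms)

zinb_priors <- c(
  set_prior("normal(2, 3)", class = "Intercept"),                  # NB (log-scale) intercept
  set_prior("normal(0, 5)", class = "b"),                          # NB regression slopes
  set_prior("logistic(0, 1)", class = "Intercept", dpar = "zi"),   # zero-inflation intercept (logit scale)
  set_prior("normal(0, 3)", class = "b", dpar = "zi"),             # zero-inflation slopes
  set_prior("exponential(0.1)", class = "shape")                   # NB shape
)

fit_zinb <- brm(
  bf(PIUV_less5cmDBH_total ~ log_PAR1.3.rs * patch + (1 | plot),
     zi ~ patch),
  data = sp.rmb, family = zero_inflated_negbinomial(),
  prior = zinb_priors
)
prior_summary(fit_zinb)  # confirm which priors were applied vs. left at defaults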

My big-picture take is that yes, some of these priors are not very sensible, but it looks like the data are fairly informative so even the wackest priors don't appear to be having much influence. That said, it certainly makes sense to improve them. The general goal here is "weakly informative" -- too narrow is a bigger concern than too broad, unless you're putting undue weight on extreme values that are physically impossible or lead to poor sampler behavior.

Let's take the NB part first.

b_intercept: the prior is mis-specified, but appears not to have strongly influenced the posterior? I more or less assigned priors based on what I was seeing in posts on the stan forum regarding weakly informative priors. Here's a little more thinking about my expectations for seedling counts. I'm trying to imagine what a reasonable expected mean and standard deviation would be for the pooled data (though I'm aware of trends by patch, e.g., really low counts [0s and 1s] in the bog patch, much higher and more variable counts [up to 200] in the forest patch, and intermediary counts in the bog forest patch). Thus, perhaps my expected mean is 10 seedlings, with a higher sd, so like 15. But, I need to be working on the log scale, so ln(10) = 2.3, ln(15) = 2.7, so my prior could be N (2.3, 2.7). Does this make sense? When I plot it, it's very wide, but not totally flat and it overlaps with the range of the posterior.

Yes, this makes sense, and the result described in the last sentence is pretty much what you want. Nicely done. My only nitpick is that of course $\textrm{log}(\mathbb{E}(X)) \neq \mathbb{E}(\textrm{log}(X))$ and $\textrm{log}(\textrm{SD}(X)) \neq \textrm{SD}(\textrm{log}(X))$, but these calculations are all hand-wavy anyway. Round numbers would be fine. Also rstanarm would multiply the data-aware scale by 2.5, making it even less informative.

b_log_PAR1.3.rs: this prior looks ok – it’s wide, but not flat and doesn’t cut off the tails of the posterior.

Agreed, but FWIW rstanarm would make it even wider -- since you've already standardized log_PAR1.3.rs to unit variance, it would be $N(0,2.5)$.

b_patchBogForest and b_patchBog: the prior on these parameters seems too informative and could/should be widened to N(0, 1)? Can I assign a different prior for each level of patch? Would I want to, knowing that there are way more counts of zero in the Bog patch? It seems the posterior reflected that even with the tighter priors.

b_log_PAR1.3.rs:patchBogForest and b_log_PAR1.3.rs:patchBog: I have the same concerns and questions about these as for the patch parameters. Should the priors be wider?

Yes, scale = 0.5 is way too narrow. If you wanted to be literal-minded you could compute the sample SD for each of these columns of the design matrix and then apply something like the rstanarm defaults. Or more heuristically, for a binary dummy variable the maximum SD occurs when the mean is 0.5, and it is also equal to 0.5. Applying the rstanarm formula would give you a scale of $2.5 / 0.5 = 5$. TBH, since the continuous predictor is standardized and the other predictor is a factor (i.e. binary dummies), I would just use $N(0,5)$ for all of the regression coefficients including the aforementioned b_log_PAR1.3.rs and call it good. Unless you really have external empirical information to bear, unique priors for each slope quickly veers into fishing-expedition territory -- e.g., I think your reasoning based on the proportion of zeros in different patches is too circular -- and it also slows down sampling a bit because then the priors on $\boldsymbol{\beta}$ can't be vectorized.
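
For the literal-minded route, a sketch of that calculation (assuming the sp.rmb data frame and the predictor names used elsewhere in this thread):

# columns of the design matrix, minus the intercept
X <- model.matrix(~ log_PAR1.3.rs * patch, data = sp.rmb)[, -1]

# rstanarm-style autoscaling: default scale 2.5 divided by each column's sample SD
prior_scales <- 2.5 / apply(X, 2, sd)
round(prior_scales, 2)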

OK, now the ZI part. This is conceptually a bit subtler and also the results are more perplexing.

b_zi_intercept: I don’t think that I truly understand how to interpret this parameter. I know that ZINB models include a negative binomial count component (which allows for counts of zero, unlike hurdle models which are zero-truncated), but also a binomial component -- the zi component -- that models just the probability of excess [false?] counts of zero.

Exactly. Nice. The ZINB model is a discrete mixture of a NB and a point mass at zero (not necessarily false observations, just excessive zeros compared to what the NB would predict) and the zi parameter is the mixture probability of the spike, while the probability of the NB is 1 - zi. This zero-inflation probability is modeled on the logit link scale, so you can reason about its parameters the same way you would about a logistic regression. For example, a value of zero on the logit scale corresponds to a probability of 0.5.
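
Written out, with zero-inflation probability $\pi$ (zi), NB mean $\mu$, and shape $\phi$, the probability of observing a zero is

$$\textrm{Pr}(Y = 0) = \pi + (1 - \pi)\left(\frac{\phi}{\mu + \phi}\right)^{\phi},$$

so the point mass and the NB's own zeros both contribute to the observed zero class.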

I’m confused by the sign of the posterior parameter estimate for the binomial intercept (b_zi_intercept) and how to relate it to the estimate for the count intercept (b_Intercept). Do I exponentiate the estimate for b_zi_intercept? If so, exp(-6.77) = 0.00115.

Nope, following the logistic regression analogy you take the inverse logit (although for small $x$, $\textrm{logit}^{-1}(x) \approx \textrm{exp}(x)$ ):

> plogis(-6.77)
[1] 0.001146379

Is this the median probability of excess/false zeros in the population-level seedling counts? Regarding the prior: as specified, it looks like it cuts off much of the posterior, but I’m not sure whether/how to better specify this prior.

Yes, this says that when all predictors are at their sample means, the probability of excess zeros is effectively zero -- the predicted response distribution is the NB. Now, the default brms logistic(0,1) prior is a sensible choice because it implies that with predictors centered, the prior probability of excess zeros is zi ~ uniform(0,1). To illustrate the fact that $X \sim \textrm{logistic}(0,1) \iff \textrm{logit}^{-1}(X) \sim U(0,1)$ you can do something like

x <- rlogis(10000, 0, 1)
hist(x)
hist(plogis(x))

What's perplexing is that the posterior of b_zi_Intercept goes soooo low. It's unusual to see such extreme coefficient values in a logistic model because, e.g., plogis(-10) == 4.539787e-05. Apparently the data really want to make the baseline zi zero, i.e. b_zi_Intercept == -Inf.

By the same token, it rarely makes sense for a (unit-standardized) logistic regression coefficient to have absolute value greater than say 5, because when multiplied by a predictor value of 1 SD, you get

> plogis(c(-5, 5))
[1] 0.006692851 0.993307149

That said...

b_zi_patchBogForest and b_zi_patchBog: help!

Again the scale = 0.5 is way too narrow. Following my logic above you probably want something like $N(0,3)$ for all the logit zero-inflation slopes, although it sure doesn't seem like the prior is having much effect here. Once again I'm very surprised by the extreme values of the b_zi_patchBogForest and b_zi_patchBog posteriors. The former is especially weird because it's a complete non-effect -- it says there is no difference between "Bog Forest" and the reference factor level of "Forest" -- and yet the data are consistent with enormous positive and negative slopes that would take the zi probability to 1 or 0. This makes me wonder if there is something funky going on with either the data or with parameter confounding. Bears further exploration after you get the priors sorted -- looking at bivariate posterior correlations of the zi coefs would be a logical place to start -- but I'll punt on that for now in the interest of time.
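
For reference, one way to do that bivariate check, sketched with bayesplot (assuming the fitted brmsfit is called fit and using the zi coefficient names reported for this model):

library(bayesplot)

# bivariate posteriors of the zero-inflation coefficients
mcmc_pairs(
  as.matrix(fit),
  pars = c("b_zi_Intercept", "b_zi_patchBogForest", "b_zi_patchBog")
)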

shape: the posterior distribution of shape looks relatively normal/bell-shaped. Is that because the exponential prior, as specified, is having little effect on the posterior? If I were to specify a prior of gamma(1, 0.5), then the probability that shape is 0 increases, which would represent more overdispersion, which I want to limit. So perhaps the relatively flat prior gamma(1, 0.1) is what I want?

I agree. The flattest of these priors, more simply expressed as exponential(0.1), makes the most sense. I don't think you really have a strong prior belief about the NB shape parameter; in principle it could run off toward very large values if the non-zero counts don't show much overdispersion compared to a Poisson.

That begs the question, though: have you considered modeling shape as a function of covariates? Why allow the ZI probability to vary by patch, but not the overdispersion of the NB?
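
In brms this is just a distributional regression formula; a sketch (same response, predictors, and data as the model above):

fit_nb_shape <- brm(
  bf(PIUV_less5cmDBH_total ~ log_PAR1.3.rs * patch + (1 | plot),
     shape ~ patch),
  data = sp.rmb, family = negbinomial()
)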

Finally, we haven't discussed the hierarchical SD for the plot grouping factor:

Here the normal(0, 0.5) prior that you specified in the rmsa.glmer.zinb2h model (blue) really doesn't make much sense. The default student_t(3, 0, 2.5) (red) should be fine.

ebuhle commented 1 year ago

I realized I muddied the waters on one issue that might cause confusion. The prior on the intercept is defined such that it applies when all columns of the design matrix are centered, but that internal transformation is then inverted so that the reported estimates and samples refer to the usual intercept, i.e. the expected value when all predictors are zero. So my simple prior vs. posterior plots are actually comparing apples and oranges* when it comes to the two intercepts (for the NB and ZI components) in this model. Since your continuous predictor is already pre-centered, the interpretation is that the intercepts apply when patchBogForest and patchBog are equal to zero.

So the posterior of b_zi_Intercept means that there is effectively no zero-inflation when patch == "Forest". This might help explain both the crazy large negative values for the intercept (if the response for "Forest" really doesn't show any hint of excess zeros vs. the NB) and the crazy high values of b_zi_patchBog (if there is some zero-inflation in "Bog", this coefficient has to be at least as large in magnitude as the intercept to cancel it out), and perhaps even the crazy range of b_zi_patchBogForest (whatever the level of zero-inflation in "Bog Forest", this coefficient has to compensate for variation in the intercept).

More simply put, I'm just restating the well-known fact that when predictors have nonzero means, the regression intercept is correlated with the slopes via the likelihood -- which of course is why brms and rstanarm internally center the design matrix for more efficient sampling. Again, looking at bivariate posterior scatterplots of these three parameters would be clarifying. That still doesn't explain why there's such a huge range of uncertainty consistent with the data, though.

*Sorry, I hate this cliche. Oranges are better, obviously.

kzaret commented 1 year ago

Thank you for the comprehensive response, @ebuhle!

I have two immediate sets of questions (and then I plan to fit the model using the priors you suggested, and I'll take a look at the bivariate posterior plots of the zi coefficients).

1) Aren't the student_t(3, 0, 2.5) and N(0, 5) priors for the sd_plot__intercept and NB coefficients, respectively, "flat"? But that's ok as long as the probability for the majority of parameter values is greater than 0?

2) > That begs the question, though: have you considered modeling shape as a function of covariates? Why allow the ZI probability to vary by patch, but not the overdispersion of the NB?

This hadn't occurred to me, but I'm intrigued. Hypothetically, I could model shape as a function of patch in a purely NB model with no ZI component, right? I settled on ZINB regression because when I compared ppc plots between Poisson, NB and ZINB versions of the model, only the simulated distribution from the ZINB included reasonable proportions of zeros per patch. But, if the overdispersion in the data is mainly due to the number of zeros, then maybe I could work with NB models, which I could fit in rstanarm instead of brms (though I'm not sure that really makes anything simpler/easier at this point). What are your thoughts?

kzaret commented 1 year ago

Aren't the student_t(3, 0, 2.5) and N(0, 5) priors for the sd_plot__intercept and NB coefficients, respectively, "flat"? But that's ok as long as the probability for the majority of parameter values is greater than 0? Never mind. Of course they aren't flat and I can see that when I adjust my x axes. . . .

ebuhle commented 1 year ago

Aren't the student_t(3, 0, 2.5) and N(0, 5) priors for the sd_plot__intercept and NB coefficients, respectively, "flat"? But that's ok as long as the probability for the majority of parameter values is greater than 0? Never mind. Of course they aren't flat and I can see that when I adjust my x axes. . . .

Ding ding ding! :rofl: Gelman-world's emphasis on "weakly informative" priors as the default is really an antidote to the bad habits encouraged by WinBUGS / JAGS during the decade when they were the dominant platforms for applied Bayesian modeling. You would often see mindless things like $N(0,100)$ or $U(0,10^6)$ priors (I'm guilty of both) or the ubiquitous $IG(0.01, 0.01)$ for hierarchical and observation-level variances, which Gelman criticized in an influential paper (note that his advice has evolved since). Point being, $\beta \sim N(0,5)$ is an entirely reasonable prior in your case, and if you hadn't brought it up I would have imagined you were using something at least that diffuse.

Edited to add: the computational and inferential benefits of weak regularization through the prior mainly apply when the data aren't informative enough to rule out extremely long tails and outlier posterior draws, that sort of thing. In that case, more careful attention to avoid "overly diffuse" priors makes sense. I don't see any indication of that here, although the zi regression situation is TBD.

This hadn't occurred to me, but I'm intrigued. Hypothetically, I could model shape as a function of patch in a purely NB model with no ZI component, right? I settled on ZINB regression because when I compared ppc plots between Poisson, NB and ZINB versions of the model, only the simulated distribution from the ZINB included reasonable proportions of zeros per patch. But, if the overdispersion in the data is mainly due to the number of zeros, then maybe I could work with NB models, which I could fit in rstanarm instead of brms (though I'm not sure that really makes anything simpler/easier at this point). What are your thoughts?

My thoughts are yes to all of this. If the driving pattern in the data is different degrees of overdispersion and leptokurtosis in different patches, I would probably try to model it with the NB scale before resorting to a ZINB.

kzaret commented 1 year ago

I fit a negative binomial model using the same priors as discussed above for the ZINB (mods rmsa.glmer.nb2k and rmsa.glmer.zinb2k, respectively) and then fit a NB while modeling shape using patch (rmsa.glmer.nb2k.2). I did not assign specific priors for the shape component, so b_shape_Intercept, b_shape_patchBogForest, and b_shape_patchBog were all modeled with N(0, 5) priors and the Intercept_shape parameter was assigned brms' default (?) student-t (3, 0, 2.5).

Here are prior vs. posterior plots for the NB w/ shape modeled as a function of patch: [Figure: rmsa_glmer_nb2k2_PriorvsPosterior]

Since overdispersion should be greater than or equal to zero, the prior distribution for this parameter should be gamma or exponential (as discussed/specified for the ZINB), right? But what is b_shape_Intercept vs. Intercept_shape?

ebuhle commented 1 year ago

Since overdispersion should be greater than or equal to zero, the prior distribution for this parameter should be gamma or exponential (as discussed/specified for the ZINB), right?

No, once you specify a formula for shape it is modeled on the link (default log) scale. All the priors you've got there look OK.

But what is b_shape_Intercept vs. Intercept_shape?

That, I do not understand. What are summary(rmsa.glmer.nb2k.2) and prior_summary(rmsa.glmer.nb2k.2)?

kzaret commented 1 year ago

# summary(rmsa.glmer.nb2k.2)

Family: negbinomial 
  Links: mu = log; shape = log 
Formula: PIUV_less5cmDBH_total ~ log_PAR1.3.rs + patch + log_PAR1.3.rs:patch + (1 | plot) 
         shape ~ patch
   Data: sp.rmb (Number of observations: 108) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Group-Level Effects: 
~plot (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.41      0.16     0.08     0.72 1.00     2203     1883

Population-Level Effects: 
                             Estimate Est.Error l-95% CI u-95% CI Rhat
Intercept                        3.47      0.28     2.92     4.03 1.00
shape_Intercept                  0.71      0.27     0.17     1.23 1.00
log_PAR1.3.rs                    0.20      0.20    -0.20     0.59 1.00
patchBogForest                  -0.72      0.39    -1.48     0.07 1.00
patchBog                        -3.12      0.50    -4.06    -2.10 1.00
log_PAR1.3.rs:patchBogForest    -0.58      0.41    -1.40     0.19 1.00
log_PAR1.3.rs:patchBog          -0.80      0.50    -1.85     0.14 1.00
shape_patchBogForest             0.43      0.46    -0.45     1.36 1.00
shape_patchBog                  -1.57      0.57    -2.66    -0.41 1.00
                             Bulk_ESS Tail_ESS
Intercept                        4632     4726
shape_Intercept                 10988     7835
log_PAR1.3.rs                    4861     5965
patchBogForest                   4903     5977
patchBog                         5620     5531
log_PAR1.3.rs:patchBogForest     7627     6897
log_PAR1.3.rs:patchBog           6966     5778
shape_patchBogForest             6565     7765
shape_patchBog                   7026     6397

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# prior_summary(rmsa.glmer.nb2k.2)

 prior                 class      coef                          group  dpar   source
 normal(0, 5)          b                                                       user
 normal(0, 5)          b          log_PAR1.3.rs                                (vectorized)
 normal(0, 5)          b          log_PAR1.3.rs:patchBog                       (vectorized)
 normal(0, 5)          b          log_PAR1.3.rs:patchBogForest                 (vectorized)
 normal(0, 5)          b          patchBog                                     (vectorized)
 normal(0, 5)          b          patchBogForest                               (vectorized)
 normal(0, 5)          b                                                shape  (vectorized)
 normal(0, 5)          b          patchBog                              shape  (vectorized)
 normal(0, 5)          b          patchBogForest                        shape  (vectorized)
 normal(2, 3)          Intercept                                               user
 student_t(3, 0, 2.5)  Intercept                                        shape  default
 student_t(3, 0, 2.5)  sd                                                      default
 student_t(3, 0, 2.5)  sd                                        plot          (vectorized)
 student_t(3, 0, 2.5)  sd         Intercept                      plot          (vectorized)

Edit: Some observations about the nb2k.2 model vs. the nb2k and zinb2k models:

1) The 95% CI for b_log_par1.3.rs:patchBog consistently excluded zero in the ZINB models and the NB model in which shape wasn't predicted [is it appropriate to phrase things as such?], but that's not the case here.

2) While the proportion of zeros by patch look good for the NB shape model, the max counts and sds are high (as compared to the ZINB and initial NB models):

image

image

image

ebuhle commented 1 year ago

OK, so looking at the summary(), there is only a shape_Intercept. IDK what the other thing you plotted was. FWIW, these results seem promising insofar as the counts in "Bog" are "significantly" more overdispersed than in the baseline "Forest" while "Bog Forest" is actually somewhat more Poisson-like, albeit not with 95% probability.

ebuhle commented 1 year ago
  1. The 95% CI for b_log_par1.3.rs:patchBog consistently excluded zero in the ZINB models and the NB model in which shape wasn't predicted [is it appropriate to phrase things as such?], but that's not the case here.

That's an appropriate phrasing, and this tracks. Presumably the patch-specific shape is soaking up some of the variation that would otherwise be attributed to patch-specific irradiance effects.

Edited to add: also, the effect of PAR in "Forest", b_log_par1.3.rs, is weakly positive in this model whereas it was weakly negative in the ZINB you originally posted.

2. While the proportion of zeros by patch look good for the NB shape model, the max counts and sds are high (as compared to the ZINB and initial NB models):

These look OK to me, and not so different from the other two models except in "Bog". It's a bit hard to tell, but clearly the NB shape model allows a more overdispersed, leptokurtic, long-tailed distribution in that patch vs. the others. Hard to say if this is really a "problem". This might be one case where it's worth computing a Bayesian P-value, i.e. the numerical summary of these plots. To my eye it looks like the observed max and SD might be no more unexpected, relative to the PPD, in the NB shape model than in the other two?

kzaret commented 1 year ago

To my eye it looks like the observed max and SD might be no more unexpected, relative to the PPD, in the NB shape model than in the other two?

Ah. Yeah -- the maximum values may be larger in the NB shape model, but they occur with very low frequency. Almost all of the maximum counts are under about 350/375 for the NB shape model and they're under about 300 for the ZINB model.

I shall look into the Bayesian P-value. What about loo_compare?

ebuhle commented 1 year ago

Yes, you could certainly loo_compare() these models. I wouldn't sweat the Bayesian P-value stuff unless you're really concerned. It would probably be just as enlightening to restrict the x-axis in the PPC plots so it's easier to see what's going on.
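
For example (a sketch, assuming the fitted brmsfit is called fit; the xlim cutoff is arbitrary):

library(ggplot2)

# pp_check() returns a ggplot, so the x-axis can be restricted after the fact
# (the argument is ndraws in recent brms versions, nsamples in older ones)
pp_check(fit, ndraws = 100) + coord_cartesian(xlim = c(0, 50))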

kzaret commented 1 year ago

I used loo to compare the following models: ZINB intercept-only, ZINB with patch and log(PAR 1.3) as covariates, NB intercept-only, NB intercept-only with shape modeled as a function of patch, and NB with patch and log(PAR 1.3) as covariates and shape modeled as a function of patch. Drum roll. . . .There's apparently no substantial difference in the ability of any of these to predict previously unseen data:

loo_compare(loo_rmsa.zinb0k, loo_rmsa.zinb2k,loo_rmsa.nb0k, loo_rmsa.nb0k.2, loo_rmsa.nb2k.2)
                  elpd_diff se_diff
rmsa.glmer.nb2k.2  0.0       0.0   
rmsa.glmer.zinb2k -0.6       2.2   
rmsa.glmer.zinb0k -3.5       5.4   
rmsa.glmer.nb0k   -7.0       5.4   
rmsa.glmer.nb0k.2 -7.7       4.5 

To move forward with other simple models (i.e., those with only one [or two, if I think there's an interaction with patch] of the other potential covariates that I think may be associated with P. uviferum seedling abundance), I'm planning to work with NB models that include shape as a function of patch. This seems to account for the overdispersion/extra zeros in the Bog patch, and it also seems like the more parsimonious choice of likelihood rather than the ZINB (especially given the weird bivariate plots). These are weird, right?

image

Does 'model expansion' usually refer to the idea of adding covariates to a model sequentially (e.g., keeping patch, log(PAR 1.3) and their interaction and then adding, for example, percent cover of Sphagnum) or does one usually investigate the response variable as a function of one covariate at a time and then build models with increasingly more terms as "significant" covariates are identified? (By "significant", I mean covariates whose median parameter estimates differ from zero with some threshold probability [maybe 90% in my case]).

Thank you, wise stats-master. I hope to be (or at least become) a worthy mentee.

ebuhle commented 1 year ago

Drum roll. . . .There's apparently no substantial difference in the ability of any of these to predict previously unseen data:

Haha! I guess that's not too shocking -- these are count data, they're noisy, the estimated effects aren't enormous, and the models probably leave a lot of variation unexplained. At least the trend is in the direction of favoring NB + covariates + patch-specific shape over NB + covariates + global shape.

To move forward with other simple models (i.e., those with only one [or two, if I think there's an interaction with patch] of the other potential covariates that I think may be associated with P. uviferum seedling abundance), I'm planning to work with NB models that include shape as a function of patch. This seems to account for the overdispersion/extra zeros in the Bog patch, and it also seems like the more parsimonious choice of likelihood rather than the ZINB (especially given the weird bivariate plots).

I agree with all of this.

These are weird, right?

Not really so weird; they're about what I would have expected given our earlier discussion of this model and the inherent correlation b/w intercept and slope(s) when predictors are noncentered. I certainly wouldn't call this evidence of severe nonidentifiability or anything (recalling that the primitive parameter for the intercept is orthogonal to the slopes by construction). The scraggly left tail of b_zi_patchBog is pretty weird, and I'm still none the wiser as to where it's coming from. One more reason to eschew this model.

Does 'model expansion' usually refer to the idea of adding covariates to a model sequentially (e.g., keeping patch, log(PAR 1.3) and their interaction and then adding, for example, percent cover of Sphagnum) or does one usually investigate the response variable as a function of one covariate at a time and then build models with increasingly more terms as "significant" covariates are identified? (By "significant", I mean covariates whose median parameter estimates differ from zero with some threshold probability [maybe 90% in my case]).

Eh, none of the above? That phrase / concept can refer to a variety of things, but I usually think of it in terms of incrementally adding structural features, not so much covariates. This could mean something like zero-inflation, group-varying coefficients like your patch-dependent shape, less-constrained error structures, etc. Usually (and specifically in Gelman-world) it's preceded by "continuous", and continuous model expansion is contrasted with the discrete model selection approach exemplified by, say, PSIS-LOO. I remember Ray Hilborn suggesting this approach too. Rather than pick a winner, you build a global model that encompasses the more restricted models as special cases and make inferences based on the marginal or joint posteriors of parameters or quantities of scientific interest. There are more advanced ways of formalizing this notion, like projection predictive model selection ... but they are advanced.

In regression models I suppose it can make sense to start with an intercept-only model before adding covariates, and this is my typical workflow -- but then I'm typically interested in the intercept-only model as a reference point in the candidate set, because I'm not as down on model selection as Gelman is. (One problem with the kitchen-sink approach is that the bias-variance tradeoff often results in so much posterior uncertainty that you can't make any confident inferences at all.) I definitely do not advocate adding covariates one by one as a general rule, and especially not in the "stepwise regression" manner you describe. There's a reason stepwise variable selection has been left in the dustbin of frequentist history. Or rather many reasons, path-dependency being high on the list. I still think you should posit the candidate models ahead of time based on what substantive hypotheses make sense, e.g. maybe irradiance effects are global or maybe they vary by patch, etc.

Thank you, wise stats-master. I hope to be (or at least become) a worthy mentee.

Learning you are, young Skywalker.

kzaret commented 1 year ago

Thanks -- all of the above offers helpful perspective.

I remember Ray Hilborn suggesting this approach too. Rather than pick a winner, you build a global model that encompasses the more restricted models as special cases and make inferences based on the marginal or joint posteriors of parameters or quantities of scientific interest.

In last year's version of this analysis, I fit four global models and a bunch of special-case, simpler models. I wanted a global model that simultaneously captured covariates that represented light conditions, water table level, abundance of seed trees and the structure of the understory community. But, I had a number of variables that represented different aspects of the understory community and required different degrees of time investment in the data collection process. Thus, my four global models varied by a covariate or two. Ultimately, loo_compare() indicated no differences between any of the models, so I stacked and weighted their contributions to a joint posterior using loo::stacking_weights and presented plots (one per control and burned site) of parameter interval estimates as my primary result.

This year, with Andres' encouragement, I spent more time carefully laying out my thoughts about the expected relationships between seedling abundance and individual covariates. Those will drive the simpler models that I run. Overall, though, I think that some combination of microsite factors pertaining to P. uviferum physiological tolerance/limits, seedbed availability, growth substrate availability, and competitive or facilitative interactions with other species (or other P. uviferum individuals) will be driving seedling abundance. Would you advocate for model stacking in the event that a simpler candidate model is substantially different from an intercept model via loo but the global model is not?

kzaret commented 1 year ago

FWIW, these results seem promising insofar as the counts in "Bog" are "significantly" more overdispersed than in the baseline "Forest" while "Bog Forest" is actually somewhat more Poisson-like, albeit not with 95% probability.

Sorry: Why does the non-zero negative marginal effect of Bog mean that counts in that patch are significantly more overdispersed? (I'm working my way through model summaries. . . .)

ebuhle commented 1 year ago

Sorry: Why does the non-zero negative marginal effect of Bog mean that counts in that patch are significantly more overdispersed? (I'm working my way through model summaries. . . .)

Thanks, I'll take the simple question first since I've been too scattered to formulate my thoughts on model averaging vs. model selection.

It's because the shape parameter (conventionally denoted $\phi$) in this parameterization of the NB is inversely related to the variance, such that the distribution approaches the Poisson as $\phi \rightarrow \infty$. Likewise shape_patchBogForest is positive, albeit with < 95% probability, so counts in "BogForest" are slightly less overdispersed and more Poisson-like.
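
In symbols, for this parameterization of the NB,

$$\textrm{Var}(Y) = \mu + \frac{\mu^2}{\phi},$$

so the quadratic overdispersion term vanishes as $\phi \rightarrow \infty$ and the variance collapses to the Poisson's $\mu$.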

kzaret commented 1 year ago

Thanks!

I think I'll have more relevant/specific questions about model averaging vs. model selection once I've finished wrapping my head around the simple candidate model results. So, feel free to punt the one above.

ebuhle commented 1 year ago

FYI, the (mean, shape) parameterization of the NB used to be colloquially known as the "ecological parameterization" before it went mainstream. There is also a 3-parameter version that adds a first-order term to the quadratic mean-variance relationship, which totally saved my Guam tree seed dispersal analysis.

kzaret commented 1 year ago

Through pp checks (draws = 500) of some of my simpler candidate models and all of my global models I'm seeing maximum counts and sds for the Bog patch (and also the Forest patch) that range from "too high" to "unreasonably high" (e.g. over 60,000). I'm wondering if you have thoughts on why this is and whether/how to fix it, including whether I should use the 3-parameter version of the NB. (I have not yet read beyond the abstract of the paper that you referenced in your last comment, but I will do so.)

Below are model summaries, ppc plots and prior vs. posterior plots for three candidate models and one global model. I couldn't figure out how to adjust the x axes for individual groups in the ppc plots, so in some instances (e.g., model nb3k) I "zoomed in" and created a series of plots to better visualize the yrep data. Let me know if additional plots would be helpful. Looking at the prior vs. posterior plots: do you think that the priors of DWT in model nb3k and/or the priors of patch or Other Shrubs in model nb13k.b are too informative?

Here's the first candidate model (nb3k):

Family: negbinomial 
  Links: mu = log; shape = log 
Formula: PIUV_less5cmDBH_total ~ DWT + patch + DWT:patch + (1 | plot) 
         shape ~ patch
   Data: sp.rmb (Number of observations: 108) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Group-Level Effects: 
~plot (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.33      0.16     0.03     0.67 1.00     2288     2861

Population-Level Effects: 
                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                3.25      1.16     0.96     5.51 1.00     1606     3202
shape_Intercept          0.75      0.28     0.20     1.28 1.00     7906     7479
DWT.L                    0.05      2.44    -4.70     4.83 1.00     1574     3247
DWT.Q                   -0.39      1.42    -3.16     2.41 1.00     1624     3393
patchBogForest          -0.61      1.17    -2.90     1.71 1.00     1619     3127
patchBog                -4.50      1.82    -8.33    -1.09 1.00     2193     4009
DWT.L:patchBogForest    -0.46      2.45    -5.27     4.28 1.00     1568     3300
DWT.Q:patchBogForest     0.28      1.44    -2.50     3.04 1.00     1641     3605
DWT.L:patchBog          -2.65      3.36    -9.55     3.69 1.00     3165     5580
DWT.Q:patchBog          -1.58      2.22    -6.25     2.51 1.00     2292     4213
shape_patchBogForest     0.31      0.43    -0.55     1.18 1.00     8802     7022
shape_patchBog          -1.69      0.56    -2.76    -0.55 1.00     6755     5913

image

image

image

The second candidate model (nb5k):

 Family: negbinomial 
  Links: mu = log; shape = log 
Formula: PIUV_less5cmDBH_total ~ Plot_PIUVcnt.rs + (1 | plot) 
         shape ~ patch
   Data: sp.rmb (Number of observations: 108) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Group-Level Effects: 
~plot (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.57      0.19     0.26     1.00 1.00     2154     3343

Population-Level Effects: 
                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                2.64      0.20     2.20     2.97 1.00     2965     3133
shape_Intercept          0.72      0.27     0.19     1.23 1.00    14379     8372
Plot_PIUVcnt.rs          0.37      0.17     0.08     0.75 1.00     3848     3280
shape_patchBogForest     0.46      0.46    -0.42     1.36 1.00     9827     7289
shape_patchBog          -2.69      0.45    -3.57    -1.78 1.00     7202     5378

image

image

The third candidate model (nb13k.b):

Family: negbinomial 
  Links: mu = log; shape = log 
Formula: PIUV_less5cmDBH_total ~ OtherShrubs.rs + patch + OtherShrubs.rs:patch + (1 | plot) 
         shape ~ patch
   Data: sp.rmb (Number of observations: 108) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Group-Level Effects: 
~plot (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.36      0.15     0.05     0.67 1.00     1805     1417

Population-Level Effects: 
                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                         3.55      0.22     3.12     3.98 1.00     7696
shape_Intercept                   0.82      0.27     0.28     1.35 1.00    15260
OtherShrubs.rs                   -0.33      0.13    -0.58    -0.06 1.00     9648
patchBogForest                   -1.13      0.29    -1.70    -0.57 1.00     7534
patchBog                         -3.10      2.15    -7.21     1.29 1.00     7934
OtherShrubs.rs:patchBogForest     0.00      0.26    -0.51     0.53 1.00    10325
OtherShrubs.rs:patchBog           0.77      3.40    -5.73     7.78 1.00     8066
shape_patchBogForest              0.33      0.45    -0.55     1.23 1.00     8067
shape_patchBog                   -1.83      0.54    -2.86    -0.72 1.00    10507
                              Tail_ESS
Intercept                         7254
shape_Intercept                   8053
OtherShrubs.rs                    7511
patchBogForest                    6968
patchBog                          6923
OtherShrubs.rs:patchBogForest     7786
OtherShrubs.rs:patchBog           6843
shape_patchBogForest              7511
shape_patchBog                    7713

image

image

And finally, the global model (nb18k.d):

Family: negbinomial 
  Links: mu = log; shape = log 
Formula: PIUV_less5cmDBH_total ~ DWT + log_PAR1.3.rs + Plot_PIUVcnt.rs + patch + DWT:patch + log_PAR1.3.rs:patch + (1 | plot) 
         shape ~ patch
   Data: sp.rmb (Number of observations: 108) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Group-Level Effects: 
~plot (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.33      0.16     0.03     0.67 1.00     2541     3319

Population-Level Effects: 
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                        3.67      1.19     1.34     6.01 1.00     1926     3118
shape_Intercept                  0.75      0.28     0.21     1.29 1.00     6793     7421
DWT.L                            0.18      2.41    -4.55     4.93 1.00     1907     2705
DWT.Q                           -0.55      1.41    -3.31     2.21 1.00     1938     2927
log_PAR1.3.rs                    0.26      0.19    -0.12     0.64 1.00     6986     6735
Plot_PIUVcnt.rs                 -0.18      0.23    -0.62     0.29 1.00     5992     5519
patchBogForest                  -0.85      1.25    -3.30     1.60 1.00     1975     3840
patchBog                        -4.87      1.84    -8.65    -1.34 1.00     2556     4400
DWT.L:patchBogForest            -0.61      2.42    -5.35     4.12 1.00     1892     2784
DWT.Q:patchBogForest             0.46      1.42    -2.29     3.23 1.00     1924     2961
DWT.L:patchBog                  -2.64      3.32    -9.48     3.67 1.00     3833     5915
DWT.Q:patchBog                  -1.48      2.17    -5.97     2.56 1.00     2599     4257
log_PAR1.3.rs:patchBogForest    -0.73      0.40    -1.54     0.06 1.00     8537     7902
log_PAR1.3.rs:patchBog          -0.89      0.51    -1.95     0.04 1.00     7133     5636
shape_patchBogForest             0.35      0.44    -0.50     1.23 1.00     7562     7454
shape_patchBog                  -1.60      0.59    -2.72    -0.40 1.00     7337     7587

image

image

ebuhle commented 1 year ago

Through pp checks (draws = 500) of some of my simpler candidate models and all of my global models I'm seeing maximum counts and sds for the Bog patch (and also the Forest patch) that range from "too high" to "unreasonably high" (e.g. over 60,000). I'm wondering if you have thoughts on why this is and whether/how to fix it

These patterns are all indicative of fitted distributions that are (even) more leptokurtic / long-tailed than the observations. The $y_{\textrm{rep}}$ themselves and their maxima have extremely long tails that admit some minuscule probability of unreasonably high values. For the most part Forest looks ok to me; it's Bog that stands out.

Whether this is a serious problem depends to some extent on how you plan to use these models. Are you really going to be making numerical predictions? If so, is the maximum really of interest? It's inherently a very noisy statistic. More likely you'd be interested in the 90% or 95% posterior prediction interval, and I bet if you did a PP check on those they would look more reasonable because there are so few extreme outliers.

Similarly, the SD of count data is a bit tricky to interpret since it's related to the mean. The CV would be a more intuitive statistic for PP checks: a value of 1 corresponds to Poisson, higher values indicate more overdispersion.

That said, it is reasonable to try to improve these models' fit to the right tail of the data. Two possibilities come to mind:

Looking at the prior vs. posterior plots: do you think that the priors of DWT in model nb3k and/or the priors of patch or Other Shrubs in model nb13k.b are too informative?

Yes, it does look like priors on several of these coefficients across multiple models are having an unintended influence on the posterior. You could probably give all of the slopes N(0,10) priors. I don't know how much that's likely to change these patterns, though.

Since the issue seems to be more about the overdispersion than the mean function, you could also try making the model for $\phi$ more flexible. Seems like an obvious candidate would be adding a (1 | plot) grouping factor -- that is, unless there's some reason you expect plot-level heterogeneity in the mean but not in the overdispersion?
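
Concretely, that would just extend the shape formula; a sketch using the nb18k.d formula from above:

bf(PIUV_less5cmDBH_total ~ DWT + log_PAR1.3.rs + Plot_PIUVcnt.rs + patch +
     DWT:patch + log_PAR1.3.rs:patch + (1 | plot),
   shape ~ patch + (1 | plot))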

whether I should use the 3-parameter version of the NB. (I have not yet read beyond the abstract of the paper that you referenced in your last comment, but I will do so.)

I mentioned that mainly as an FYI, but this could be a situation where it might help. It would be nontrivial to implement, though; you'd have to define a new family for brms. I've seen people post about this and I don't think it's difficult in principle, but I've never done it. I'd bookmark this until you exhaust other options (including the possibility that these PPC results are NBD).

kzaret commented 1 year ago

Thanks for all of the above. Since I'm feeling a lot of pressure to wrap up this analysis and I won't be using the models to make numerical predictions -- my goal is to identify which variables are most strongly associated with seedling counts -- do you think that I need to/should re-fit the models with adjusted priors and/or the grouping factor as another covariate of shape in order to feel ok about publishing the results? (There are of course some more decisions to make, like which models to stack to generate a joint posterior and final parameter estimates. . . . )

ebuhle commented 1 year ago

It shouldn't take too much time to tweak the priors and add plot-varying shape intercepts, right? Those seem like reasonable things to try. Maybe add them to the global model first and see if they help.

As for stacking etc., this ties back into what I wanted to say about model selection vs. model averaging, which is that the best approach depends on what you intend to do with the final model(s). More on that later.

kzaret commented 1 year ago

I adjusted the priors and added the plot-varying shape intercepts to the global model, one at a time and then together. The results were more wacky than the original version. The DWT.L priors still appeared somewhat influential and this was also the case when I experimented with N(0, 20). The max counts are increasing as the priors widen since this is allowing for larger estimates, right?

ebuhle commented 1 year ago

Eh, it's not entirely clear why the max counts are increasing, especially since the coefficients that are strongly constrained by their priors in the results above include several interaction terms. It's still puzzling why so many of those interactions are so large in magnitude, e.g. values of -5 or -10 on the log link scale. Probably the effects you're seeing on the right tail have less to do with the mean function than with the variance, since it scales quadratically with the expectation. Regardless, it sounds like the low-hanging fruit made things worse not better.

Did you try PP checks on the 95th percentile or the CV to see if they paint a less wacky picture?

kzaret commented 1 year ago

I haven't been able to figure out how to do pp checks of specific stats such as the 95th percentile. Could you offer some "code for dummies"?

ebuhle commented 1 year ago

I think it's just the stat argument to ppc_stat(). The only "trick" is that "[i]n all cases the function(s) should take a vector input and return a scalar statistic", i.e. it can't take additional arguments. So you can either define the functions inline, or if you want more informative plot labels to help keep track of all the various output, define them ahead of time:

q95 <- function(x) quantile(x, 0.95)
cv <- function(x) sd(x)/mean(x)

ppc_stat(y, yrep, stat = "q95")
ppc_stat(y, yrep, stat = "cv")
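
For a brmsfit, one way to pull y and yrep and reproduce the per-patch panels (a sketch, assuming the fitted model object fit and the sp.rmb data used throughout this thread, and the q95 and cv helpers defined just above):

library(bayesplot)

y <- sp.rmb$PIUV_less5cmDBH_total
yrep <- posterior_predict(fit, ndraws = 500)   # draws x observations matrix

ppc_stat_grouped(y, yrep, group = sp.rmb$patch, stat = "q95")
ppc_stat_grouped(y, yrep, group = sp.rmb$patch, stat = "cv")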

kzaret commented 1 year ago

Viel Danke!

Here are the ppc cv plots for mod 18k.d (priors of N[0, 5] on the slopes and shape modeled as a function of patch) and mod 18m.d (priors of N[0, 10] on the slopes and shape modeled as a function of patch and the grouping factor). As you pointed out, the fitted distributions are quite a bit more overdispersed than the data (and there's that weird spike at 6 for mod 18m.d). This means that I could be doing a better job of modeling the shape parameter, right? So that would either be in terms of predictors or adjusting the priors?

image

And here are the ppc q95 results for the above models. For mod 18k.d, there's one draw out at 12,200 in the Bog patch, but the majority are under 500. The results for mod 18m.d are wonky for Bog. Do you have any ideas about what's going on there?

image

image

kzaret commented 1 year ago

Here's what I get if I model shape as a function of just the grouping factor (1|plot) and set the slope priors at N(0, 5) [for consistency with the bulk of my candidate models to date]:

image

(Hell yeah, I didn't need to use my phone's hotspot to paste the above figs!) The counts for Bog look great, but the overdispersion for Forest and Bog Forest is still high.

Edit: The proportion of zeros for Bog is low, looking at draws from the entire posterior. Not sure if I should be concerned. . . .

image

ebuhle commented 1 year ago

Viel Danke!

Bitte schön. (I was going to point out that it's "Vielen Dank", but playing Grammar Nazi in German is a bit too on-the-nose.)

Here are the ppc cv plots for mod 18k.d (priors of N[0, 5] on the slopes and shape modeled as a function of patch) and mod 18m.d (priors of N[0, 10] on the slopes and shape modeled as a function of patch and the grouping factor). As you pointed out, the fitted distributions are quite a bit more overdispersed than the data (and there's that weird spike at 6 for mod 18m.d). This means that I could be doing a better job of modeling the shape parameter, right? So that would either be in terms of predictors or adjusting the priors?

Technically yes to the last two questions, but for practical purposes the CV results for rmsa.glmer.nb18k.d don't look that bad at all. Yes, the model admits a right-tail possibility of greater overdispersion than the data, but the sample CV is very close to the mode of the PPD in all three patches. Ideally you'd like the sample statistic to match the median of the PPD; the Bayesian P-value (i.e., the former as a quantile of the latter) formalizes this notion. As long as that quantile isn't too extreme (standard "significance" thresholds apply here) the model is doing OK.

What's more illuminating about these results is that the observed marginal distributions in Forest and Bog Forest are actually slightly underdispersed relative to the Poisson (CV < 1) and only Bog is overdispersed. I suspect this explains a lot of the difficulty you're having with these models: the distribution of seedling counts is just really different across these sites, and it's tough for a simple parametric model to accommodate all three. Did you ever try a zero-inflated Poisson? It's conceivable that the NB is not a great fit for these data because it ties overdispersion to the thickness of the right tail. If most of the "overdispersion" is in fact coming from excess zeros as opposed to excessive right-skew in general, you might have more success with a ZIP. (Or not.)

And here are the ppc q95 results for the above models. For mod 18k.d, there's one draw out at 12,200 in the Bog patch, but the majority are under 500. The results for mod 18m.d are wonky for Bog. Do you have any ideas about what's going on there?

Overall, these results just show I was too optimistic in thinking there might only be 5% of observations out in that extreme tail of the PPD. The lower the quantile you pick, the more of the tail you ignore. I don't know what is going on with the axis scale for Bog under glmer.rmsa.nb19m.d, though. Is there actually a bar for the PPD in that plot? Like underneath the data? If you wanted to dig into this, the most direct thing would be to look at the PPD samples themselves, e.g.

summary(yrep[,sp.rmb$patch == "Bog"])

or similar manipulations.

Here's what I get if I model shape as a function of just the grouping factor (1|plot) and set the slope priors at N(0, 5) [for consistency with the bulk of my candidate models to date]:

These look about the same as rmsa.glmer.nb18k.d, maybe a little better in Bog (though it's hard to tell without computing the Bayesian P-value). Which suggests that plot-varying shape term isn't helping much.

Edit: The proportion of zeros for Bog is low, looking at draws from the entire posterior. Not sure if I should be concerned. . . .

Nah, that particular plot looks perfectly acceptable on its own -- again, think (or compute) what quantile of the PPD the sample proportion is; in this case it's not extreme at all. However, this might be another nudge in favor of a ZIP, especially since the model is overestimating the zero class in the other two patches. Unlike the NB, the ZIP would account for those site-dependent differences directly.

(Hell yeah, I didn't need to use my phone's hotspot to paste the above figs!)

Huh. I wonder what changed for you. I just tried to embed an image from this repo and got the usual error. :frowning_face:

kzaret commented 1 year ago

(or compute) what quantile of the PPD the sample proportion is;

Sorry, but how do I do this?

Based on the zip results (below), I'm leaning toward moving forward with the models as currently specified (e.g, like nb18k.d).

 Family: zero_inflated_poisson 
  Links: mu = log; zi = logit 
Formula: PIUV_less5cmDBH_total ~ DWT + log_PAR1.3.rs + Plot_PIUVcnt.rs + patch + DWT:patch + log_PAR1.3.rs:patch + (1 | plot) 
         zi ~ patch
   Data: sp.rmb (Number of observations: 108) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Group-Level Effects: 
~plot (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.53      0.11     0.35     0.79 1.00     3192     5606

Population-Level Effects: 
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                        3.37      1.16     1.10     5.65 1.00     3302     5000
zi_Intercept                    -6.79      3.19   -14.98    -2.86 1.00     5061     2947
DWT.L                            0.11      2.39    -4.60     4.75 1.00     3278     4699
DWT.Q                           -0.29      1.38    -2.98     2.44 1.00     3281     4795
log_PAR1.3.rs                    0.02      0.06    -0.10     0.13 1.00    11119     7194
Plot_PIUVcnt.rs                 -0.15      0.25    -0.64     0.33 1.00     4865     5741
patchBogForest                  -0.64      1.21    -3.00     1.74 1.00     3378     5158
patchBog                        -3.33      2.30    -7.51     1.59 1.00     3828     5079
DWT.L:patchBogForest            -0.32      2.40    -4.94     4.41 1.00     3288     4783
DWT.Q:patchBogForest             0.30      1.38    -2.43     2.97 1.00     3294     4818
DWT.L:patchBog                  -1.51      4.15    -9.23     7.37 1.00     5699     5686
DWT.Q:patchBog                  -0.79      2.77    -5.92     5.20 1.00     3838     4883
log_PAR1.3.rs:patchBogForest    -0.46      0.17    -0.80    -0.12 1.00    12208     7734
log_PAR1.3.rs:patchBog          -0.84      0.32    -1.47    -0.21 1.00     7703     7590
zi_patchBogForest                3.08      3.39    -1.82    11.53 1.00     4466     3017
zi_patchBog                      6.63      3.23     2.50    14.78 1.00     4906     2843

[image]

Copying and pasting via Century Link failed this time. . . .

ebuhle commented 1 year ago

Based on the zip results (below), I'm leaning toward moving forward with the models as currently specified (e.g., like nb18k.d).

Ugh. It's like the data are somewhere in between a ZIP and an NB. (Also, those zi_patchBogForest and zi_patchBog posteriors are wild!) I guess you could try plot-varying ZI probability, but if that doesn't help then I agree with your leaning.

Sorry, but how do I do this ?

There's a brief explanation of the concept here and an example calculation here. Note that the key line in Jonah's example is

# calculate proportion of stat(yrep) > stat(y)
p <- mean(apply(yrep, 1, median) > median(y))

which just computes the proportion (i.e., the mean of a logical indicator) of posterior draws for which the predicted statistic is greater than the sample statistic. You'd have to break down the columns of yrep by patch if you wanted to mirror the results in your PPC plots. That said, I don't consider this essential -- it's just an informal way of quantifying the information shown in the plots, which for the most part are more useful than the number.
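
A by-patch version might look something like this (assuming y and yrep are as above):

patches <- unique(sp.rmb$patch)
sapply(as.character(patches), function(p) {
  idx <- sp.rmb$patch == p
  mean(apply(yrep[, idx, drop = FALSE], 1, median) > median(y[idx]))  # "p-value" for the median, per patch
})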

kzaret commented 1 year ago

Thank you!

kzaret commented 1 year ago

I computed "p-values" (by patch) for a bunch of the test statistics and compared them across a subset of models (e.g., the nb and zip versions of mod 18k and a couple of the simpler candidate models) and feel confident in moving fowrard with the nb versions in which shape is function of patch.

My next potential steps are: 1) Determine which models have elpds that are substantially lower than the intercept-only model and exclude them from further analysis. I will also exclude a few of the candidate models that had wacky (i.e., super constrained?) values in the Bog patch (like in mod zip18k). 2) Use elpd_loo to derive stacking weights for the models that aren't excluded (e.g., following https://avehtari.github.io/modelselection/rats_kcv.html#6_Alternative_models_for_the_prediction_given_initial_weight). One sticking point here: I'm not sure it makes sense to include more than one global model (I fit four). Predictors for global models were chosen from candidate models with parameter effects that excluded zero with at least 80% probability, had Spearman rank correlations of less than 0.60 with other covariates, and did not pertain to models with specification issues as revealed by graphical posterior predictive checks. 3) Extract a weighted number of draws from the posterior of each model. 4) Combine the above to derive final parameter estimates.
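
For steps 2 and 3, I'm imagining something roughly like the following (the loo object names below are just placeholders for whichever models survive step 1):

library(loo)
loo_list <- list(m12b = loo_12b, m18f = loo_18f, m18k = loo_18k)  # psis-loo objects for the retained fits
wts <- loo_model_weights(loo_list, method = "stacking")
wts
round(10000 * as.numeric(wts))  # how many of 10,000 pooled posterior draws to take from each model (step 3)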

What do you think?

ebuhle commented 1 year ago

What do you mean by four global models? Isn't the global model by definition a superset of all the simpler ones?

kzaret commented 1 year ago

Right. Rather than global models, the four are "slightly more complex candidate models" with predictors chosen based on the results of the simplest candidate models. The simplest candidate models included one predictor, or three predictors if my hypothesized relationship between seedling counts and the predictor was predicated on patch (and thus the model included patch and the interaction term).

Based on the literature and my observations in the field, I have reason to believe that P. uviferum seedling abundance is associated with light and water levels, seed availability, competitive interactions with Sphagnum and other mosses, facilitative effects of understory trees and shrubs, and potentially competitive or facilitative interactions with other vegetation life forms (e.g., ferns). A bunch of the data that I collected in the field overlaps in terms of the above categories (e.g., "pool" and "distance to the water table, DWT" both get at water level). The model of seedling counts as a function of pool yielded parameter estimates of pool that included 0 in the 80% credible interval (which I was thinking of as my cut-off for "significance"). That model was not substantially better able to predict unseen data than the intercept-only model (i.e., via loo-cv). Given the above, I was thinking that I wouldn't want to include both pool and DWT in a single model (and thus not in a global model).

Similarly, I don't have reason to believe that the cover of trees, shrubs, ferns, forbs, and graminoids are all simultaneously needed to understand why P. uviferum seedlings may be more abundant in one microsite than another. Only the simple candidate model that included shrubs yielded at least an 80% probability of a "non-zero" marginal effect of the veg life form. I was thinking that I wouldn't learn anything new by fitting a model that included all of the veg life forms, both predictors that capture water levels, plus the other predictors that I have strong a priori reasons to believe are associated with seedling counts. Do I need to adjust my thinking?

ebuhle commented 1 year ago

Thanks, that's very helpful. I'm still missing one step, though: how are the slightly more complex candidate models constructed from the results of the simplest ones? Do you have one SMCCM with all the predictors that passed the initial univariate (or bivariate with patch) screening, and then another one with all the losers?

I suppose the reason you don't want to include all the potentially "nonsignificant" predictors is because of the variance-bias tradeoff mentioned earlier as a limitation of the Gelman / Hilborn kitchen sink approach. Do you know if it's actually a big problem here, i.e. if you fit a truly global model do you get substantively different or weaker inferences on the slopes? One reason you might, besides irrelevant predictors adding to posterior uncertainty, is multicollinearity among variables measuring the same habitat dimension. But that cuts both ways, because multicollinearity can mask effects of predictors that would appear "nonsignificant" by themselves.

This gets back to what I wanted to say about model selection vs. model averaging. Sorry I haven't had the bandwidth to make this rigorous, but the gist is that if you are mainly interested in causal inference rather than prediction / forecasting, IMO it makes more sense to go the Gelman continuous model expansion route rather than fiddle around with PSIS-LOO and stacking weights and model-averaged coefficients with all their interpretive caveats -- that is, as long as you have enough data to fit the global model.

If I were approaching this problem de novo, I would consider a structural equation modeling approach, with latent variables to represent environmental axes ("water level" with pool and DWT as manifest variables, "vegetation" with cover of each growth form, etc.). Too late for present purposes, but maybe future goals. :+1:

kzaret commented 1 year ago

I'm still missing one step, though: how are the slightly more complex candidate models constructed from the results of the simplest ones? Do you have one SMCCM with all the predictors that passed the initial univariate (or bivariate with patch) screening, and then another one with all the losers?

The first (and perhaps only kosher) MCCM (model 18a in the table below) included the "winning" predictors (see model numbers in bold). I think that what I was trying to do with the other MCCMs (but this was so 2022) was 1) compare results between a model that included the interaction between Sphagnum and patch and one that included only Sphagnum, and 2) see what happened as I removed predictors from that initial MCCM. It seems like my choice of predictors for the first MCCM was more along the lines of model expansion. However, I haven't been able to get the truly global model (model 19) to run. There are over 1200 divergent transitions, 2500 transitions that exceeded the maximum tree depth after warmup, evidence the chains didn't mix, the Bulk and Tail ESS were too low, the Rhat too high, etc. I started with stepsize = 0.01 and I know (from https://mc-stan.org/misc/warnings.html) that I could adjust that, the adapt_delta, tree depth, etc. But with all the above issues, could those adjustments actually make a difference? I'm not comfortable with the MCCM as my "final answer".

If I were to pursue the model stacking route, it seems like I’d want to make a different choice of MCCM to be more consistent in my approach. That is, I could specify a MCCM using the predictors from models that are substantially better able to predict unseen data than the intercept-only model based on the elpd_diff values (i.e., models 12b and higher in the table). Then, I could derive stacking weights for those plus the MCCM and interpret the stacked model results. (The current stacking weights are based on all of the SCMs that aren't crossed off and the first MCCM fit to date.)

Your thoughts are most welcome.

[image: table of candidate models]

ebuhle commented 1 year ago

However, I haven't been able to get the truly global model (model 19) to run. There are over 1200 divergent transitions, 2500 transitions that exceeded the maximum tree depth after warmup, evidence the chains didn't mix, the Bulk and Tail ESS were too low, the Rhat too high, etc. I started with stepsize = 0.01 and I know (from https://mc-stan.org/misc/warnings.html) that I could adjust that, the adapt_delta, tree depth, etc. But with all the above issues, could those adjustments actually make a difference?

That does sound pretty bad. I imagine all those interactions with patch aren't helping your cause; have you tried a model with just the main effects? (I realize the interactions represent scientific hypotheses, but this is partly a diagnostic suggestion.) Going back to multicollinearity again, if it's strong enough that could be another source of numerical trouble. How strong are the (presumably negative) correlations among the vegetation cover classes? I gather they don't sum to 1, but if they even come close that would obviously produce an ill-conditioned design matrix. Another way to diagnose this is to look at an mcmc_pairs() plot of the betas.
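
For instance, something like this (the cover-class column names below are made up, so substitute your own, and mod19 stands in for your global-model fit):

library(bayesplot)
veg_cols <- c("Trees.rs", "Shrubs.rs", "Ferns.rs", "Forbs.rs", "Graminoids.rs")  # placeholder names
round(cor(sp.rmb[, veg_cols], method = "spearman"), 2)   # pairwise correlations among cover classes
mcmc_pairs(as.array(mod19), regex_pars = "^b_")          # joint posteriors of the betas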

As for the NUTS parameters, initial stepsize is by far the least influential. I would definitely try increasing adapt_delta from its default of 0.8 if you haven't already done so -- try 0.95. That in turn will require the sampler to take even more, smaller steps, so you'll also want to increase max_treedepth -- start with, say, 12.
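
In brms, that would be something along these lines (again using mod19 as a stand-in for the existing global-model object):

fit19b <- update(
  mod19,                                                   # the problematic global brmsfit
  control = list(adapt_delta = 0.95, max_treedepth = 12),  # smaller steps, deeper trees
  chains = 4, iter = 5000, warmup = 2500
)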

kzaret commented 1 year ago

I still haven’t tried fitting a model with just the main effects, though I did adjust the NUTS parameters as you suggested and there are still major/many issues (and it took more than four hours to run). Also, I realized that the global model should actually include 24 rather than 15 parameters. (This is because I thought some of the candidate models had structural issues, but subsequent pp checks suggested that wasn’t so.) Many of my predictors are highly correlated (see the correlogram). I think this means that, if I want to include more predictors in my “final model” or additional MCCMs, I would need to continue to expand using just predictors that don’t strongly correlate with those currently included. But maybe I could just stop here with model 18f?

I did something incorrectly/inconsistently when choosing predictors for the MCCMs in the table posted previously. The predictors included in models 18e and 18f (in the new table) were based on whether the parameter estimates for the predictors in the SCMs excluded zero with at least 80% or 90% probability, respectively. From the elpd diff results, model 18e is significantly worse than the intercept model (and all the other candidate models and 18f) at predicting unseen data. Thus, if I were choosing between the two MCCMs, I’d go with 18f. However, from the table, we can also see that while model 18f performs better than the intercept-only model, so do some of the other SCMs that include predictors not in 18f. Should I care about that given that I’m not really interested in being able to predict the number of seedlings across a given suite of microsites? Rather, I want to know which elements of a landscape patch are more important for P. uviferum regeneration and thus which altered patches may be better suited to passive or active restoration or which ones seem to have transitioned away from suitable P. uviferum habitat.

I’m feeling somewhat panicked about wrapping up these results so that I can move on to two or three other chapters (and hopefully wrap up the Big D by the end of the academic year). What would you recommend as a next step for an intro-modeler who's running out of time?

[image]

[image]

kzaret commented 1 year ago

Actually, let's scrap the above comment/questions. I worked on expanding MCCM 18f with predictors that were minimally correlated and think that I can justify my iterative process and "final answer". Vamos a ver. We shall see what my advisor, committee and/or reviewers think. Thanks for all of your insight and input to date!

kzaret commented 1 year ago

In my apparently typical fashion of doing things backwards, sideways and upside down, I fit the following model (among the many others you've seen) and was merrily working on interpreting my results when I had a question that led to some new "data exploration" and now I'm concerned that I screwed up (again).

Family: negbinomial 
  Links: mu = log; shape = log 
Formula: PIUV_less5cmDBH_total ~ DWT + log_PAR1.3.rs + Plot_PIUVcnt.rs + Sphagnum.rs + Cushion_spp.rs + OtherShrubs.rs + patch + DWT:patch + log_PAR1.3.rs:patch + OtherShrubs.rs:patch + (1 | plot) 
         shape ~ patch
   Data: sp.rmb (Number of observations: 108) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Group-Level Effects: 
~plot (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.20      0.14     0.01     0.52 1.00     3257     4578

Population-Level Effects: 
                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                         4.27      1.21     1.91     6.61 1.00     3863     4595
shape_Intercept                   0.87      0.27     0.33     1.39 1.00    10393     8295
DWT.L                             0.19      2.46    -4.70     4.96 1.00     3708     4511
DWT.Q                            -0.43      1.43    -3.22     2.39 1.00     3735     4748
log_PAR1.3.rs                     0.38      0.18     0.01     0.73 1.00    11106     6602
Plot_PIUVcnt.rs                  -0.10      0.20    -0.48     0.29 1.00     9667     7160
Sphagnum.rs                       0.12      0.14    -0.15     0.40 1.00    12779     8270
Cushion_spp.rs                    0.46      0.20     0.07     0.88 1.00    10985     7336
OtherShrubs.rs                   -0.37      0.13    -0.63    -0.10 1.00    10047     6871
patchBogForest                   -1.47      1.26    -3.93     0.94 1.00     4000     5554
patchBog                         -5.14      2.59   -10.36    -0.04 1.00     7132     6321
DWT.L:patchBogForest             -0.81      2.47    -5.64     4.08 1.00     3729     4763
DWT.Q:patchBogForest              0.20      1.44    -2.64     3.00 1.00     3777     5155
DWT.L:patchBog                   -2.78      3.29    -9.43     3.34 1.00     6289     7043
DWT.Q:patchBog                   -1.52      2.26    -6.25     2.60 1.00     5079     6585
log_PAR1.3.rs:patchBogForest     -0.67      0.39    -1.47     0.09 1.00    11510     7352
log_PAR1.3.rs:patchBog           -1.40      0.52    -2.46    -0.44 1.00     9187     6507
OtherShrubs.rs:patchBogForest     0.08      0.27    -0.44     0.61 1.00    10362     7738
OtherShrubs.rs:patchBog           1.43      3.74    -5.83     8.82 1.00    11402     6566
shape_patchBogForest              0.13      0.44    -0.73     1.00 1.00    11104     8250
shape_patchBog                   -1.37      0.66    -2.58    -0.00 1.00     8019     6684

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

DWT:patch is an interaction between two categorical predictors. Patch is nominal (forest, bog forest, bog), but I coded DWT as an ordinal variable with three levels (low, medium, high). When I summarize the data, it turns out there are no subplots in the forest patch that qualify as having 'low DWT'. So, what is the intercept (which I think represents 'forest, low DWT') actually telling me? And, should DWT instead be coded as a nominal factor?

ebuhle commented 1 year ago

Boy, you do have a knack for finding "interesting" edge cases!

OK, so first off it doesn't matter whether DWT is coded as ordinal; this is a fundamental nonidentifiability issue that only manifests differently depending on the factor codings. The problem is you are trying to estimate a set of DWT:patch coefficients that are not all jointly identified by the likelihood. In classical parlance, you have more parameters than degrees of freedom. One cell of the 2-way table contains no observations, i.e. the factors are not fully crossed, so the design matrix is singular. Likewise it doesn't matter whether the missing cell involves the reference factor level(s).

Consider a toy example:

library(rstanarm)
library(bayesplot)

df <- data.frame(x1 = factor(rep(LETTERS[1:3], each = 15)),
                 x2 = factor(rep(c("L","M","H"), 15), levels = c("L","M","H")),
                 y = rnorm(45))
df <- subset(df, x1 != "A" | x2 != "L")
table(df$x1, df$x2)
    L M H
  A 0 5 5
  B 5 5 5
  C 5 5 5
lm1 <- lm(y ~ x1*x2, data = df)
summary(lm1)
Call:
lm(formula = y ~ x1 * x2, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.3377 -0.3963 -0.1875  0.5195  2.1666 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.16863    0.65185   0.259    0.798
x1B         -0.63203    0.75269  -0.840    0.407
x1C         -0.07368    0.53223  -0.138    0.891
x2M         -0.68885    0.75269  -0.915    0.367
x2H          0.43301    0.53223   0.814    0.422
x1B:x2M      0.74507    0.92185   0.808    0.425
x1C:x2M      0.69877    0.75269   0.928    0.360
x1B:x2H      1.14582    0.75269   1.522    0.138
x1C:x2H           NA         NA      NA       NA

Residual standard error: 0.8415 on 32 degrees of freedom
Multiple R-squared:  0.3478,    Adjusted R-squared:  0.2052 
F-statistic: 2.438 on 7 and 32 DF,  p-value: 0.04025

Unexpectedly to me, lm() doesn't choke on the singular design matrix. It's smart enough to drop one (arbitrary) dummy contrast coefficient from the interaction term and salvage the rest. How to interpret the resulting model is another matter.

You can verify that the same thing happens if x2 is coded as ordinal:

lm2 <- lm(y ~ x1*ordered(x2), data = df)
summary(lm2)

Now, in the Bayesian context things get squishier. No parameter with a proper prior is ever truly nonidentifiable. You can check that stan_lm(), which uses lm() internally, has the same behavior of dropping one interaction contrast -- the difference is that it does so silently, without the informational message!

slm1 <- stan_lm(y ~ x1*x2, data = df, prior = NULL)
summary(slm1)

On the other hand, stan_glm() (and stan_glmer()) tries to estimate everything, and this is where things get interesting.

sglm1 <- stan_glm(y ~ x1*x2, data = df)
summary(sglm1)
Model Info:
 function:     stan_glm
 family:       gaussian [identity]
 formula:      y ~ x1 * x2
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 40
 predictors:   9

Estimates:
              mean   sd   10%   50%   90%
(Intercept)  0.5    2.3 -2.4   0.4   3.4 
x1B          0.5    2.3 -2.4   0.5   3.4 
x1C         -0.2    2.3 -3.2  -0.2   2.7 
x2M          0.0    2.3 -3.0   0.0   3.0 
x2H         -0.6    2.3 -3.7  -0.6   2.3 
x1B:x2M     -1.4    2.4 -4.4  -1.4   1.6 
x1C:x2M     -0.4    2.4 -3.4  -0.4   2.7 
x1B:x2H      0.4    2.4 -2.6   0.4   3.4 
x1C:x2H      1.1    2.4 -2.0   1.0   4.1 
sigma        1.1    0.1  0.9   1.1   1.3 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.2  0.0   0.3   0.6  

The mean_ppd is the sample average posterior predictive distribution 
of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.1  1.0   502 
x1B           0.1  1.0   500 
x1C           0.1  1.0   517 
x2M           0.1  1.0   507 
x2H           0.1  1.0   509 
x1B:x2M       0.1  1.0   504 
x1C:x2M       0.1  1.0   531 
x1B:x2H       0.1  1.0   508 
x1C:x2H       0.1  1.0   547 
sigma         0.0  1.0  1539 
mean_PPD      0.0  1.0  2771 
log-posterior 0.1  1.0  1004 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure 
of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

There are no obvious signs that anything is wrong here. The effective sample sizes of the factor coefficients are suspiciously low, but in a realistic complex model you might or might not notice that. If you were able to compare the stan_lm() results based on a full-rank design matrix, you'd see that the uncertainty is much greater in the stan_glm() fit, but in practice you wouldn't have a reference point like that. The clearest tell seems to be the bivariate posterior correlations:

mcmc_pairs(sglm1, regex_pars = ":")

Or you could just, you know, notice that you have a rank-deficient design matrix and thus a nonidentifiable factor interaction term. :wink: You can emulate the lm() behavior manually by removing one (arbitrary) contrast term from the formula, e.g. - DWT.Q:patchBog. Like I said, though, the resulting model is a bit harder to interpret.
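
Using the toy example above, the manual route would go something like this (a sketch that drops the same contrast lm() aliased, via the design matrix rather than the formula):

X <- model.matrix(~ x1 * x2, data = df)                    # 9 columns, but rank 8
qr(X)$rank                                                 # confirms one column is aliased
keep <- setdiff(colnames(X), c("(Intercept)", "x1C:x2H"))  # drop the aliased contrast
df_fr <- data.frame(y = df$y, X[, keep])                   # ":" in names becomes "."
sglm2 <- stan_glm(y ~ ., data = df_fr)                     # refit on a full-rank design matrix
summary(sglm2)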

I wonder if this has been the source of some of the weird "weak identifiability" issues we've been trying to solve with priors throughout this thread???

kzaret commented 1 year ago

Thanks for working through the toy example. DWT:patch was included in a few of the models we'd been looking at, so maybe that's where some of the "weak identifiability" was coming from.

I created the categorical DWT variable from quantitative data. I could adjust how I "break" those data into categories so that 'low' is more inclusive and there are actually observations across all combinations of levels?

ebuhle commented 1 year ago

I created the categorical DWT variable from quantitative data. I could adjust how I "break" those data into categories so that 'low' is more inclusive and there are actually observations across all combinations of levels?

I was wondering why you discretized this continuous variable -- usually a bad idea -- but I assumed it was because you weren't confident in the measurements for reasons I vaguely recall discussing before. Even so, it might be better to keep it continuous. After all, measurement error will induce errors in the discretized categories too.

kzaret commented 1 year ago

I think my options are to keep it continuous or exclude it from the analysis. DWT was measured at three consistent locations across a subplot and represented the distance from the top of whatever substrate would have been first hit when dropping a pin from DBH (~1.3 m) down to the water table; distances were recorded as negative values if the substrate was below the water table (i.e., if there was surface water), otherwise as positive values.

Andres and the post-doc in the lab seemed to think I couldn't/shouldn't use DWT as a continuous predictor because what it represents varies strongly by patch (due to the distinct types of substrates in each; like downed logs in the forest patch vs. Sphagnum lawns in the bog patch). Also, I didn't measure a "reference level" (e.g., distance from the ground surface to water table). (I challenge anyone to consistently define "ground" in a typical peatland or in those horizontal P. uviferum forests where you suddenly find yourself stepping down a meter when you thought you'd been walking on the forest "floor".) Like you pointed out, though, any wonkiness in interpretation still exists in the discretized version of the variable.

Here are boxplots of the average DWT per patch. I'm not surprised by anything. Does my explanation of how I measured DWT raise red flags for you? I've also included photos of subplots. The long PVC pipe marks the diameter of the circular subplot; I would flip the short PVC pipe to the other side to visualize the boundaries of and sample the other half of the plot.

[image: boxplots of average DWT per patch]

[image: photos of subplots]

ebuhle commented 1 year ago

We really should have started a new Issue at some point. :roll_eyes:

I think my options are to keep it continuous or exclude it from the analysis.

I agree. Or not interact it with patch.

Andres and the post-doc in the lab seemed to think I couldn't/shouldn't use DWT as a continuous predictor because what it represents varies strongly by patch (due to the distinct types of substrates in each; like downed logs in the forest patch vs. Sphagnum lawns in the bog patch).

Like you pointed out, though, any wonkiness in interpretation still exists in the discretized version of the variable.

Took the words right out of my mouth! I certainly see their point, but if anything discretizing only muddies the bog water IMO. Maybe they were suggesting DWT should be split into low, medium and high within each patch? Except that's obviously not what you did, and even if that were the move, you could just scale the measurements. It sounds more like they're saying the measurement method itself lacks external validity, in which case... :man_shrugging: Ultimately the question is what index of water table depth the trees are responding to, right? I'm just a country statistician, but I would have thought this was something like distance from the root-shoot boundary to the water surface.

Also, I didn't measure a "reference level" (e.g., distance from the ground surface to water table). (I challenge anyone to consistently define "ground" in at typical peatland or in those horizontal P. uviferum forests where you suddenly find yourself stepping down a meter when you thought you'd been walking on the forest "floor".)

LOL yeah, that's some Hound of the Baskervilles stuff there.

kzaret commented 1 year ago

It sounds more like they're saying the measurement method itself lacks external validity, in which case... 🤷‍♂️ Ultimately the question is what index of water table depth the trees are responding to, right? I'm just a country statistician, but I would have thought this was something like distance from the root-shoot boundary to the water surface.

One of the driving ideas was that young/small seedlings may desiccate, even in the wet bog patch, during the drought season because their roots are too far from the water table given the height/thickness of the Sphagnum lawns. Yet, young seedlings are also likely to die from lengthy surface inundation. I was thinking of the "bird's eye" view of subplots as including the available substrates on which seeds could land and germinate. The substrates differ from one another in their composition (e.g., log, moss mat, pool of water) but also biophysical factors like distance to the water table.

I'm leaning toward including the measurement and trying to give readers enough information to understand its limitations and decide for themselves whether it's valid.

ebuhle commented 1 year ago

One of the driving ideas was that young/small seedlings may desiccate, even in the wet bog patch, during the drought season because their roots are too far from the water table given the height/thickness of the Sphagnum lawns. Yet, young seedlings are also likely to die from lengthy surface inundation.

Ah, so a unimodal relationship, perhaps? Hutchinsonian niche something something.

I was thinking of the "bird's eye" view of subplots as including the available substrates on which seeds could land and germinate. The substrates differ from one another in their composition (e.g., log, moss mat, pool of water) but also biophysical factors like distance to the water table.

OK, this makes sense if seedlings do establish on all those substrates (except standing water, presumably). In that case your method is basically a proxy for distance from the root-shoot boundary to the water table.

I'm leaning toward including the measurement and trying to give readers enough information to understand its limitations and decide for themselves whether it's valid.

I agree. You've convinced the most important demographic, someone who knows absolutely nothing.

If you do interact it with patch, though, given the very different ranges of values in Bog vs. the others, be extra careful with posterior predictive checking (e.g. marginal plots of response vs. DWT). Or even if you don't interact it, for that matter.
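
For instance, something like this with bayesplot (assuming DWT is kept continuous -- the column name is a placeholder -- and yrep comes from posterior_predict() on the fitted model):

library(bayesplot)
ppc_intervals_grouped(
  y = sp.rmb$PIUV_less5cmDBH_total,
  yrep = yrep,                  # posterior predictive draws
  x = sp.rmb$DWT_cont,          # continuous water-table distance (placeholder name)
  group = sp.rmb$patch,         # one panel per patch, to compare Bog vs. the others
  prob_outer = 0.95
)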

kzaret commented 1 year ago

I agree. You've convinced the most important demographic, someone who knows absolutely nothing.

Oh good[y goody gumdrops].

If you do interact it with patch, though, given the very different ranges of values in Bog vs. the others, be extra careful with posterior predictive checking (e.g. marginal plots of response vs. DWT). Or even if you don't interact it, for that matter.

What would you be worried about seeing?