kzaret / RQ2_Dendro_v2_PIUVestab


Developing informative priors on tree age: overview and coring-height corrections #2

Open ebuhle opened 3 years ago

ebuhle commented 3 years ago

@kzaret, thanks for the very helpful PowerPoint slides breaking down the sources of ring-count observation error. This tracks with my understanding of the three basic error components and fleshes out the details considerably. Let's take each one from the modeling perspective, specifically with the goal of developing an informative prior on the age of each tree i (a[i] = T - t[i], in the notation of the vignette).

This may seem somewhat abstract at the moment (I plan to illustrate it in the next revision) but I think the cleanest way to formulate the age-uncertain Poisson state-space model for your problem is to "split off" the age observation model. That is, rather than explicitly representing the likelihood of the "observed" establishment year t[i] given the true year tau[i] and auxiliary parameters as in the current version of the vignette -- an observation likelihood that would have to be much more complex than my toy example, involving covariates such as coring height and so on -- it makes more sense to think of the observation model as the whole series of steps described in your slides. Those observation submodels jointly define a posterior distribution on establishment year tau[i], given the data (e.g., cross-section data, measured core heights, inner ring measurements). That posterior can then be used as an informative prior on tau[i] in the state-space model, where it will be updated based on information from the process model (perhaps including climate forcing, for example).

So the desired output of the ring-count corrections is a posterior distribution on establishment year (or equivalently age) of each individual, incorporating all sources of ring-count observation error. By definition, the distribution must be discrete. And things will be simpler if we use Bayesian inference where possible.

(1) Measurement of coring height
(2) Estimates of the number of tree-rings not captured by tree cores that were extracted above the root/shoot boundary

I'm grouping these under the heading of coring-height correction. This is the most straightforwardly quantifiable part, though not without known unknowns.

Your Gaussian linear regressions of additional ring count vs. coring height confirmed my suspicion that these should be Poisson regressions. The data are counts; moreover they show the classic mean-variance funnel, and the conditional distribution at a given height looks right-skewed at least in some cases, though it's hard to tell. However, the diagram (and rather amazing photo) on slide 4 gives me the impression that the theoretical expectation based on tree growth patterns is a linear relationship between coring height and unobserved rings, and indeed the empirical relationships look linear. So it'll have to be a Poisson GLM with identity link function, fixing the intercept to zero (although in the Forest patch there seems to be a point at zero height with nonzero additional rings??).

A further wrinkle is that you have data from roughly 100 harvested trees, each of which likely has a slightly different height-ring count relationship, so fully pooling the data is problematic. The obvious solution is a hierarchical model (a Poisson GLMM) with a random tree-level slope. This can be very easily done in rstanarm. One might even wonder whether there are autocorrelated anomalies along the series of successive cross-sections within each tree; rstanarm can't handle this, but brms can.

Then you have your known unknowns. First, uncertainty in the measured coring height itself. In principle, this could be modeled with the "errors-in-variables" approach; brms can handle this, essentially by treating the true x values (e.g., coring height) as latent variables and giving them informative priors. Do you have any way of quantifying these errors -- multiple observers recording the same height, etc.?
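For concreteness, here is a minimal sketch of what that could look like in brms (not something we've settled on; height_sd is a hypothetical column giving the assumed SD of each height measurement, and the other column names mirror the models later in the thread):

library(brms)
# Rough sketch only: errors-in-variables coring height via brms's me() term, which
# treats the recorded Height_RC as a noisy observation of a latent true height with
# measurement SD given by the (hypothetical) column height_sd.
fit_eiv <- brm(
  AddRings ~ 0 + me(Height_RC, height_sd),
  data = harv, family = poisson(link = "identity")
)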

Second and less tractable, the suspected violation of the structural assumption that the relationships in the sample of young trees apply to older trees. Not sure what to tell you about that one.

(3) Estimates of the number of tree-rings not captured by tree cores that missed the pith

This one is interesting because it's not a statistical model, just a geometric formula. Duncan (1989) notes many potential sources of structural model misspecification error, from imperfect concentricity to missing rings depending on radius length to whatever "wedging" is. But at least in his sample, by far the greatest errors came from the estimation of ring width. This error has both a bias and a variance component; the former is a classic known unknown, but the latter could be quantified based on the variation in the sample of three innermost ring widths that are averaged. Except...it looks like you didn't actually record the individual widths, just divided the total by 3? I'm sure that seemed very efficient and clever at the time. :sob:
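In case it helps to see the geometry spelled out, here is my reading of that formula as a sketch (not project code): the distance from the innermost complete arc to the pith is recovered from a chord across that arc and the arc's height, then divided by the mean innermost ring width.

# Sketch of the Duncan (1989)-style geometric correction (my reconstruction, not code
# from this project). L = chord length across the innermost complete arc, h = height of
# the arc above the chord, mean_rw = mean width of the innermost rings (here, the three
# that were averaged).
rings_to_pith <- function(L, h, mean_rw) {
  r <- h / 2 + L^2 / (8 * h)   # radius of the circle implied by the chord and arc height
  r / mean_rw                  # estimated number of rings between the arc and the pith
}

# e.g., a 20 mm chord with a 2 mm arc height and 0.5 mm mean ring width:
rings_to_pith(L = 20, h = 2, mean_rw = 0.5)  # ~52 missing rings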

The other route would be, as Duncan (1989) did, to estimate the error CV by comparing his predictions to the "truth" based on cross-sections. Which you could do (or could have done). Have you, or Andres or anyone else, done this for PIUV?

In any event, I think the "ring-to-pith upper threshold" thing is bad business. Ideally, the larger estimates would come with greater absolute uncertainty (as would be the case if you applied Duncan's method of estimating the error CV) that would smear out the posterior and downweight the data point in subsequent analysis. If you just declare it a "minimum age", how does that help you? Seems like then you're stuck making up some range of uncertainty based on Expert Opinion or something.

Same goes for...

(4) Tree cores missing the outermost rings (due to core damage during collection, transport or processing, or due to weathering in the case of dead trees)

This would be an issue regardless of analysis method, though. Even if you were going to round age to the nearest 20 or 50 yr, how would you do that if you only have a minimum estimate?

Did you do any crossdating? I mention it mostly to show off something I learned about dendrochronology (via geoducks!) but also because it seems tailor-made for situations like your snags and lost outermost rings. (I guess it doesn't help with the ring-to-pith problem.)

This topic of dead snags brings up the elephant in the room, but that's a whole 'nother issue, literally.

kzaret commented 3 years ago

Thanks for all of the feedback, suggestions and questions. Here are some responses:

(1, 2) Coring height corrections (i.e., estimates of rings not captured due to coring height, measurement of coring height)

Regarding the issue of initial measurements of coring height: Do we have any way of quantifying these errors?

Forest Patch: I don’t think that we can quantify these errors. I know that the range of uncertainty for trees with vertical growth habits is less than that of trees with horizontal growth habits, but I don’t really know the bounds: our estimates of the coring heights of the horizontal trees could be anywhere from 30 (?) to 200 (?) cm off.

Burned Forest patch: I have repeat measurements of substrate (including Sphagnum) depth at set intervals (every 2 m?) along the centerline of each plot. I also have measurements (four per plot) of depth to mineral soil. Given the observation that large Pilgerodendron roots are often close to the surface of mineral or compacted organic soil, perhaps these depths can be used to estimate observation error? That is, they could give us ranges in values of the lengths of Pilgerodendron trunks located below the substrate surface and thus below coring height?

(3) Estimates of the number of tree-rings not captured by tree cores that missed the pith

Re quantifying the variance component of the error in estimates of ring width using Duncan (1989):

Option 1: Based on variation in the sample of three innermost ring widths per core. I did record the ring width of each of the three innermost rings, but if we were to pursue ring width variance this way, would it be better to use more rings? How many? 20 based on Duncan's results?

Option 2: Estimate the error CV by comparing predictions to the "truth" based on cross-sections as in Duncan (1989). Nobody has done this for Pilgerodendron that I know of. I suppose Andres has fire scar cross sections from older trees, though they are not from the same patch types that I'm investigating. I would suggest using tree cores that reach the pith except I have none from the Forest patch. I'm not sure I'd want to use the harvested cross sections since I don't know how well their patterns of ring width variation match those of the older trees from which I extracted the tree cores.

(4) Tree cores missing the outermost rings

Did you do any cross-dating? Sigh. About a year ago, I gave up on cross-dating because it is a labor-intensive, maddening process given the micro-rings and eccentricity of Pilgerodendron. At the time, since I didn't seem to be finding patterns while visually cross-dating (i.e., taking note of abnormally small or abnormally wide ring years in each core), and since it seemed we were going to be binning the count data anyway, I let it go. Now, with you on board and the renewed hope of getting to work with annual resolution data, I am willing to give cross-dating another go for samples from the Burned Forest patch (but I'm also wringing my hands and weeping inside).

(5) The elephant in the room is on my list to discuss with Andres at our next meeting (Tues, 1/19). Speaking of meetings, would you be game to meet with Andres and me at some point?

kzaret commented 3 years ago

With regard to the initial coring height measurements in the Forest patch: only 12 of 41 cores were collected from "horizontal" trees. I could leave those out of the analysis. . . .

ebuhle commented 3 years ago

(1, 2) Coring height corrections (i.e., estimates of rings not captured due to coring height, measurement of coring height)

Regarding the issue of initial measurements of coring height: Do we have any way of quantifying these errors?

Forest Patch: I don’t think that we can quantify these errors. I know that the range of uncertainty for trees with vertical growth habits is less than that of trees with horizontal growth habits, but I don’t really know the bounds: our estimates of the coring heights of the horizontal trees could be anywhere from 30 (?) to 200 (?) cm off.

Sigh. I figured this would be the case. I suppose the principled Bayesian thing to do would be to turn these intuitions / educated guesses into informative priors, and then check the sensitivity of the results to those priors. The unscrupulous Bayesian would just ignore the uncertainty in coring height, which would certainly make life easier. I'm not disclosing which camp I belong to.

I would definitely advise against throwing out the 12/41 "horizontal" trees, or throwing out data in general. Already, my biggest concern about the practical feasibility of this modeling approach is that the data may be too sparse to clearly reveal environmental signals. You can get some crude intuition about this by looking at the simulated sample sizes I used in the vignette.

Burned Forest patch: I have repeat measurements of substrate (including Sphagnum) depth at set intervals (every 2 m?) along the centerline of each plot. I also have measurements (four per plot) of depth to mineral soil. Given the observation that large Pilgerodendron roots are often close to the surface of mineral or compacted organic soil, perhaps these depths can be used to estimate observation error? That is, they could give us ranges in values of the lengths of Pilgerodendron trunks located below the substrate surface and thus below coring height?

This seems promising. Again, though, I'd be reluctant to quantify the coring-height uncertainty for one patch but not the other(s), if only because it would add gratuitous logistical and expository complexity.

(3) Estimates of the number of tree-rings not captured by tree cores that missed the pith

Re quantifying the variance component of the error in estimates of ring width using Duncan (1989):

Option 1: Based on variation in the sample of three innermost ring widths per core. I did record the ring width of each of the three innermost rings, but if we were to pursue ring width variance this way, would it be better to use more rings? How many? 20 based on Duncan's results?

Oh, nice! Well, this would be better than doing nothing, even though Duncan's results indicate that bias (from systematic changes in ring width with age) is a bigger source of error than variance. And yes, using the innermost 20 rings for this purpose makes sense.

Here again, I'd caution against throwing out cores that missed the pith, even in patches where "pithy" cores are available. We'll need all the data we can get.

(4) Tree cores missing the outermost rings

Did you do any cross-dating? Sigh. About a year ago, I gave up on cross-dating because it is a labor-intensive, maddening process given the micro-rings and eccentricity of Pilgerodendron. At the time, since I didn't seem to be finding patterns while visually cross-dating (i.e., taking note of abnormally small or abnormally wide ring years in each core), and since it seemed we were going to be binning the count data anyway, I let it go. Now, with you on board and the renewed hope of getting to work with annual resolution data, I am willing to give cross-dating another go for samples from the Burned Forest patch (but I'm also wringing my hands and weeping inside).

Alas, I had a feeling you'd say this. :cry: In the interest of minimizing weeping and handwringing, let's table this for now but keep it in mind as a nuclear option. That said, I don't have any other clever ideas for how to handle cores that are missing the outermost rings, and (again) I would hate to just discard them.

(5) The elephant in the room is on my list to discuss with Andres at our next meeting (Tues, 1/19). Speaking of meetings, would you be game to meet with Andres and me at some point?

Interested to hear the outcome of this discussion, even though I haven't yet managed to open an issue outlining the problem as I see it. And yes, I'd be game to meet with you and Andres at some point...though I should warn you, my Zoom game leaves a lot to be desired.

kzaret commented 3 years ago

(1, 2) Coring height corrections (i.e., estimates of rings not captured due to coring height, measurement of coring height) Regarding the issue of initial measurements of coring height: Do we have any way of quantifying these errors? Forest Patch: I don’t think that we can quantify these errors. I know that the range of uncertainty for trees with vertical growth habits is less than that of trees with horizontal growth habits, but I don’t really know the bounds: our estimates of the coring heights of the horizontal trees could be anywhere from 30 (?) to 200 (?) cm off.

Sigh. I figured this would be the case. I suppose the principled Bayesian thing to do would be to turn these intuitions / educated guesses into informative priors, and then check the sensitivity of the results to those priors. The unscrupulous Bayesian would just ignore the uncertainty in coring height, which would certainly make life easier. I'm not disclosing which camp I belong to.

At some point before my meeting with Andres last week, I woke up sweating, nearly convinced that my data in the Forest and Burned Forest patches are useless because coring height is so pivotal to estimating total ring counts and yet I didn't fully get this at the time of sampling nor take the minimum measures possible to reduce uncertainty in this variable (e.g., always coring as low to the ground as possible). Also, I may have ignorantly made things worse with some inconsistencies in data collection protocols: in 2018, I had folks take note of the below-ground trunk lengths that they were estimating and adding to their directly observed measurements of above-ground trunk lengths in the Burned Forest patch. But, not everyone did this for every core in 2018 and we did not do this in 2017, nor was it done for any other patches. And, I just -- finally -- found a note saying that I added 110 cm to all tree heights in the Burned Forest patch in 2017, yet I also noted that if it was not possible to core at 1.3 m in height, we would do so at 1.9 m height. How could we have known that we couldn't core at 1.3 m if we didn't make some estimate of what was below the surface?! Arrgghh. I will spend some time with the raw data in my field notebooks. . . .

Since we can't ignore uncertainty in coring height, but we can identify the cores/patches in which coring height uncertainty is much less, what if the analysis were to focus on just those patches? What you write in the next quotes about sample size suggests that this wouldn't be feasible/advisable. I do have some reasons to believe that getting at the establishment ~ climate dynamics in the transitional patches (Bog Forest, Transition) -- the patches that are ecotonal between non-forested/open bog and forest -- would offer some good insight into the persistence of these boundaries (which is an overarching project goal).

I would definitely advise against throwing out the 12/41 "horizontal" trees, or throwing out data in general. Already, my biggest concern about the practical feasibility of this modeling approach is that the data may be too sparse to clearly reveal environmental signals. You can get some crude intuition about this by looking at the simulated sample sizes I used in the vignette.

My crude intuition suggests that burying my head in the Sphagnum would be an appropriate next move.

Burned Forest patch: I have repeat measurements of substrate (including Sphagnum) depth at set intervals (every 2 m?) along the centerline of each plot. I also have measurements (four per plot) of depth to mineral soil. Given the observation that large Pilgerodendron roots are often close to the surface of mineral or compacted organic soil, perhaps these depths can be used to estimate observation error? That is, they could give us ranges in values of the lengths of Pilgerodendron trunks located below the substrate surface and thus below coring height?

This seems promising. Again, though, I'd be reluctant to quantify the coring-height uncertainty for one patch but not the other(s), if only because it would add gratuitous logistical and expository complexity.

Yeah, I would be uncomfortable writing/saying, "Well, we thought coring height mattered for this part of the system but not this other part". Logistically, though: each patch can have its own (sub)model of coring-height uncertainty? What about different groups/classes of cores within patches (e.g., the horizontal vs. vertical trunks in the Forest patch)?

I will need help thinking through how to make use of the substrate depths and below-ground trunk length estimations from the Burned Forest patch, if we indeed decide to work with tree cores from that patch.

(3) Estimates of the number of tree-rings not captured by tree cores that missed the pith Re quantifying the variance component of the error in estimates of ring width using Duncan (1989): Option 1: Based on variation in the sample of three innermost ring widths per core. I did record the ring width of each of the three innermost rings, but if we were to pursue ring width variance this way, would it be better to use more rings? How many? 20 based on Duncan's results?

Oh, nice! Well, this would be better than doing nothing, even though Duncan's results indicate that bias (from systematic changes in ring width with age) is a bigger source of error than variance. And yes, using the innermost 20 rings for this purpose makes sense.

Again, I will need some help understanding exactly how these data will be used. If I want to get started on this process, how should I set up my spreadsheet(s) for recording ring widths and eventually calculating variances? I was thinking one data table per patch, with each of the 20 innermost rings as a row and core IDs as the columns. Would I do this for a subset of the tree cores per patch?

Here again, I'd caution against throwing out cores that missed the pith, even in patches where "pithy" cores are available. We'll need all the data we can get.

Roger that.

(4) Tree cores missing the outermost rings Did you do any cross-dating? Sigh. About a year ago, I gave up on cross-dating because it is a labor-intensive, maddening process given the micro-rings and eccentricity of Pilgerodendron. At the time, since I didn't seem to be finding patterns while visually cross-dating (i.e., taking note of abnormally small or abnormally wide ring years in each core), and since it seemed we were going to be binning the count data anyway, I let it go. Now, with you on board and the renewed hope of getting to work with annual resolution data, I am willing to give cross-dating another go for samples from the Burned Forest patch (but I'm also wringing my hands and weeping inside).

Alas, I had a feeling you'd say this. 😢 In the interest of minimizing weeping and handwringing, let's table this for now but keep it in mind as a nuclear option. That said, I don't have any other clever ideas for how to handle cores that are missing the outermost rings, and (again) I would hate to just discard them.

These are notes from my last meeting with Andres, when we talked about missing outer rings in the Burned Forest patch:

- Take a look at the 18 cores with pith but without outermost rings and the 5 cores with outermost rings but no pith. Are there distinct age cohorts? How do their DBHs compare? Are there relatively similar growth patterns during the last 10-15 years?
- If the above is similar across cores, we can use the cores with outermost rings to estimate the missing length on cores that didn't reach the pith.
- Also keep in mind: did the 5 live trees establish after fire? Compare their ages/sizes to fire scar dates.
- Figure that most cores were collected within at least a meter of each other along stems, and we wouldn't expect more than 10-15 rings to be missing based on AH's experience of PIUV's weathering.

Note: I added a table to the ppt in which I summarized sample sizes and tree core characteristics per patch.

(5) The elephant in the room is on my list to discuss with Andres at our next meeting (Tues, 1/19). Speaking of meetings, would you be game to meet with Andres and me at some point?

Interested to hear the outcome of this discussion, even though I haven't yet managed to open an issue outlining the problem as I see it. And yes, I'd be game to meet with you and Andres at some point...though I should warn you, my Zoom game leaves a lot to be desired.

Thanks for Issue # 3.

Let's return to the elephant when we all meet. The long and short of the convo with Andres was that we can write about this in the Discussion section of a manuscript. There are no longitudinal studies of naturally-recruited PIUV seedling survivorship. We do know that standing dead PIUV can remain in place for at least 150 years, and I do have observations of stumps in the patches in the burned/logged Lago Cute site. We don't have a way of knowing how many trees disappeared without a trace due to multiple rounds of burning.

My meetings with Andres are every other Tues. from 10 to 11:30 am. The next one is 2/2. What could work for you?

P.S. I have RStan set up and could try my hand at developing new additional-ring-counts ~ height along stem regressions (to estimate rings missed due to coring heights) per your suggestions. Are there reasons I shouldn't attempt to move forward on this for at least the Bog Forest and Transition patches, where coring height measurements are relatively unproblematic? (Again, there's a lot here that's still abstract to me and I don't yet understand how all the pieces will come together.)

Thank you!

ebuhle commented 3 years ago

I think we addressed most of these topics / Qs over email last night or in our meeting today (except the night sweats, I guess). A few further thoughts below.

And, I just -- finally -- found a note saying that I added 110 cm to all tree heights in the Burned Forest patch in 2017, yet I also noted that if it was not possible to core at 1.3 m in height, we would do so at 1.9 m height. How could we have known that we couldn't core at 1.3 m if we didn't make some estimate of what was below the surface?! Arrgghh. I will spend some time with the raw data in my field notebooks. . . .

I hesitate to ask, and it's not the most urgent thing given our road map, but have you gotten any insight into this?

Again, I will need some help understanding exactly how these data will be used. If I want to get started on this process, how should I set up my spreadsheet(s) for recording ring widths and eventually calculating variances? I was thinking one data table per patch, with each of the 20 innermost rings as a row and core IDs as the columns. Would I do this for a subset of the tree cores per patch?

That sounds complicated. I was thinking of just standard long ("tidy") format, where columns are patch, tree, core, ring, width and rows are measurements. Then you could just do something like aggregate(width ~ patch + tree, FUN = sd, data = duncan). And you want every tree that required a ring-to-pith correction in there. Remind me, there's just one core per tree?

My crude intuition suggests that burying my head in the Sphagnum would be an appropriate next move.

Seems reasonable to me, given that you are in grad school. But in the meantime, if you wanted to gain some (very) crude intuition about how the time-uncertain Poisson state-space model behaves as sample size changes, you could go to this chunk in the script

https://github.com/kzaret/RQ1v2_PIUVestab/blob/6f129f4cb671218a13b78fb5e78dad6afe820f84/analysis/count_SS_uncertain_dates.R#L137-L142

and play around with N (make it, I dunno, 150), and refit the three models to the simulated data. You could even re-knit the vignette with the updated results, although the way it's written now you would have to manually delete /analysis/results/Poisson_SS.RData to force it to re-fit the models unless you had already done so from the script. (The only one that takes more than a minute is fit_tobs, at roughly 45 min on my laptop.)

ebuhle commented 3 years ago

One more thought. Last night I wrote, in relation to the Poisson GLMMs:

In a model with the intercept forced through zero, there's only one coefficient that can include "random effects" of tree ID. In Gelman terminology, this is a "random slope" (w.r.t. height) model. (In principle I suppose the patch term could include random tree effects, but that would just be a bit silly, wouldn't it.)

I should have noted that the only sensible way for patch effects to enter into the model is also by altering the slope of the relationship, not the intercept. So you would have something like

stan_glmer(add_rings ~ -1 + height + height:patch + (-1 + height | tree), family = poisson(link = "identity"), ...)
ebuhle commented 3 years ago

Actually, maybe what we should do with the ring widths is fit a parametric distribution to them (a kernel; lognormal or gamma being the obvious candidates). The posterior sample from the mean of that kernel can then be directly algebraically transformed into a posterior sample of Duncan ring-to-pith corrections using basic MCMC facts.

I initially balked at this idea because a distribution fitted to N = 3 observations is going to be pretty uncertain. But then I realized, duh, we just make it a hierarchical model across all the trees in the pithless sample. They won't all have the same mean inner ring width, obviously, but the tree-level means will be drawn from some hyperdistribution that also includes a fixed effect of patch. It would be really simple to fit this model in rstanarm, and a few more lines of code to get what we're after.
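Something along these lines, as a sketch (assumed object and column names: a data frame ringwidths with one row per measured inner ring and columns width, Patch, Sapling):

library(rstanarm)
# Sketch only: hierarchical lognormal model for innermost ring widths, with tree-level
# means drawn from a patch-specific hyperdistribution.
rw_fit <- stan_lmer(log(width) ~ Patch + (1 | Sapling), data = ringwidths)

# Posterior draws of each tree's expected inner ring width (lognormal mean), one column
# per tree; these can be pushed through the geometric ring-to-pith formula draw by draw
# to get a posterior on the correction for each pithless core.
newdat <- unique(ringwidths[, c("Patch", "Sapling")])
mu_draws <- posterior_linpred(rw_fit, newdata = newdat)  # draws x trees, log scale
sigma_draws <- as.matrix(rw_fit, pars = "sigma")         # residual SD draws
mean_rw_draws <- exp(mu_draws + as.vector(sigma_draws)^2 / 2)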

kzaret commented 3 years ago

One more thought. Last night I wrote, in relation to the Poisson GLMMs:

In a model with the intercept forced through zero, there's only one coefficient that can include "random effects" of tree ID. In Gelman terminology, this is a "random slope" (w.r.t. height) model. (In principle I suppose the patch term could include random tree effects, but that would just be a bit silly, wouldn't it.)

I should have noted that the only sensible way for patch effects to enter into the model is also by altering the slope of the relationship, not the intercept. So you would have something like

stan_glmer(add_rings ~ -1 + height + height:patch + (-1 + height | tree), family = poisson(link = "identity"), ...)

Why should link = "identity" rather than "log"? Also, trying something like the above, I got the error: "To use this combination of family and link the model must have an intercept".

ebuhle commented 3 years ago

Why should link = "identity" rather than "log"?

Because we want to model the mean additional ring count as a linear function of height above the root-shoot boundary, not an exponential function. Or as I wrote earlier, and Andres affirmed yesterday:

However, the diagram (and rather amazing photo) on slide 4 gives me the impression that the theoretical expectation based on tree growth patterns is a linear relationship between coring height and unobserved rings, and indeed the empirical relationships look linear. So it'll have to be a Poisson GLM with identity link function, fixing the intercept to zero (although in the Forest patch there seems to be a point at zero height with nonzero additional rings??).

Relatedly, the intercept of an exponential function cannot be zero, but the intercept of the additional rings vs. height relationship is zero by definition.

Also, trying something like the above, I got the error: "To use this combination of family and link the model must have an intercept".

Ugh, that's obnoxious. This appears to be an rstanarm restriction -- I found the line in the source code that throws that error, and the zero-intercept identity-link specification works fine in lme4::glmer().
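For reference, the maximum-likelihood version that does accept this specification (a sketch using the same formula and column names as elsewhere in the thread):

library(lme4)
# The zero-intercept, identity-link Poisson GLMM that rstanarm refuses but lme4 accepts
# (for comparison only; maximum likelihood rather than Bayesian).
harv_ml <- glmer(AddRings ~ -1 + Height_RC + Height_RC:Patch + (-1 + Height_RC | Sapling),
                 data = harv, family = poisson(link = "identity"))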

Well...I guess we could try estimating the intercept and see what it comes up with. (Just drop the -1 term from the fixed effects formula, but not the random effects one b/c we don't want to include a tree-specific intercept.) If it's close to zero and fairly tight, then maybe it's good enough? Or we could add an artificial (0,0) observation for each tree to "strongly encourage" the fitted values to pass through the origin. (I'm not 100% sure that wouldn't cause a numerical error since the Poisson mean at the origin would be zero and the code might require it to be strictly positive. In that case we could add a microscopic jitter to the height, like 0.01 or something.)

EDIT: In principle a third option would be to put a very tight prior on the intercept restricting it to positive but negligible values. Unfortunately this is a hassle in rstanarm because the prior_intercept argument refers to the intercept after x has been internally centered, meaning you can't specify a prior on the "real" intercept independently of the slope(s).

kzaret commented 3 years ago

Here's the Poisson GLMM that I ran:

harv_glmer <- stan_glmer(formula = AddRings ~ Height_RC + Height_RC:Patch + (-1 + Height_RC | Sapling),
                         data = harv, family = poisson(link = "identity"), iter = 2000)

At the default number of iterations (2000), I got two warning messages -- 1: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#bulk-ess. 2: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess

When I increased the iterations to 3000, the model took 40 minutes to run, but I only got the first warning. The model output didn't change noticeably from the following at two significant digits (I hadn't yet increased the output to three, lost the object when my computer restarted, and haven't run the model again at 3000):

[screenshot: stan_glmer summary of the fitted model]

The coefficients for the interactions of height along stem and patch match (at least in direction) what we see in the scatter plots of additional ring count by height with the linear regression lines superimposed:

[screenshot: scatterplots of AddRings vs. Height_RC by patch, with linear regression lines]

Comparing posterior predicted simulations of the outcome to the observed data looks pretty good (to my eye):

yrep_harv <- posterior_predict(harv_glmer)
n_sims <- nrow(yrep_harv)
subset <- sample(n_sims, 100)
ppc_dens_overlay(harv$AddRings, yrep_harv[subset, ])

[screenshot: ppc_dens_overlay of observed vs. posterior-predictive AddRings]

My first attempt at running posterior_predict with new predictor data (i.e., the coring heights of my tree cores) did not work. This is what I tried:

cores <- read.csv("C:/Users/Marsh Hawk Zaret/Documents/Big_D/Data_Analysis/RQ1v2_PIUVestab/Data/PIUV_CoredProcessed.csv", header = TRUE)

#select & mutate predictor columns to match those used in the model cores.nd <- cores %>% filter(Patch2 != "Cushion") %>% #remove cores from Cushion patch select(Patch2, Individual, Cor_Height_cm) %>% mutate(Patch = Patch2, Sapling = Individual, Height_RC = Cor_Height_cm, .keep = "unused"

ynew <- posterior_predict(harv_glmer, newdata = cores.nd)

This is the error that I got: "Error in (function (fr, X, reTrms, family, nAGQ = 1L, verbose = 0L, maxit = 100L, : (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate". I'm just starting to explore google results, but if you have suggestions, I'd love to know them.

ebuhle commented 3 years ago

At the default number of iterations (2000), I got two warning messages -- 1: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#bulk-ess. 2: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess

When I increased the iterations to 3000, the model took 40 minutes to run, but I only got the first warning. The model output didn't change noticeably from the following at two significant digits (I hadn't yet increased the output to three, lost the object when my computer restarted, and haven't run the model again at 3000):

40 minutes for 3000 iterations! I'm surprised the wall time is that slow, but you do have a large-ish data set. I find the ESS warnings are fairly conservative, and especially in a case like this where you're still iterating on the model, 2000 is probably fine. Of course you can see the ESS for each parameter with the summary() method and check if any are really small.

I don't love the "significantly" nonzero intercept estimate, but since it is substantively tiny I guess I'll learn to live with it.

Comparing posterior predicted simulations of the outcome to the observed data looks pretty good (to my eye):

That does look pretty good, although the model misses the lump around 40. You could try facet_wrap()ing the marginal PPD by patch to see where the discrepancy is coming from. Of course what we really need to see is the equivalent of your OLS regression plots, for which we need posterior_predict() to work.

That is one hell of an error message. I wonder if Ben Bolker's comment here is germane. The identity link, too, can of course go negative. One thing you could try is not conditioning on the tree-level random effects, i.e. generating predictions for the hyper-mean or "average tree" (re.form = NA), to rein in the posterior samples that might imply negative slopes for some tree(s).

But I would start by trying to generate posterior predictive regression lines a la your OLS plots. Something like

# "new" level of grouping factor, so PPD marginalizes over tree-level variance
# (predictions are for a "random tree")
newdata <- expand.grid(AddRings = 0, Height_RC = 1:round(max(harv$Height_RC),-1), Patch = unique(harv$Patch), Sapling = "0")  
ynew <- posterior_predict(harv_glmer, newdata = newdata)
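And for completeness, the re.form = NA fallback mentioned above is simply the following (with the same caveat that it only gives "average tree" predictions):

# Predictions conditioned on the hyper-mean only (tree-level random slopes dropped):
ynew_mean <- posterior_predict(harv_glmer, newdata = newdata, re.form = NA)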
ebuhle commented 3 years ago

Pro tip: use the here package, as Jenny Bryan exhorts!

https://github.com/kzaret/RQ1v2_PIUVestab/blob/fd520b6d6288677caa6f3bb410efe810f61c3d8c/analysis/CorHeightAddRings.R#L24
https://github.com/kzaret/RQ1v2_PIUVestab/blob/fd520b6d6288677caa6f3bb410efe810f61c3d8c/analysis/CorHeightAddRings.R#L51

becomes

library(here)
...
harv <- read.csv(here("data", "RC_Height_cross_sections.csv"), header = TRUE)
cores <- read.csv(here("data", "PIUV_CoredProcessed.csv"), header = TRUE)

(Apologies for being pedantic if you already knew this.)

ebuhle commented 3 years ago

40 minutes for 3000 iterations! I'm surprised the wall time is that slow

Ah! Well, you are running chains sequentially instead of in parallel, so it should only be 10 min total. (Also the default is chains = 4, but I usually use 1 less than the number of available cores so there's some CPU left for email etc.)
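e.g., the usual setting (not project-specific):

# Run chains in parallel, leaving one core free for email etc.:
options(mc.cores = parallel::detectCores() - 1)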

ebuhle commented 3 years ago

I find the ESS warnings are fairly conservative, and especially in a case like this where you're still iterating on the model, 2000 is probably fine. Of course you can see the ESS for each parameter with the summary() method and check if any are really small.

Actually the warnings were not oversensitive in this case. As the summary output shows, the intercept mixes well but all the slope hyper-means and several of the tree-level slopes have n_eff of only a few hundred (and low r_eff, expressed as a proportion of the posterior sample size of 4000).

MCMC diagnostics
                                   mcse Rhat n_eff
(Intercept)                        0.0  1.0  3311 
Height_RC                          0.0  1.0   236 
Height_RC:PatchBurned Forest       0.0  1.0   244 
Height_RC:PatchForest              0.0  1.0   326 
Height_RC:PatchTransition          0.0  1.0   258 
b[Height_RC Sapling:BF01_01]       0.0  1.0   539 
b[Height_RC Sapling:BF01_03]       0.0  1.0   736 
b[Height_RC Sapling:BF01_06]       0.0  1.0   718 
b[Height_RC Sapling:BF02_02]       0.0  1.0   768 
b[Height_RC Sapling:BF02_04]       0.0  1.0   916 

Interactive exploration in ShinyStan (shinystan::launch_shinystan(harv_glmer)) backs this up. Nice mixing for the intercept but some obvious long-period autocorrelation and marginal lumpiness for the height effect:

[ShinyStan screenshot: diagnostics for the intercept]

[ShinyStan screenshot: diagnostics for the Height_RC effect]

This is still fine for model-development purposes, but this model runs so fast that it wouldn't hurt to double the posterior sample size. BTW, these are runtimes in sec on my laptop for a run with chains = 3:

cbind(rstan::get_elapsed_time(harv_glmer$stanfit), total = rowSums(rstan::get_elapsed_time(harv_glmer$stanfit)))

        warmup sample  total
chain:1 48.182 28.058 76.240
chain:2 40.526 43.374 83.900
chain:3 36.006 26.751 62.757

Even accounting for the serial chains, it sounds like wall time on your machine is an order of magnitude longer. Have you configured your Makevars?

ebuhle commented 3 years ago

That is one hell of an error message. I wonder if Ben Bolker's comment here is germane. The identity link, too, can of course go negative. One thing you could try is not conditioning on the tree-level random effects, i.e. generating predictions for the hyper-mean or "average tree" (re.form = NA), to rein in the posterior samples that might imply negative slopes for some tree(s).

Confirmed that this works for both the gridded data (see below) and the cores data.

https://github.com/kzaret/RQ1v2_PIUVestab/blob/07a1d1f54c855aad0b74045d422e0812cbae126c/analysis/CorHeightAddRings.R#L72-L74

The problem, of course, is that we don't want the hyper-mean predictions (and if we did we could just use posterior_linpred). We want our predictions to marginalize over the tree-level heterogeneity. I haven't confirmed that the issue is the extra tree-level variation in slopes causing the expected values to go negative, but will investigate further. Really, all the slopes -- population- and tree-level -- ought to be constrained to be positive a priori. The simplest way, if we were coding this from scratch, would be to model them on the log scale, so the hyperdistribution is lognormal instead of normal. I'm not sure that's possible in rstanarm, but will look into it.

ebuhle commented 3 years ago

I added the posterior distribution of the linear predictor (i.e., the expected or "fitted" value of AddRings) to the scatterplots. The problem, again, is that these predictions are based on the hyper-mean slope estimates, setting all tree-level anomalies to zero (similar to the previous post, although I changed the object names to be more intuitive). Thus they underestimate the full uncertainty implied by the data. (The linear predictor by definition is less variable than the posterior predictive distribution, but as we've seen, posterior_predict() throws the same error.) This is particularly noticeable in the Bog Forest, which seems to have an especially high degree of tree-to-tree heterogeneity.

I still haven't definitively traced the error back to the expectation going negative, but the estimated tree-level slope SD provides a strong clue:

stan_glmer
 family:       poisson [identity]
 formula:      AddRings ~ Height_RC + Height_RC:Patch + (-1 + Height_RC | Sapling)
 observations: 1218
------
                             Median MAD_SD
(Intercept)                   0.115  0.035
Height_RC                     0.400  0.031
Height_RC:PatchBurned Forest -0.189  0.045
Height_RC:PatchForest         0.017  0.036
Height_RC:PatchTransition    -0.094  0.044

Error terms:
 Groups  Name      Std.Dev.
 Sapling Height_RC 0.1842  
Num. levels: Sapling 105 

With a hyper-SD of 0.18 around a hyper-mean as low as 0.21 in the Burned Forest (0.4 - 0.19), some of the slopes randomly generated from this hyperdistribution would certainly be negative, and thus the predictions (for sufficiently large Height_RC) would also be negative.

I was hopeful that it might be possible to extract the design matrix from posterior_linpred() by setting XZ = TRUE, which could then be used to construct the predictions by hand, manually censoring any negative values. Alas, even that throws the same error.

What's extra-annoying is that this error, which evidently comes from some function in lme4 that rstanarm is calling internally, appears to be related to the likelihood optimization routine that lme4 uses to fit models -- which of course is totally irrelevant for Bayesian inference. This isn't the first time I've run into "zombie errors" in rstanarm due to its reliance on the guts of lme4 for many procedures.

Will keep investigating. As a last resort, we could always generate the parameters for a "random tree" and construct the predictions by hand, again censoring any negative values. Wouldn't really be that difficult, though the convenience of the built-in functions is nice.
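In case it comes to that, the by-hand version would look roughly like this (a sketch: parameter names are assumed, so check names(as.data.frame(harv_glmer)) against your fit, and the patch interaction terms are omitted for brevity):

# Draw a slope for a "random tree" from the fitted hyperdistribution, censor negative
# expected values, then draw Poisson counts.
draws  <- as.data.frame(harv_glmer)
b0     <- draws[["(Intercept)"]]
b_ht   <- draws[["Height_RC"]]
sd_tre <- sqrt(draws[["Sigma[Sapling:Height_RC,Height_RC]"]])  # assumed column name

height_new <- 50                                    # e.g., a 50 cm coring height
slope_new  <- b_ht + rnorm(nrow(draws), 0, sd_tre)  # random-tree slope
mu_new     <- pmax(b0 + slope_new * height_new, 0)  # censor negative expectations
ynew       <- rpois(nrow(draws), mu_new)            # posterior predictive draws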

ebuhle commented 3 years ago

OK, I'm pretty sure this is right. Dark inner shading is the linear predictor, light outer shading is the posterior predictive distribution (95% credible intervals of each), and the line is the posterior median of the predicted values. Both the fitted values and the PPD are integrated over the tree-level random slope; i.e., these predictions are for an arbitrary unobserved tree.

It all looks good except for Bog Forest. We can gain some insight into what's going on by comparing the above plots to the posterior predictive vs. observed plots (ppc_scatter_avg_grouped()), which unlike the former are conditioned on the sampled values of the tree-level slopes. That is, these predictions are tuned to the specific saplings in the sample.

Clearly you can do a much better job of "predicting" the additional rings as a function of height in Bog Forest if you condition on the unique sample of saplings. The population-level slope in Bog Forest (the reference factor level) is basically the same as in Forest (the Height_RC:PatchForest term is negligible) yet the empirical relationships look quite different. It seems the model is accounting for the difference by estimating a disproportionate number of large positive random slopes in Bog Forest. Not totally clear why it doesn't estimate a steeper mean slope, but it's probably constrained by that concentration of points around 10 - 30 cm. Would be worth trying to fit Bog Forest separately, just to check if the slope estimate changes substantively.

All that said, whether or not this is a serious problem depends somewhat on whether we think the unusually heterogeneous growth patterns of harvested saplings also apply to older cored trees in Bog Forest, and on how this lack of model fit compares to the other quantifiable and unquantifiable sources of error in the coring-height correction.

Implementation-wise, it would be straightforward to use this same approach to generate out-of-sample predictions for the cored trees. I leave the exercise to the reader. :sunglasses:

ebuhle commented 3 years ago

Would be worth trying to fit Bog Forest separately, just to check if the slope estimate changes substantively.

OK, this is extremely weird. Here's what you get if you fit Bog Forest by itself (Height_RC:Patch term removed, obvs):

stan_glmer
 family:       poisson [identity]
 formula:      AddRings ~ Height_RC + (-1 + Height_RC | Sapling)
 observations: 171
------
            Median MAD_SD
(Intercept) 0.03   0.03  
Height_RC   0.15   0.07  

Error terms:
 Groups  Name      Std.Dev.
 Sapling Height_RC 0.584   
Num. levels: Sapling 24 

Compared to the estimate for Bog Forest in the full model, the slope is much lower (the opposite of what I expected based on the figure above). The among-tree heterogeneity is also much higher (which is what I expected, since the hyper-SD no longer has to compromise b/w Bog Forest and the low-heterogeneity patches). Even weirder, the fit to the data actually looks worse than before, albeit with a huge amount of noise coming from those tree-level random slopes (and as you'd imagine, a far higher percentage of negative predictions that have to be truncated -- 30% of the draws vs. 0.01% in the full model).

I also tried fitting the other three patches without Bog Forest. The estimates shift a bit, but nowhere near as dramatically.

Some further thoughts / Qs:

(1) I hadn't paid close attention to what you were doing w/ outliers, but checking this comment...

https://github.com/kzaret/RQ1v2_PIUVestab/blob/e0dc4760b53feebf3ed511e3633f3b977608341e/analysis/CorHeightAddRings.R#L32-L41

...against the data, if I'm translating the notation correctly, it looks like these observations are still included? We could try tossing them out (on the R side), although there's so much data that I'm skeptical this would make much of a difference.

Also, I realized that the point in Forest that I've flagged before, with nonzero additional rings at zero height, is still in there. This must be an error, yeah?

    Harv_Yr Site Order  Patch Plot Sapling      Round Height_RC RC Estab_Yr Priority AddRings Pot_outlier
239    2017  RMB     2 Forest  F03  F03_03 F0303_0to2         0 90     1926        1        9           0

(2) I realized there's another level of nesting / potential source of heterogeneity we've completely ignored, namely Plot. (You're supposed to catch these things! :stuck_out_tongue_winking_eye:) We should probably include plot-level random slopes as well.

(3) Looking at the data for individual trees in Bog Forest has me revisiting the spatial autocorrelation issue that I raised in the OP. Since additional rings are necessarily cumulative, a few large values at low section heights will put a tree on a highly anomalous trajectory, even if subsequent section-to-section increments are more typical. It occurs to me that what we really ought to be fitting is not the total additional rings relative to ground level, but the change in ring count from one section to the next, as a function of the difference in height. We could then reconstruct a prediction of total AddRings by cumsum-ing the increments. Though slightly more ungainly than our current approach, this would offer several advantages, most importantly that the section-to-section increments are approximately independent rather than cumulative.

Thoughts?

ebuhle commented 3 years ago

Yeah, so I think fitting the incremental ring counts is the way to go. This allows us to write the linear relationship between section height and ring count on the log link scale as an intercept with an additive offset:

log(E[DiffRings]) = b0 + log(DiffHeight)

(I just now realized we could have done exactly the same thing with the model for AddRings and saved all that hassle with the identity link. Feeling extremely remedial. :face_with_head_bandage: Still, this is the way to go for all the reasons mentioned above.)

These models include a plot-level random intercept as well as a sapling-level one. I also experimented with using a negative binomial likelihood instead of Poisson to better capture the overdispersion in Bog Forest, and to a lesser extent Forest. I compared the two models using LOO, a fully Bayesian information criterion that (like all ICs) estimates out-of-sample predictive performance with an implicit penalty for model complexity; it does so by approximating leave-one-out cross-validation from the in-sample fit, importance resampling the posterior draws of the pointwise likelihood itself. These draws can be extremely skewed if the model fits a given data point poorly in most of the posterior space (bulk of low likelihoods) but nails it at a handful of sampled parameter values (long tail of high likelihoods). This situation is itself an outlier diagnostic, but also presents a computational problem for importance resampling, the severity of which is indicated by the shape parameter of the smoothing distribution, the Pareto k.

The Poisson model has a handful of "bad" and "very bad" Pareto ks, unsurprisingly almost all from points in Bog Forest. These all but disappear in the NB model, itself an indicator of improved fit. The LOO pairwise comparison bears this out -- grain of salt given the unreliable estimates, but the difference of nearly 5 SD is decisive. (Notice that LOO comes with the thing you always wanted from AIC: an estimate of uncertainty.)
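For the record, the comparison was along these lines (a sketch: formulas reconstructed from the description above, with an assumed data frame harv_inc holding the incremental counts):

library(rstanarm)
# Sketch of the incremental-count models and the LOO comparison (assumed data frame
# harv_inc with columns DiffRings, DiffHeight, Patch, Plot, Sapling).
harv_glmer2_pois <- stan_glmer(
  DiffRings ~ Patch + offset(log(DiffHeight)) + (1 | Plot) + (1 | Sapling),
  data = harv_inc, family = poisson(link = "log")
)
harv_glmer2_nb <- stan_glmer(
  DiffRings ~ Patch + offset(log(DiffHeight)) + (1 | Plot) + (1 | Sapling),
  data = harv_inc, family = neg_binomial_2(link = "log")
)

loo_pois <- loo(harv_glmer2_pois)
loo_nb   <- loo(harv_glmer2_nb)
loo_compare(loo_pois, loo_nb)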

> loo_pois

Computed from 6000 by 1113 log-likelihood matrix

         Estimate    SE
elpd_loo  -2700.5  74.2
p_loo       180.3  19.6
looic      5400.9 148.4
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     1089  97.8%   489       
 (0.5, 0.7]   (ok)         11   1.0%   65        
   (0.7, 1]   (bad)        10   0.9%   12        
   (1, Inf)   (very bad)    3   0.3%   4         
See help('pareto-k-diagnostic') for details.

> loo_nb

Computed from 6000 by 1113 log-likelihood matrix

         Estimate   SE
elpd_loo  -2459.2 32.8
p_loo        40.2  3.7
looic      4918.4 65.5
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     1110  99.7%   2463      
 (0.5, 0.7]   (ok)          2   0.2%   294       
   (0.7, 1]   (bad)         1   0.1%   106       
   (1, Inf)   (very bad)    0   0.0%   <NA>      
See help('pareto-k-diagnostic') for details.

> loo_compare(loo_pois, loo_nb)
                 elpd_diff se_diff
harv_glmer2_nb      0.0       0.0 
harv_glmer2_pois -241.3      47.2 

The NB captures the overdispersion or leptokurtosis of the marginal DiffRings distribution better than the Poisson, albeit at the cost of allowing some unrealistically large inter-section ring counts.

The catch is that, as you can see by exploring the posterior predictive checking (PPC) plots for dispersion and proportion of zeros, while Forest and especially Bog Forest are overdispersed, Burned Forest and Transition are underdispersed relative to the NB, and are better fit by the Poisson. (And even the NB doesn't fully capture the overdispersion in Bog Forest.) These aren't gross errors, though, and all in all the fits look quite good. But this is a case where we might weight scientific judgment as heavily as formal model selection in deciding which distribution to use.

kzaret commented 3 years ago

Thanks for all of the work that you've done to date and everything that I've learned (or at least been exposed to) from running through your code (e.g., 'here', shinystan, posterior predictive checks and loo).

Regarding the use of the NB vs. Poisson likelihood in the V2 models: Though formal model selection indicates that we should use the negative binomial likelihood, you’re suggesting that we could nonetheless use the Poisson-based model if we think that there’s a reason to do so based on the ecology of the system (e.g., if there was a reason why the observed data should be overdispersed relative to the model predictions for the Bog Forest, Forest and Trans [?] patches), right?

What comes to mind is that I would expect a higher proportion of zeros (no difference in ring count from a lower to higher cross-section on a tree) in the Bog Forest patch relative to the other patches. I resampled counts of tree rings in specific cross-sections from that patch given counts that were equivalent in multiple cross-sections (or higher in cross-sections located further up the stem). The trees in the Bog Forest patch are tightly packed together in an environment subject to periodic inundation and on a substrate whose depth changes over time, thus creating physical pressure/stress on stems, which may result in asymmetric or incomplete ring growth or missing rings. I would expect and indeed observed some of this to be happening in the other patches (especially the Forest patch, if I’m remembering correctly), but not to the same extent as in the Bog Forest patch.

Both the Poisson and NB models predict greater proportions of zeros (ranges of values) for the Burned Forest and Transition patches than the Bog Forest and Forest patches. The predicted proportion of zeros is more similar across patches for the V1 model, though the data for the Bog Forest and Forest patches is overdispersed. What do you think is going on here?

Except for the proportion of zeros test statistic, the PPC plots based on the NB model indicate underdispersion across patches. Gelman et al. (2020) write that this rarely happens with count data. What are the implications for making predictions with the model given underdispersion?

I read the following in Gelman et al. (2020) regarding the use of the offset in the Poisson and NB models: “Putting the logarithm of the exposure into the model as an offset. . . is equivalent to including it as a regression predictor, but with its coefficient fixed to the value 1. . . . Another option is to include it as a predictor and let its coefficient be estimated from the data. In some settings, this makes sense in that it can allow the data to be fit better; in other settings, it is simpler just to keep it as an offset so that the estimated rate θ has a more direct interpretation” (p. 268). In our case, the coefficient of log(DiffHeight) is fixed at 1, so this is a conceptually different “role” for the height at which ring counts were sampled as compared to the first model of AddRings that used the identity link and in which we were estimating the coefficient of Height_RC?

Finally, despite the time I’ve spent with Gelman et al.’s BDA over the past few days, I’m still running into a conceptual block, which comes into play in trying to understand the code that produced the plots of cumulative additional rings vs. height, grouped by patch (e.g., starting at line 316). How does the expected/fitted linear predictor differ from the PPD? Don’t both represent distributions of expected model outcomes? The PPD is conditional on the observed data, but the expected linear predictors are not (given that, from the code, they represent outcomes for a random tree picked from a random plot)? Why is the credible interval of the fitted linear predictors narrower than that of the PPD? Feel free to direct me to additional reading material, or perhaps I need to try some video tutorials. . . .

ebuhle commented 3 years ago

Lots of good Qs here. I'll tackle these slightly out of order.

Both the Poisson and NB models predict greater proportions of zeros (ranges of values) for the Burned Forest and Transition patches than the Bog Forest and Forest patches. The predicted proportion of zeros is more similar across patches for the V1 model, though the data for the Bog Forest and Forest patches is overdispersed. What do you think is going on here?

Except for the proportion of zeros test statistic, the PPC plots based on the NB model indicate underdispersion across patches. Gelman et al. (2020) write that this rarely happens with count data. What are the implications for making predictions with the model given underdispersion?

I did some head-scratching over these odd patterns myself. Relative to the Poisson model -- which is canonically how "overdispersion" of count data is defined because it lacks a free dispersion parameter and the variance equals the mean, although one can talk about over- or underdispersion relative to any count model -- incremental ring counts in Bog Forest are clearly overdispersed; Forest is also overdispersed, though most clearly in an excess of zeros; Burned Forest is underdispersed, most clearly in a deficit of zeros; and Transition has a borderline excessive variance-mean ratio (canonical def. of overdispersion) but a borderline deficit of zeros (indicator of underdispersion).

Squinting at the posterior violin plots and data, or the regression plots of data and predictions, I can see where those patterns of model misfit in each patch are coming from. My guess is that the underdispersion in Burned Forest, and whatever is going on with Transition, are related to the group-level intercepts. The grouped QQ plots show, and boxplots would show more clearly, that the variance of both the sapling- and (to a lesser extent) plot-level "random effects" is highest in Bog Forest, then Forest, then the other two. You gave a pretty convincing and intriguing biological explanation for the within-sapling overdispersion in Bog Forest and to a lesser extent Forest, which is exactly what we see. I imagine those stressful disturbance regimes also increase the spatial and among-sapling heterogeneity in growth rate and form.

That said, the Poisson misses Bog Forest and Forest way more egregiously than Burned Forest or Transition, which explains why LOO favors the negative binomial overall: the NB with its free dispersion parameter is better able to fit Bog Forest, and nails Forest. However, the extra heterogeneity exacerbates the underdispersion (relative to the model, as always) in the other two patches. Ideally you'd want to let the dispersion parameter (and maybe the random effect SDs) vary by patch. You can do the former, at least, in brms and not in rstanarm, but I'm not sure it's worth the time and effort. Alternatively you could caveman it and fit separate models by patch, but then you've got 3X as many models to do model selection, diagnosis, and prediction on.
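For the record, the patch-specific dispersion version would look roughly like the sketch below in brms. This is a sketch only: the data frame and column names (harv, DiffRings, DiffHeight, Patch, Plot, Sapling) and the fixed-effect structure are stand-ins, not the fitted formula.

library(brms)

# NB GLMM with the dispersion (shape) parameter varying by patch,
# using brms's distributional-model syntax; names are placeholders
fit_nb_shape <- brm(
  bf(DiffRings ~ 0 + Patch + offset(log(DiffHeight)) + (1 | Plot / Sapling),
     shape ~ 0 + Patch),   # separate NB dispersion per patch
  family = negbinomial(),
  data = harv, chains = 4, cores = 4
)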

Though formal model selection indicates that we should use the negative binomial likelihood, you’re suggesting that we could nonetheless use the Poisson-based model if we think that there’s a reason to do so based on the ecology of the system (e.g., if there was a reason why the observed data should be overdispersed relative to the model predictions for the Bog Forest, Forest and Trans [?] patches), right?

Sort of. I'm thinking more about how we're going to use these models, and the impact of different kinds of biases. The Poisson underestimates heterogeneity in Bog Forest and Forest, so it won't smear out the posterior on the coring height correction (and thus the prior on total tree age) as much as it should. OTOH, the NB admits a small but nonzero probability of implausibly large ring count increments regardless of patch (compare the rootograms), which might imply a small prior probability that a tree is 50-100 yrs older than it appears. Pick your poison, but I wouldn't agonize over it too much. In the grand scheme of ageing errors, this will not dominate.

I read the following in Gelman et al. (2020) regarding the use of the offset in the Poisson and NB models: [. . .] In our case, the coefficient of log(DiffHeight) is fixed at 1, so this is a conceptually different “role” for the height at which ring counts were sampled as compared to the first model of AddRings that used the identity link and in which we were estimating the coefficient of Height_RC?

OK, this one is my fault. Section height actually plays the same role in both models, but the slope on the linear scale becomes the intercept on the log scale. We hypothesize a linear proportionality (intercept is zero by definition) between height -- whether measured from root-shoot boundary or previous section -- and number of additional rings. This matches the intuitive interpretation of the Poisson (or NB) mean mu as a "rate" that scales with "extent" or "exposure" x, such that mu = b*x. If you look under a board twice as large, all else equal, you expect to find twice as many salamanders, but they'll still follow the same underlying distribution.*

Because I am an idiot, I did not immediately recognize that this is equivalent to log(mu) = log(b) + log(x), where log(x) is an additive offset. This is obviously much more tractable to fit, and would have saved all the teeth-gnashing over the identity link. V1 also fits the data better with the log link, but is still conceptually flawed due to the mathematically nonindependent observations.
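In rstanarm terms, the offset formulation amounts to something like the minimal sketch below, where the data frame and column names are placeholders and the actual fitted model may differ in its fixed and random effects:

library(rstanarm)

# Poisson GLMM with log link; the offset fixes the coefficient of
# log(DiffHeight) at 1, so exp(intercept) is the expected rings per unit height
fit_pois <- stan_glmer(
  DiffRings ~ 0 + Patch + (1 | Plot / Sapling),
  offset = log(harv$DiffHeight),
  family = poisson(link = "log"),
  data = harv, chains = 4, cores = 4
)
# Negative binomial version: family = neg_binomial_2(link = "log")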

Finally, despite the time I’ve spent with Gelman et al.’s BDA over the past few days, I’m still running into a conceptual block, which comes into play in trying to understand the code that produced the plots of cumulative additional rings vs. height, grouped by patch (e.g., starting at line 316). How does the expected/fitted linear predictor differ from the PPD? Don’t both represent distributions of expected model outcomes? The PPD is conditional on the observed data, but the expected linear predictors are not (given that, from the code, they represent outcomes for a random tree picked from a random plot)? Why is the credible interval of the fitted linear predictors narrower than that of the PPD? Feel free to direct me to additional reading material, or perhaps I need to try some video tutorials. . . .

Posterior of linear predictor : posterior predictive distribution :: confidence interval : prediction interval in classical regression (a distinction I still remember learning in Albyn's class, and implementing in S+). The (inverse-link transformed) linear predictor is the expected value of the data, e.g. the predicted mean of a Poisson given the covariates. The PPD is the posterior distribution of new data generated from the Poisson distribution with that mean. Both quantities are estimated from (conditioned on, in the Bayesian sense) the observed data used to fit the model, and both can be used to "predict" those same observations or outcomes for previously unobserved inputs -- e.g., a grid of covariate values or a new group drawn from the hyperdistribution. The PPD will always be more variable than the (inverse-link transformed) linear predictor for the same reason a frequentist regression prediction interval is always wider than the CI of the fit: it adds the observation-level variability of the data-generating process on top of the uncertainty in the expected value. Which is also why the PPD is what you want for posterior predictive checking vs. the real data.**
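Concretely, the two look like this for any fitted rstanarm model, where fit and newd are stand-ins for the GLMM object and a grid of prediction inputs:

mu_draws <- posterior_epred(fit, newdata = newd)    # posterior of the expected count
y_draws  <- posterior_predict(fit, newdata = newd)  # posterior of hypothetical new counts

# The PPD interval is at least as wide as the epred interval, for the same
# reason a prediction interval is wider than a confidence interval
apply(mu_draws, 2, quantile, probs = c(0.025, 0.975))
apply(y_draws,  2, quantile, probs = c(0.025, 0.975))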

The only real "trick" I pulled in that plot was using the model fitted to inter-section[ality***] ring counts to predict the cumulative counts that we tried to model directly in our naive initial approach. In practice, the predictions from v1 and v2 are v similar.

Sounds like you've already got some solid reading materials, and kudos for tackling the Bayesian Bible aka the Big Red Book. Happy I could help (or at least expose you to help).

Incidentally, this is directly related to the Poisson additivity property that I used to formulate the conditional multinomial / marginal Poisson state-space model. The Stan cognoscenti are rather snobbish about the supremacy of the PPD; see ?posterior_linpred for a taste. sorry

kzaret commented 3 years ago

In theory, I should be able to use one of the v2 GLMMs (I'm leaning toward the negative binomial) and posterior_predict (?) to make a joyplot per patch whereby x is the height of tree core extraction and y is each cored tree, right? To use v2_nb to predict ring counts given my "new data", can coring height simply be substituted for DiffHeight? If not, do I somehow need to make use of pre_epred (line 325)? Also, would I set Sapling = 0 but still use my "new data" Plot and Patch? (I wish that I had taken you up on your offer to 'drive' this step.)

Are we still thinking to create a submodel (or submodels -- different models per patch :/) that captures uncertainty in coring height measurements? If so, then the results from that/those submodels would feed into this submodel, yes?

ebuhle commented 3 years ago

In theory, I should be able to use one of the v2 GLMMs (I'm leaning toward the negative binomial)

FWIW, I came across one weird trick to coerce an lme4-alike formula to specify different random-effect (tree and plot) variances per level of a fixed factor (patch). Haven't tried it yet, and again I'm not sure if it's worth fussing over given all the other known and unknown unknowns.

and posterior_predict (?) to make a joyplot per patch whereby x is the height of tree core extraction and y is each cored tree, right? To use v2_nb to predict ring counts given my "new data", can coring height simply be substituted for DiffHeight? If not, do I somehow need to make use of pre_epred (line 325)? Also, would I set Sapling = 0 but still use my "new data" Plot and Patch?

Close. As we did with the Duncan corrections, we want to use posterior_linpred(transform = TRUE) or, synonymously, posterior_epred(), rather than posterior_predict(). (Inverse-link transform necessary in this case b/c of the log link.) We're after the posterior distribution, given the section data, of the expected number of additional rings for a previously unobserved tree cored at a given height, rather than the distribution of hypothetical new data that would be observed from that tree. Since the cored trees are assumed to be randomly sampled from the same population as the sectioned saplings, we'll want to use a "new" level for Sapling (now somewhat of a misnomer). But since the cored trees come from the same plots as the sectioned saplings, you are correct that we want to condition on the specific Plot (and of course the fixed Patch). Well, except that cores_nd includes some plots that are not in harv, as well as one missing plot. Que?

setdiff(unique(cores_nd$Plot), unique(harv$Plot))
[1] "S07" "R07" "R08" "R06" "R09" NA    "R0X"

(Side note: my OCD is regretting that I kept your naming conventions instead of standardizing notation b/w the coring-height script and the Duncan one. I may find-and-replace all the CamelCase in the former, if you don't mind.)

So, yes, it would look very similar to this

https://github.com/kzaret/RQ1v2_PIUVestab/blob/77b8df682c8b46fcccb91b6a7d5a3279f48e2bb0/analysis/CorHeightAddRings.R#L310-L311

or more specifically this:

add_rings_cores <- posterior_epred(harv_glmer2_nb, newdata = cores_nd, offset = log(cores_nd$Height_RC))

Finally, as for the joyplots (glad they were such a hit!), the axes would be the same as in the Duncan ones -- x is the estimated additional rings, given the coring height of each tree.

(I wish that I had taken you up on your offer to 'drive' this step.)

You still can! I'm just giving you the ol' "teach a woman to fish" runaround, but would be happy to make the cast and let you mend it.

Are we still thinking to create a submodel (or submodels -- different models per patch :/) that captures uncertainty in coring height measurements? If so, then the results from that/those submodels would feed into this submodel, yes?

Good question; I was avoiding bringing this up for the sake of morale. My recollection of our conversation w/ Andres is that we decided to punt on coring-height uncertainty for now and come back to it later -- does that sound right, and have you changed your mind? If so, yes, that is indeed how it would work, and it would make the prediction step a bit more involved because we'd have to (1) generate coring height from the prior, (2) generate expected additional rings from the model posterior, given coring height, and (3) aggregate predictions across coring-height draws for each tree. Not significantly more difficult, though; the heavy lift would be coming up with the informative coring-height prior for each tree.

ebuhle commented 3 years ago

(Side note: my OCD is regretting that I kept your naming conventions instead of standardizing notation b/w the coring-height script and the Duncan one. I may find-and-replace all the CamelCase in the former, if you don't mind.)

OCD got the best of me, so the variable names in the previous comment have changed.

kzaret commented 3 years ago

But since the cored trees come from the same plots as the sectioned saplings, you are correct that we want to condition on the specific Plot (and of course the fixed Patch). Well, except that cores_nd includes some plots that are not in harv, as well as one missing plot. Que?

Crap. I think the R0X cores should be left out of the analysis. They were sampled specifically because they were the largest live trees in the portion of the LC site under investigation and I was curious about their ages/growth rates (like the trees in the Cushion patch).

We're after the posterior distribution, given the section data, of the expected number of additional rings for a previously unobserved tree cored at a given height, rather than the distribution of hypothetical new data that would be observed from that tree.

Ok. So, use posterior_predict for new values of x for the samples/observations used to derive the posterior distribution of the model, but use posterior_linpred or posterior_epred to estimate values of y given values of x from new samples/observations.

add_rings_cores <- posterior_epred(harv_glmer2_nb, newdata = cores_nd, offset = log(cores_nd$Height_RC))

So, Height_RC can be thought of as equivalent to DiffHeight. That is, if a tree was cored 30 cm above the root-shoot boundary, we're imagining that 30 cm portion of the trunk is like the distance between consecutive observed sections (one at 0, one at 30)? What about what you write below?

It occurs to me that what we really ought to be fitting is not the total additional rings relative to ground level, but the change in ring count from one section to the next, as a function of the difference in height. We could then reconstruct a prediction of total AddRings by cumsum-ing the increments.

Finally, and sorry to come back around to this only now. The effort to write provokes deeper thought provokes effort to write. . . . It seems to me that the v1 model may be more appropriate because the location/height on the trunk where we're counting rings does matter, not just the length/amount of the trunk across which rings are being counted. The difference in the number of rings observed at 0 vs. 30 cm from the root-shoot boundary (DiffHeight of 30 cm) and the difference in the number of rings observed at 80 vs. 110 cm from the root-shoot boundary (DiffHeight of 30 cm) may not be the same because of changes in environmental conditions and thus changes in growth rates from the earliest to latest years of ring accumulation. Does this make sense, and am I accurately capturing what the v2 models are doing?

ebuhle commented 3 years ago

I think the R0X cores should be left out of the analysis.

Just so I'm clear, you mean "R0X" only, or all the ones that start with "R0"? And what about "S07" and the missing plot?

So, use posterior_predict for new values of x for the samples/observations used to derive the posterior distribution of the model, but use posterior_linpred or posterior_epred to estimate values of y given values of x from new samples/observations.

Go fish, sorry. :fishing_pole_and_fish: There are two distinct concepts: (1) the values of any predictors and/or grouping factors used to construct the predictions, and (2) what quantity is being predicted (i.e., the expected value of the observations vs. hypothetical observations themselves). These are orthogonal -- you could generate either fitted values (posterior_linpred() or posterior_epred()) or hypothetical new data (posterior_predict()), for either the same conditioning "inputs" used to fit the model or arbitrary new conditions.

In practice, the posterior predictive distribution is often most useful when applied to the originally fitted predictors and groups, because it allows a direct comparison with the actual observations (a posterior predictive check) which is a great tool for model validation and error-checking. But we might also use the PPD for, say, simulation-testing a model by fitting it to pseudo-data generated under a range of hypothetical conditions (varying sample size, contrast in predictors, etc.), a Bayesian version of "power analysis". Our regression plots show both the linear predictor and the PPD for an evenly spaced sequence of not-actually-observed coring heights.

Does that help? Was the analogy to classical regression confidence vs. prediction intervals clarifying at all?

It seems to me that the v1 model may be more appropriate because the location/height on the trunk where we're counting rings does matter, not just the length/amount of the trunk across which rings are being counted. The difference in the number of rings observed at 0 vs. 30 cm from the root-shoot boundary (DiffHeight of 30 cm) and the difference in the number of rings observed at 80 vs. 110 cm from the root-shoot boundary (DiffHeight of 30 cm) may not be the same because of changes in environmental conditions and thus changes in growth rates from the earliest to latest years of ring accumulation. Does this make sense, and am I accurately capturing what the v2 models are doing?

I think I've also successfully obfuscated this issue. :stuck_out_tongue: The two versions are the same underlying model (literally, the same formula), namely a linear proportionality relating the expected difference in ring counts mu to the difference in height dx: mu = b*dx. This functional form is based on the graphical linear growth model shown in your slides, as we talked about w/ Andres. Under this model, it doesn't matter whether the x-origin is the root-shoot boundary or somewhere else; the expected change in ring count between two heights x0 < x1 scales linearly with dx = x1 - x0. That's why it's possible to use a model fitted to dx = x1 - x0 to predict mu given dx = x1 - 0. I muddied the waters with my dumb comment about needing to use cumsum, because I hadn't yet noticed that this is just a standard log-Poisson model with an "exposure" offset: log(mu) = log(b) + log(dx).

The only difference between v1 and v2 is that v1 is fitted to the cumulative change in ring count starting at the RSB, i.e., the cumsum of the response variable in v2. This mathematical nonindependence of observations means that v1 is statistically fatally flawed -- not kosher, do not pass go. Going back to my @kzaret centric analogy, it would be like fitting a model to salamander counts under different-sized boards, with area as an offset, except the smaller boards are actually portions of the bigger ones so the counts are nested. In classical parlance, the nonindependence inflates the degrees of freedom, giving us a false impression of having more information than we do. (For example, we can predict without even seeing the data that the cumulative count will always increase monotonically with height.) IIRC, you can see this by comparing some of the PPC plots: v1 seems to "fit" a little better than v2. That's an illusion born of violating the basic assumption of independence.

The options for dealing with this are (1) add autocorrelated error terms to v1, which would require brms instead of rstanarm as well as a substantive change in model structure (we'd need observation-level random effects, b/c there are no residuals as such in a basic count likelihood) or (2) first-difference the data. This comes up a lot in time-series analysis, and first-differencing to remove first-order autocorrelation is a standard move. Also, because this is a case of mathematical and not merely empirical nonindependence, (1) wouldn't even truly address the problem. In (2), the fluctuating environmental conditions and growth rates are expressed as heterogeneity among the diff_rings observations (after accounting for diff_height), which the model has to fit. It's not ignored.
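For concreteness, the first-differencing itself is just a grouped diff, something like the sketch below; the column names are placeholders for whatever the section data actually use:

library(dplyr)

# Convert cumulative ring counts per sapling into per-interval increments,
# measuring each increment from the previous section (or the RSB for the first)
harv_diff <- harv %>%
  arrange(Sapling, Height) %>%
  group_by(Sapling) %>%
  mutate(diff_height = Height   - lag(Height,   default = 0),
         diff_rings  = AddRings - lag(AddRings, default = 0)) %>%
  ungroup() %>%
  filter(diff_rings >= 0)   # the few sections with negative increments get dropped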

Clear as peat?

ebuhle commented 3 years ago

Actually, though, you raise an interesting point: first-differencing fixes the mathematical serial dependence, but there could still be empirical autocorrelation in the first differences driven by low-frequency environmental fluctuations in growth rate. So maybe we should try it in brms with AR(1) observation-level random effects?!?

Great, thanks a lot @kzaret.

kzaret commented 3 years ago

Just so I'm clear, you mean "R0X" only, or all the ones that start with "R0"? And what about "S07" and the missing plot?

I would like to leave out the "R0X" (not the other "R0" cores) and the RLARGE01 and 02, but I would like to include "S07". How would you go about predicting add. rings for the S07 core given that its plot was not included in the model?

Does that help? Was the analogy to classical regression confidence vs. prediction intervals clarifying at all?

Yes! Thank you for your patience with the conceptual material. (I hope not to be adding significantly to your daily level of exasperation.)

(For example, we can predict without even seeing the data that the cumulative count will always increase monotonically with height.) IIRC, you can see this by comparing some of the PPC plots: v1 seems to "fit" a little better than v2. That's an illusion born of violating the basic assumption of independence.

These points helped solidify mathematical vs. empirical independence for me. And thanks for connecting first-differencing to working with time series more generally.

Actually, though, you raise an interesting point: first-differencing fixes the mathematical serial dependence, but there could still be empirical autocorrelation in the first differences driven by low-frequency environmental fluctuations in growth rate. So maybe we should try it in brms with AR(1) observation-level random effects?!? Great, thanks a lot @kzaret.

Part of me thinks, Well, if we're going to problematize a component of dendroecological studies that's only usually afforded a sentence in published papers, if not relegated to the sup info, then we may as well go whole hog, right? I'm certainly curious about how the coring height corrections from my Gaussian linear regressions would compare to those predicted from the v2_nb model and a model that includes observation-level random effects. But, the other part acknowledges that there are still the issues of 1) coring height uncertainty, and 2) missing outer rings to deal with; the AAG talk is about four weeks away; and you are doing and would need to continue doing the heavy lifting -- sorry. Maybe dealing with brms isn't worth it right now?

ebuhle commented 3 years ago

I would like to leave out just R0X (not the other "R0" cores), but I would like to include "S07". How would you deal with the missing plot?

Do you actually not know which plot that core came from? And are there no sectioned saplings from the other "R0" and "S07" plots?

If the answer to any of these is yes, the solution is the same: treat them as random draws from the plot hyperdistribution, which will happen automagically since they do not appear among the plot levels used to fit the model. There is one small snag, which I discovered a couple years ago and spent way too long trying in vain to resolve. Apparently if newdata contains multiple "new" levels of a grouping factor, the posterior_[x]() functions will use the same draws from the hyperdistribution for all of them. This behavior is not ideal, and it's not that hard to work around by manually generating those values and constructing the predictions in R, but in this case I can't see that it really matters too much.

Part of me thinks, Well, if we're going to problematize a component of dendroecological studies that's only usually afforded a sentence in published papers, if not relegated to the sup info, then we may as well go whole hog, right? I'm certainly curious about how the coring height corrections from my Gaussian linear regressions would compare to those predicted from the v2_nb model and a model that includes observation-level random effects.

Note that your linear regressions are fatally flawed for the same reason as GLMM v1 (in addition to using the wrong likelihood for the data and completely pooling across trees).

Also, keep in mind that the negative binomial distribution and observation-level random effects are both ways of expanding the Poisson to include pointwise error or heterogeneity. To wit, the NB is a continuous mixture distribution -- a distribution formed by letting the parameter(s) of one distribution vary randomly according to another distribution, and then marginalizing out the varying parameter(s). Specifically, the NB can be derived by giving the Poisson mean a gamma distribution. You can even encode that hierarchical representation explicitly in Stan (et al.) by making each predicted value a gamma latent variable. The "lognormal-Poisson" distribution (which does not have a closed-form PMF but is what you get with normally distributed observation-level random effects on the log link scale) differs only in substituting the lognormal for the gamma. Since the gamma and lognormal are famously difficult to distinguish from data, let alone latent "data", the substantive results are typically very similar. (Computational performance is another matter.)
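You can see the gamma-Poisson equivalence in ten seconds of simulation:

# If lambda ~ Gamma(shape = phi, rate = phi / mu), then Poisson(lambda) draws
# are negative binomial with mean mu and dispersion (size) phi
set.seed(1)
mu <- 5; phi <- 2; n <- 1e5
y_nb  <- rnbinom(n, mu = mu, size = phi)
y_mix <- rpois(n, rgamma(n, shape = phi, rate = phi / mu))
c(mean(y_nb), mean(y_mix))   # both ~ mu
c(var(y_nb),  var(y_mix))    # both ~ mu + mu^2 / phi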

The only advantages I can think of for the lognormal-Poisson in this case are that it would allow (1) spatially autocorrelated within-sapling residuals and (2) different residual variances per patch, which might address the varying levels of heterogeneity / overdispersion we've wrestled with. But you'd need brms for (1), and if you're using brms then you could also let the NB dispersion parameter vary by patch, making (2) irrelevant. There is also the "one weird trick" I mentioned above, which in theory would, in an ungainly manner, allow (2) within rstanarm (and similarly for the tree- and plot-level variances per patch).

A relatively quick test to see if any of this is worth further consideration would be fitting the Poisson with IID observation-level residuals in rstanarm and checking the spatial ACF of the residuals.
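Roughly, that test would look like the sketch below; the object and column names are placeholders, and the residual calculation is just one quick-and-dirty option:

library(rstanarm)

# Lognormal-Poisson via IID observation-level random effects
harv$obs_id <- factor(seq_len(nrow(harv)))
fit_lnp <- stan_glmer(
  DiffRings ~ 0 + Patch + (1 | Plot / Sapling) + (1 | obs_id),
  offset = log(harv$DiffHeight),
  family = poisson(link = "log"),
  data = harv, chains = 4, cores = 4
)

# Crude Pearson-type residuals from the posterior mean expected counts,
# ordered by height within sapling to eyeball serial (spatial) correlation
mu_hat <- colMeans(posterior_epred(fit_lnp))
res <- (harv$DiffRings - mu_hat) / sqrt(mu_hat)
acf(res[order(harv$Sapling, harv$Height)])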

But, the other part acknowledges that there are still the issues of 1) coring height uncertainty, and 2) missing outer rings to deal with; the AAG talk is about four weeks away; and you are doing and would need to continue doing the heavy lifting -- sorry. Maybe dealing with brms isn't worth it right now?

I agree. This is a canonical example of something I would set aside for now and revisit later for publication. Especially if we're going to similarly table the coring-height uncertainty (?), which is a much bigger deal. We don't even know yet if the state-space model is going to work, like, at all!

Thank you for your patience with the conceptual material. (I hope not to be adding significantly to your daily level of exasperation.)

De nada. Believe it or not, this is...fun (which tells you something about my baseline level of daily exasperation).

ebuhle commented 3 years ago

Nice avi.

ebuhle commented 3 years ago

OK, I went ahead and put this one out of its misery. (Sorry, the suspense was killing me.) I used the negative binomial model and excluded "R0X" and missing plots. [EDIT: I just figured out what you meant by your edit above. "RLARGE01" AND "RLARGE02" are trees, and they are the ones with NA for plot.]

The joyplots aren't quite as expressive as the boring interval plots in this case, but here they are. The "blocks" are of course because many trees have the same coring height.

BTW, I/we never did resolve what to do with the "outliers" enumerated in your original comments on harv, or those flagged as pot_outlier == 1 (are these one and the same? I haven't checked). Currently they are all included; the only sections excluded are three with negative diff_rings. Of course, I/we could easily change this.

Lucky for you, you'll get to do most of the driving on the next step: informative priors for coring-height uncertainty. Unless we punt on that.

kzaret commented 3 years ago

BTW, I/we never did resolve what to do with the "outliers" enumerated in your original comments on harv, or those flagged as pot_outlier == 1 (are these one and the same? I haven't checked). Currently they are all included; the only sections excluded are three with negative diff_rings. Of course, I/we could easily change this.

The "outliers" were categorized as such during my process of trying to model additional rings by height along stem using Gaussian linear regressions per patch. Removing the outliers increased the R-squared values. I think we should continue to include them in the current modeling efforts (and perhaps they should have been kept during those earlier efforts, if that approach had been valid to begin with).

Lucky for you, you'll get to do most of the driving on the next step: informative priors for coring-height uncertainty. Unless we punt on that.

Circling around to coring-height uncertainty: 1) For each patch, I have a qualitative sense (in some cases supported by quantitative data) of how "off" our measurements of coring height might have been in general (e.g., perhaps up to 1.5 to 2 m in the Forest patch); but 2) individual trees varied in terms of the difficulty in measuring or estimating their coring heights (and I have notes about those individuals); so, the same degree of uncertainty shouldn't apply uniformly to trees within a patch. How can we make use of the above?

ebuhle commented 3 years ago
  1. For each patch, I have a qualitative sense (in some cases supported by quantitative data) of how "off" our measurements of coring height might have been in general (e.g., perhaps up to 1.5 to 2 m in the Forest patch); but 2) individual trees varied in terms of the difficulty in measuring or estimating their coring heights (and I have notes about those individuals); so, the same degree of uncertainty shouldn't apply uniformly to trees within a patch. How can we make use of the above?

Great, we just need to turn that quantitative and qualitative information into an error distribution for each tree. That would involve:

  1. Choose a parametric family. I'd imagine height measurement error is roughly symmetric, so the normal would be a convenient choice unless some of the heights are sufficiently close to zero that a normal with the desired SD would place significant probability mass on negative values. In that case we could just zero-truncate the normal, or use a nonnegative distribution (e.g. lognormal) at the expense of symmetry and convenience.
  2. Assign parameters (e.g. normal SD) to each tree in PIUV_CoredProcessed.csv and enter your best guesses in a new column. For example, we could interpret "perhaps up to 1.5 m" as meaning "95% of the probability mass within ± 1.5 m of the measurement", so sigma <- 1.5 / qnorm(0.975). (This calculation could be automated, if you entered height_error_U95 for each tree.)
  3. Simulate values from the coring-height prior for each tree and use them as newdata to generate predictions from the GLMM. This would be even simpler to implement than I made it sound above.
  4. Profit!
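Steps 2 and 3 might look something like the sketch below, assuming (for illustration only) that the coring-height table has columns height and height_error_u95 holding the measurement and the elicited 95% half-width:

# Step 2: convert the elicited 95% half-width to a normal SD
cores$height_SD <- cores$height_error_u95 / qnorm(0.975)

# Step 3 (first half): draw zero-truncated normal samples of true coring height
# for each tree via the inverse CDF, to be fed to the GLMM as newdata
n_draws <- 1000
height_draws <- sapply(seq_len(nrow(cores)), function(i) {
  lo <- pnorm(0, cores$height[i], cores$height_SD[i])   # mass truncated below zero
  qnorm(runif(n_draws, lo, 1), cores$height[i], cores$height_SD[i])
})
# height_draws: n_draws x n_trees matrix of plausible coring heights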
kzaret commented 3 years ago

Assign parameters (e.g. normal SD) to each tree in PIUV_CoredProcessed.csv and enter your best guesses in a new column. For example, we could interpret "perhaps up to 1.5 m" as meaning "95% of the probability mass within ± 1.5 m of the measurement", so sigma <- 1.5 / qnorm(0.975). (This calculation could be automated, if you entered height_error_U95 for each tree.)

Sorry, I need a little more hand-holding or dot connecting: if I think that a tree's estimated coring height is only off by ± 0.10 m, then sigma <- 0.10/qnorm(0.975), ja? I'm not sure I get the bit about automating the calculation.

ebuhle commented 3 years ago

Sorry, I need a little more hand-holding or dot connecting: if I think that a tree's estimated coring height is only off by ± 0.10 m, then sigma <- 0.10/qnorm(0.975), ja? I'm not sure I get the bit about automating the calculation.

Si, if you interpret "only off by" as meaning the 95% interval of the observation error distribution. You could choose a different quantile if you like. By automation, I just meant that it might be simpler and more intuitive to enter the upper quantile as data and then vectorize the calculation sigma <- height_error_U95 / qnorm(0.975) in R.

(This is a good, minimal example of what the subfield of "prior elicitation" is all about.)

kzaret commented 3 years ago

I added a height_error_u95_cm field. I decided to be generous with my expectations of the accuracy of our estimates of coring height.

ebuhle commented 3 years ago

Did you push this? I don't see it.

kzaret commented 3 years ago

Right. Pushed it.

ebuhle commented 3 years ago

Sorry, I still don't see it. Sometimes GH is laggy in uploading commits, but this is excessive. Does it show up in the commit history for you? If so, link?

kzaret commented 3 years ago

It seems I missed an error earlier when I tried to push the commit. I just pulled then pushed and I see my commit.

ebuhle commented 3 years ago

That turned out to be even simpler than I'd thought, although not until I'd spent a few hours making it way more complicated (and computationally expensive) than it needed to be. The beauty is that you can just generate independent samples of height from the truncated normal measurement error prior, use posterior_epred(..., offset = 0) to generate posterior samples of added rings per cm, and then multiply by height to get the unconditional posterior of added rings, marginalizing over measurement error.
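In code, the core of it reduces to something like this sketch; matching dimensions (same number of draws, same tree order) are assumed between the GLMM posterior and the height prior, and the actual script may differ in the details:

# Expected additional rings per unit height for each cored tree (offset = 0,
# so the offset contributes nothing and exp(linear predictor) is the rate)
rate_draws <- posterior_epred(harv_glmer2_nb, newdata = cores_nd, offset = 0)

# height_draws: a draws x trees matrix of truncated-normal coring-height samples;
# the elementwise product marginalizes over both sources of uncertainty
add_rings_draws <- rate_draws * height_draws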

Of note, there are only three trees where > 5% of the measurement error prior mass is truncated by the zero lower bound:

filter(cores, height < height_SD*2)

       patch plot year   tree height status outer_rings orw remove min_age ring_count height_error_u95 height_SD
1     Forest  F01 2017 F01T02     57   Live           1  NA      0       0        323              100  51.02135
2 Transition  R05 2018 R05T02     30   Live           1  NA      0       0         65               50  25.51067
3 Transition  S05 2018 S05T01     37   Live           1  NA      0       0         57               50  25.51067

The predictions clearly reflect that you were much more uncertain about some height measurements than others:

[Image: posterior predictions of additional rings for each cored tree, with interval widths reflecting the varying coring-height uncertainty] https://user-images.githubusercontent.com/22503560/113185753-b2621500-920b-11eb-9ec6-20e4fcfb9fac.png

kzaret commented 3 years ago

Awesome! I really like getting to see/show the variation in uncertainty. Sorry for the slow replies these days -- pre-talk anxiety and field season prep chaos.

ebuhle commented 3 years ago

I went ahead and wrote out the plots of total tree age and annual recruitment, albeit still incomplete -- I'm guessing we're not going to resolve #6 before AAG.

Graphical presentation of the prior on recruitment needs some work, whether before AAG or later. It's tricky b/c the number of trees established in a given year is a small integer, so the 95% credible interval always includes zero and often tops out at 1, quantiles (median and credible interval) are step-like, and the median itself is almost always zero. For that reason, the plot below shows the posterior mean instead of the median, but it's still not entirely satisfying. I've also wrestled with what to call this quantity: it's not really recruitment density (trees/ha) because a fixed number were sampled per plot. Maybe instead of integers it should be probability of recruitment in a given year (summing to 1 over each time series), which corresponds to the multinomial formulation of the Poisson state-space model.

On a substantive note, there seems to be remarkably little synchrony across patches, which does not necessarily bode well for modeling. A lot of that asynchrony comes from the dynamics (or lack thereof) pre-1700, again raising the question about mortality (#3).

kzaret commented 3 years ago

I went ahead and wrote out the plots of total tree age and annual recruitment, albeit still incomplete -- I'm guessing we're not going to resolve #6 before AAG.

Nope, sorry. I will start cross-dating after AAG.

I've also wrestled with what to call this quantity: it's not really recruitment density (trees/ha) because a fixed number were sampled per plot. Maybe instead of integers it should be probability of recruitment in a given year (summing to 1 over each time series). . . .

There's a CountperHa column in the PIUV_CoredProcessed.csv that relates an individual tree to the density of trees it could, in theory, represent at the hectare scale (based on the "search area" of each tree -- sometimes we needed to go outside our 10 x 10 m plots to find individuals to core). I think that plots of integers may be easier to interpret than that of probabilities -- what do you think?

ebuhle commented 3 years ago

There's a CountperHa column in the PIUV_CoredProcessed.csv that relates an individual tree to the density of trees it could, in theory, represent at the hectare scale (based on the "search area" of each tree -- sometimes we needed to go outside our 10 x 10 m plots to find individuals to core). I think that plots of integers may be easier to interpret than that of probabilities -- what do you think?

So CountperHa is the total number of trees in each (possibly expanded) plot? Why are the values mostly so "round" and so repetitive across plots, while at the same time sometimes varying within plots?

I agree cell probabilities would be hard to interpret, because the natural inclination would be to read them as independent annual probabilities rather than a distribution of establishment year (which is really the information we have, given a fixed sample size). Instead of explaining all that, you could just say that recruitment is based on the sample of cored trees. Maybe the sample sizes should be on the plots?

kzaret commented 3 years ago

So CountperHa is the total number of trees in each (possibly expanded) plot? Why are the values mostly so "round" and so repetitive across plots, while at the same time sometimes varying within plots?

This seems to be another common dendro maneuver: thinking of each cored tree as representative of the search area in which it was found and then scaling up to the hectare in order to frame the results in terms of density rather than individual cored trees. So a single tree in my 10 x 10 m plots could be thought of as representing 100 trees in 1 hectare. If I had to leave the plot to find a tree to core, my extrapolation to hectare was adjusted to account for the expanded search area. I won't try to talk you into using density.

I agree cell probabilities would be hard to interpret, because the natural inclination would be to read them as independent annual probabilities rather than a distribution of establishment year (which is really the information we have, given a fixed sample size). Instead of explaining all that, you could just say that recruitment is based on the sample of cored trees.

Hmm. I guess either way, I'm going to have to do some explaining bc people will be confused by recruitment values of less than 1. . . .

Maybe the sample sizes should be on the plots?

Yes! I had casually thought of displaying them in a chart on one of the fieldwork methods slides, but it would be much better (more efficient use of space and more relevant) to have them with the data; oh my, this makes me feel really out of it.

On a substantive note, there seems to be remarkably little synchrony across patches, which does not necessarily bode well for modeling.

Hrmph. In trying to interpret what's going on in the recruitment by year plots, I think it would be helpful to have some grid lines. I don't trust myself to be able to draw an imaginary line from 1800 up to the correct place along the y axis in the Transition plot. I can make this change at some point, but if you wind up editing code before I get there, thank you. The recent simultaneous recruitment pulses in the Transition and Burned Forest patches are good to see, as is the evidence for some recruitment in the Transition patch during the pulse in the late 1700s in the Burned Forest. It looks like there are at least some overlapping pulses in the Bog Forest and Forest patches, as well.

ebuhle commented 3 years ago

This seems to be another common dendro maneuver: thinking of each cored tree as representative of the search area in which it was found and then scaling up to the hectare in order to frame the results in terms of density rather than individual cored trees. So a single tree in my 10 x 10 m plots could be thought of as representing 100 trees in 1 hectare. If I had to leave the plot to find a tree to core, my extrapolation to hectare was adjusted to account for the expanded search area.

Haha, it's like taking RTP's oft-repeated rhetorical question "What's the density of a single tree?" seriously! Maybe I'm misunderstanding the point, but it seems that if you wanted to expand a sample of N trees to be representative of the plot from which it was taken, you would need to know the total number of trees M in the plot. Then if n[t] sampled trees were born in year t, the recruitment density (per plot, easily convertible to per ha) would be n[t]*M/N. Or more simply said, you expand by dividing by the sampling rate, i.e. the probability that any given tree was sampled. No?
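Toy numbers, to make sure we're talking about the same thing (all values hypothetical):

# Say 5 of the 20 trees in a 10 x 10 m plot were cored, and 2 of the cored
# trees established in year t
n_t <- 2; N <- 5; M <- 20
recruits_plot <- n_t * M / N          # expanded count for the plot: 8 trees
recruits_ha   <- recruits_plot * 100  # 100 m^2 plot -> 800 trees per ha in year t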

Yes! I had casually thought of displaying them in a chart on one of the fieldwork methods slides, but it would be much better (more efficient use of space and more relevant) to have them with the data; oh my, this makes me feel really out of it.

I regret having suggested this, after spending way too long trying unsuccessfully to figure out how to add nicely formatted sample sizes to the facet labels using the labeller argument. Will try again later, although it might be time to graduate from ggplot2 to grown-up graphics. Vertical gridlines, though, that I can do. (I took them out because the ggplot2 defaults are predictably terrible.)

I also was relieved to see the simultaneous recruitment pulses you mention. From the standpoint of a multivariate time series model, though, I fear the cross-correlations will be swamped by the differences earlier in the series, especially in Forest vs. the others. Oh well, we'll have plenty of time to find out!

kzaret commented 3 years ago

Or more simply said, you expand by dividing by the sampling rate, i.e. the probability that any given tree was sampled. No?

Yes, this seems like a sensible way to do it.

grown-up graphics

Oh dear. You know, it's quite easy to just add text to an image in PowerPoint. . . .

I fear the cross-correlations will be swamped by the differences earlier in the series.

Will we be able to examine fire and/or climate dynamics by patches individually? It could be that tree establishment in the Bog Forest patch, especially, has a different relationship with climate than the other patches.

Also, I think we began using 'tree establishment' instead of 'tree recruitment' given push back from Alex Fajardo on that wording for the pilot study. However, that had to do with field counts of seedlings, etc. but didn't include any work on seed abundance or availability; thus we weren't capturing all aspects of recruitment. With the dendro work, however, we're seeing the end product of all of those aspects -- the successful establishment and survival of individual trees.

ebuhle commented 3 years ago

Yes, this seems like a sensible way to do it.

So do we know the total number of trees per (possibly extended) plot?

Oh dear. You know, it's quite easy to just add text to a image in PowerPoint. . . .

It's easy the first time, maybe. Then you change your filter or get some new data and wish you'd written a program to do your programming. :imp: Anyway, this was a worthy foe to vanquish, a two-headed hydra of ggplot2 and my old "character-building" nemesis plotmath. You really do learn something new every single time.

You're welcome to tinker with the plots; for example, I had in mind the color scheme of the "observations" as well as the posterior (vs. the true states) in the vignette, but it's not very exciting.

Will we be able to examine fire and/or climate dynamics by patches individually? It could be that tree establishment in the Bog Forest patch, especially, has a different relationship with climate than the other patches.

Yes, and the variety of different possible multivariate and hierarchical structures for the states (and even the covariate effects) gets overwhelming pretty fast, especially when you consider the plot level. I'll start an issue for the state-space modeling after AAG. On the data side, though, I still have this nagging worry that older individuals are just better preserved in Forest. Clearly recruitment wasn't zero pre-1600 in the other patches. (Unless the patch itself established that recently? Seems unlikely.)

Also, I think we began using 'tree establishment' instead of 'tree recruitment' given push back from Alex Fajardo on that wording for the pilot study. However, that had to do with field counts of seedlings, etc. but didn't include any work on seed abundance or availability; thus we weren't capturing all aspects of recruitment. With the dendro work, however, we're seeing the end product of all of those aspects -- the successful establishment and survival of individual trees.

That was my rationale. I actually took a cursory look at the forest ecology literature (benthic marine ecology has similar controversies about "settlement" vs. "recruitment", so I saw this one coming) and "recruitment" seemed fairly common for the population-dynamic process. "Establishment" makes sense for the mechanism(s). That was at least implicitly how we partitioned recruitment into sub-processes (dispersal, germination, distance-dependent survival) in Rogers et al. (2017), although she never actually used "establishment".

My main objection to "recruitment" is that we're not measuring recruitment, because we're not accounting for mortality. It's really just the age distribution of living trees.

kzaret commented 3 years ago

So do we know the total number of trees per (possibly extended) plot?

We know the number of trees per plot, but have only educated guesses for the extended plots.

Anyway, this was a worthy foe to vanquish

I wish I was feeling that way about my tasks.

On the data side, though, I still have this nagging worry that older individuals are just better preserved in Forest. Clearly recruitment wasn't zero pre-1600 in the other patches. (Unless the patch itself established that recently? Seems unlikely.)

I could imagine that the Bog Forest and Transition patches established recently, while the Burned Forest patch has lost older individuals to repeat fires and logging (mostly of snags).

My main objection to "recruitment" is that we're not measuring recruitment, because we're not accounting for mortality. It's really just the age distribution of living trees.

Agreed. I think this is important to keep in mind and articulate to some degree; 'tree establishment' is a tad closer to that.