kzaret / RQ2_Dendro_v2_PIUVestab

0 stars 0 forks source link

Developing informative priors on tree age: Duncan ring-to-pith corrections #4

Open ebuhle opened 3 years ago

ebuhle commented 3 years ago

As issue #2 has turned into a long, detailed thread on GLMMs for the coring-height correction, I've realized it would have been more helpful to make separate issues for each of the three main components of ageing error. Oh well. We've done most of the heavy lifting on coring height, so this is a good time to start considering the Duncan ring-to-pith corrections. I stand by my last proposal, although I haven't actually seen the ring width data yet:

Actually, maybe what we should do with the ring widths is fit a parametric distribution to them (a kernel; lognormal or gamma being the obvious candidates). The posterior sample from the mean of that kernel can then be directly algebraically transformed into a posterior sample of Duncan ring-to-pith corrections using basic MCMC facts.

I initially balked at this idea because a distribution fitted to N = 3 observations is going to be pretty uncertain. But then I realized, duh, we just make it a hierarchical model across all the trees in the pithless sample. They won't all have the same mean inner ring width, obviously, but the tree-level means will be drawn from some hyperdistribution that also includes a fixed effect of patch. It would be really simple to fit this model in rstanarm, and a few more lines of code to get what we're after.

Thoughts? Questions? Do you want to take a first crack at this, @kzaret, or should I?

kzaret commented 3 years ago

Ah, a nice clean issue. . . .

I would love for you to go ahead with this. Note that while PithlessTCs_InnerRingWidths.csv does include a patch variable, it does not include a plot variable. I would think that plot would have little effect on the randomly sampled inner three ring widths of a pithless tree core, while patch encompasses many of the processes affecting patterns in tree growth, including overarching trends in ring widths (and thus the range of ring width values likely to be encountered in those inner three rings per core).

I am sure that I will have questions.

ebuhle commented 3 years ago

So is Core_ID a proxy for tree -- that is, there is exactly one core per tree?

kzaret commented 3 years ago

Yes, Core_ID is a unique value: one core per tree.

ebuhle commented 3 years ago

Got a nice hierarchical model fit, everything looks great, all ready to compute the Duncan correction...and I realized we also need h and L. Duh.

Also, are your inner ring widths already corrected for tangency (Duncan 1989 Eq. 2, Fig. 2), and if not, should we do that?

kzaret commented 3 years ago

Duncan_estimates.csv contains h and L values per core. I don't think that we need to correct for tangency. When I measured the innermost ring widths, I was already used to delineating the start and end of a ring (and thus measuring the length between those points) by imagining or placing an actual line (using one of the tools in the software program Cdendro) that started at the imagined pith and ran perpendicular through the rings.

ebuhle commented 3 years ago

There are five core IDs in Duncan_estimates.csv that are not in PithlessTCs_InnerRingWidths.csv:

setdiff(unique(duncan$tree), unique(inner_rings$tree))
[1] "C02"     "C03"     "C04"     "C10"     "F04T05b" "F05T06a"

Should we ignore these trees? Or treat them as random draws from the hyperdistribution?

ebuhle commented 3 years ago

Well, that was pleasantly straightforward. You can see for yourself, but I fit a hierarchical linear model to log-transformed inner ring width (i.e., assuming a lognormal likelihood for the observations) with patch-specific population-level intercepts and random tree-level intercepts. Then I used the posterior draws of predicted median ring width for each tree exactly as Duncan uses the sample mean inner ring width, to calculate rings-to-pith. (I used the median rather than the mean b/c it's a more robust measure of central tendency for the lognormal data distribution, but we can experiment with this.) This gives us a sample from the posterior distribution of rings-to-pith, which becomes a term in the prior on total tree age. I left out those five trees with missing ring widths for now, pending further instructions.

The rings-to-pith estimates mostly make sense, although some of them have (again) very long tails, as you can see from the untransformed interval plots or by inspecting rings_to_pith. I'm not sure this is really a problem; it reflects the fact that the estimated median inner ring width is sometimes quite small. As you'd expect given the underlying lognormality, the largest estimates are the most uncertain, which is reasonable behavior.

On the log scale, though, they look beautiful. I've been waiting for The Eternal, much longer than Twenty Four Hours, if not quite Decades, for a good reason to make a joyplot (or whatever the squares are calling them now).

kzaret commented 3 years ago

I'm glad that you left out the cores from Duncan.csv that weren't in the PithlessTCs_InnerRingWidths csv. The first four were collected haphazardly in the Cushion patch where I did not sample full plots (I was just curious about relative tree ages and ring widths as compared to the other plots). The other two were removed because the innermost rings of the cores were so far from the pith or collected at an angle so unlikely to hit the pith that there was no inner ring curvature.

I ran through the code a couple of times -- the second time because I had forgotten to adjust the patch types from Burned Forest to Transition for a couple of the cores when I initially created the PithlessTCs_InnerRingWidths.csv. The model output was similar before and after that change (see the .pptx).

I think that I get what you did, barring a few details (but I can spend some more time with the help menu when I feel like there's more breathing room). I'm really into the joyplots!

ebuhle commented 3 years ago

The first four The other two

Thank you for gently informing me that I cannot count.

At some point we should probably take the plunge into documenting these analyses in RMarkdown instead of dumping results and text into PowerPoint. Reproducible research and all that. I could take a first crack at this to help you get up to speed on the formatting and workflow, if you like. Maybe we should create an issue devoted to that.

I think that I get what you did, barring a few details (but I can spend some more time with the help menu when I feel like there's more breathing room).

Feel free to ask here if you think that would be more efficient.

I'm really into the joyplots!

Somewhere Ian Curtis is looking down unsmilingly but approvingly.

kzaret commented 3 years ago

At some point we should probably take the plunge into documenting these analyses in RMarkdown instead of dumping results and text into PowerPoint. Reproducible research and all that. I could take a first crack at this to help you get up to speed on the formatting and workflow, if you like. Maybe we should create an issue devoted to that.

That would be great. I've been meaning to start using RMarkdown, but hadn't yet commited. I would be happy to have your lead to follow.

As a side note, during a lab meeting today, folks mentioned using Adobe Illustrator when making figures for manuscripts (in addition to R, PowerPoint, etc.). Have you found Illustrator to be a necessary tool?

ebuhle commented 3 years ago

That would be great. I've been meaning to start using RMarkdown, but hadn't yet commited. I would be happy to have your lead to follow.

OK, I'll put this on the board.

As a side note, during a lab meeting today, folks mentioned using Adobe Illustrator when making figures for manuscripts (in addition to R, PowerPoint, etc.). Have you found Illustrator to be a necessary tool?

Depends on the purpose, but if it's for tweaking figures produced in R to make them "just so", then I am IDEOLOGICALLY OPPOSED to this practice. It's a textbook example of non-reproducible workflow, defeating one of the main virtues of using R instead of some pointy-clicky stats program. (Sorry, StatView.) I have yet to encounter a situation where someone (inevitably a collaborator who thinks they're saving me work) suggests post-processing, where investing the time to make the figure "just so" in R didn't pay off in the long run.

If it's stuff like this, though, I'm all for it. You could do that in R, but...yeah, no.

ebuhle commented 3 years ago

I keep forgetting to ask about the subtract_if_not_arching column in duncan. Can you remind me what this is -- is it in Duncan (1989) somewhere? Do we need to include this in our final rings-to-pith estimates?

EDIT: Also, there is one tree in duncan that is not present in cores, after tossing out the known offenders ("R0XT01a", "RLARGE01", "RLARGE02") and those that are not present in inner_rings ("C02", "C03", "C04", "C10", "F04T05b", "F05T06a"):

setdiff(duncan$tree, cores$tree)
[1] "F07T01b"

EDIT: Oh wait, but now I see "F04T05b" and "F05T06a" are present in cores, so maybe the problem is that they're missing from inner_rings? Basically, I need some QA/QC help to make sure the input files and post-munging data sets include the right trees for our analysis workflow, so I'm not taking shots in the dark.

Another question, as the main objective of constructing informative priors on overall tree age is almost in sight: which column of PIUV_CoredProcessed.csv (assuming that's the right data set to use for this purpose) represents "known" rings, i.e., the fixed value to which the stochastic terms will be added? Is it TR_Count? More generally, is there a data dictionary for this file, or any other subtleties I should be aware of when constructing the priors?

names(cores_raw)

 [1] "Year_coll"              "Site"                   "Patch"                 
 [4] "Patch2"                 "Plot"                   "Individual"            
 [7] "Height_cm"              "Diam_1.3m_cm"           "Diam_0.3m_cm"          
[10] "AltDiam_cm"             "Height_AltDiam_cm"      "Diam_stump_cm"         
[13] "Cor_Height_cm"          "Cor_Diam_cm"            "Within_plot"           
[16] "CountperHa"             "Sex"                    "Habit"                 
[19] "Tree_Measure_Notes"     "Core_Notes"             "Sample_Type"           
[22] "Status"                 "Measure_Certainty"      "Pith"                  
[25] "Outer_rings"            "Within_plot.1"          "TR_Count"              
[28] "X"                      "CoringHeight_Corr"      "CoringHeight_Corr_prop"
[31] "Duncan_qual"            "Duncan_RTP_all"         "Duncan_RTP_all_prop"   
[34] "TreeAge_all"            "Duncan_RTP1"            "Duncan_RTP1_prop"      
[37] "TreeAge1"               "AgeAcc1"                "EstabDate1"            
[40] "EstabDate1_50"          "EstabDate1_10"          "Duncan_RTP2"           
[43] "Duncan_RTP2_prop"       "TreeAge2"               "AgeAcc2"               
[46] "EstabDate2"             "EstabDate2_50"          "EstabDate2_10"         
[49] "EstabDate2_20"          "DuncanRTP_3"            "Duncan_RTP3_prop"      
[52] "TreeAge3"               "AgeAcc3"                "EstabDate3"            
[55] "EstabDate3_50"          "EstabDate3_10"         
kzaret commented 3 years ago

I keep forgetting to ask about the subtract_if_not_arching column in duncan. Can you remind me what this is -- is it in Duncan (1989) somewhere? Do we need to include this in our final rings-to-pith estimates?

It's not in Duncan (1989); it may be something that Andres and I cooked up or maybe one of his colleagues who sent us the template spreadsheet we were working with, I don't actually remember. Let's exclude it from the final rings-to-pith estimates.

EDIT: Oh wait, but now I see "F04T05b" and "F05T06a" are present in cores, so maybe the problem is that they're missing from inner_rings? Basically, I need some QA/QC help to make sure the input files and post-munging data sets include the right trees for our analysis workflow, so I'm not taking shots in the dark.

Talk about propagating uncertainty. . . .F07T01b does not appear to actually exist (no scan, no physical core, no data in my master spreadsheet). It's been removed from the data in the repo. F04T05b and F05T06a should not have been in the Duncan spreadsheet (and perhaps should be removed from the analysis altogether): they apparently broke far enough from the pith that no arching rings are visible, so the Duncan method does not apply.

. . .which column of PIUV_CoredProcessed.csv (assuming that's the right data set to use for this purpose) represents "known" rings, i.e., the fixed value to which the stochastic terms will be added? Is it TR_Count? More generally, is there a data dictionary for this file, or any other subtleties I should be aware of when constructing the priors?

Yes, TR_Count represents the known rings. I'm sorry, there's no data dictionary. I had that on my to-do list for a while, but I was avoiding it and everything else related to data analysis, so I just let it go. All the columns that come after TR_Count are based on old methods and can be ignored for our purposes. I've added some new columns while doing QAQC related to missing outer rings, which I'll write more about in #6.

ebuhle commented 3 years ago

F04T05b and F05T06a should not have been in the Duncan spreadsheet (and perhaps should be removed from the analysis altogether): they apparently broke far enough from the pith that no arching rings are visible, so the Duncan method does not apply.

Ah, bummer. I don't suppose there's a Hail Mary for this situation? (And no, I don't have any clever suggestions either.)

Yes, TR_Count represents the known rings. I'm sorry, there's no data dictionary. I had that on my to-do list for a while, but I was avoiding it and everything else related to data analysis, so I just let it go.

No worries. I guessed wisely, as it turns out.

ebuhle commented 3 years ago

While working on the Duncan vignette (#5), I discovered that there is still (or once again) some lingering mismatch between the unique tree (or core) IDs in PithlessTCs_InnerRingWidths.csv vs. Duncan_estimates.csv. I thought we had resolved these issues by appropriate filtering, but comparing the raw imported data frames shows the following cores in duncan_raw that are not in inner_rings_raw:

setdiff(duncan_raw$Core.., inner_rings_raw$Core_ID)
[1] "C02"      "C03"      "C04"      "C10"      "BF02T05b" "BF04T04b" "F06T10c"  "F06T11a"

The first four are in Cushion and will be filtered out (although I see we weren't doing that in a forthright way), but why aren't the last four in the inner-ring width data?

setdiff(inner_rings_raw$Core_ID, duncan_raw$Core..)
[1] "BF02T05B" "BF04T04B" "F06T10C"  "F06T11A" 

Oh. They are, but the last letter is capitalized. Do you want me to fix this on the R side, or do you want to harmonize the names in the raw data? (The column names are another matter, but if it ain't broke...)

kzaret commented 3 years ago

Sorry about the lag in making those edits (and that they needed to happen in the first place).