continuous vs. discretized predictor

kzaret / PIUV_seedling_abundance

Bayesian GLMMs of P. uviferum seedling abundance at a burned and a control site, respectively.

0 stars 0 forks source link

continuous vs. discretized predictor #2

Open kzaret opened 1 year ago

kzaret commented 1 year ago

Heya @ebuhle, I'm following up on the new subject we started working through at the end of the first issue.

I wanted to verify that 'low' was my reference DWT level for the discretized data set. The contrasts don't look like those for patch. Is something wrong here?

> # convert continuous avg dwt values to ordinal categories (low, <= 0; medium, 1 to 25; high, >25)
> sp <- sp%>%
+   mutate(DWT = cut(avgdwt, breaks=c(-Inf, 0, 25, Inf), labels=c("low", "medium", "high")))
> sp$DWT <- factor(sp$DWT, c("low", "medium", "high"), ordered = TRUE)

> levels(sp.rmb$DWT)
[1] "low"    "medium" "high"  

> levels(sp.rmb$patch)
[1] "Forest"     "Bog Forest" "Bog"       

> contrasts(sp.rmb$DWT)
                .L         .Q
[1,] -7.071068e-01  0.4082483
[2,] -7.850462e-17 -0.8164966
[3,]  7.071068e-01  0.4082483

> contrasts(sp.rmb$patch)
           Bog Forest Bog
Forest              0   0
Bog Forest          1   0
Bog                 0   1

ebuhle commented 1 year ago

I wanted to verify that 'low' was my reference DWT level for the discretized data set. The contrasts don't look like those for patch.

Right, because DWT is ordered whereas patch is not. Ordered factors use polynomial contrasts by default, i.e. the contrasts() method for objects of class ordered is contr.poly(). You can reproduce this explicitly by calling contr.poly(sp.rmb$DWT), or you could force treatment contrasts (the default for class factor, so ignoring the ordering) with contr.treatment(sp.rmb$DWT). See ?contr.poly for more on the various contrast methods. With polynomial contrasts, there's no reference level per se but rather linear and quadratic terms, and so on up to order $K - 1$, treating the levels as sequential integers. That's why your model results include DWT.L and DWT.Q terms instead of DWT.medium and DWT.high.

Anyway, I thought you were going to use the raw measurements?

kzaret commented 1 year ago

Thank you!

Anyway, I thought you were going to use the raw measurements?

Yes, I just wanted to make sure that I knew what was going on with discDWT in case I get strong feedback from Andres or others and decide to interpret the models that include that variable without an interaction with patch.

kzaret commented 1 year ago

Thanks for your ongoing advice and insight, which are helping to keep me sane.

Here are the pairs plots for the model of PIUV seedlings regressed on average DWT x patch. Is the correlation between b_patchBog and b_avgdwt.rs:patchBog a dealbreaker?

 Family: negbinomial 
  Links: mu = log; shape = log 
Formula: PIUV_less5cmDBH_total ~ avgdwt.rs + patch + avgdwt.rs:patch + (1 | plot) 
         shape ~ patch
   Data: sp.rmb (Number of observations: 108) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Group-Level Effects: 
~plot (Number of levels: 27) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.35      0.15     0.06     0.67 1.00     2442     3010

Population-Level Effects: 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                    3.49      0.21     3.10     3.91 1.00     8304     7478
shape_Intercept              0.85      0.28     0.29     1.39 1.00     9379     7498
avgdwt.rs                   -0.36      0.16    -0.67    -0.04 1.00     9255     8076
patchBogForest              -0.94      0.27    -1.49    -0.42 1.00     7557     7171
patchBog                    -3.38      0.95    -5.11    -1.33 1.00     8694     5709
avgdwt.rs:patchBogForest     0.08      0.22    -0.35     0.50 1.00     7092     7421
avgdwt.rs:patchBog           0.25      1.01    -1.69     2.31 1.00     8928     5758
shape_patchBogForest         0.31      0.45    -0.56     1.23 1.00    12994     7668
shape_patchBog              -1.89      0.54    -2.92    -0.81 1.00    10161     7304

ebuhle commented 1 year ago

It is strong, and should be mentioned when interpreting the results, but I don't know that it's a deal-breaker. Clearly it adds to the uncertainty in the marginal patchBog and avgdwt.rs:patchBog effects. But it doesn't prevent the main effect from being strongly negative (fewer seedlings in Bog), nor does it look like the interaction term would be anything but a total wash even without the unbalanced design causing confounding. The other level of the interaction term is a wash too, so it doesn't seem like there's evidence for patch-specific water table effects.

kzaret commented 1 year ago

There's a slight negative effect of avg. DWT on seedling counts in the Forest patch. This could make sense in that DWTs are higher in the that patch, subplots characterized by substrate types far above the water table could be fine seedbeds (especially trunks covered in moisture-capturing moss, where I observed many seedlings with cotyledons), but as a seedlings grow and develop more roots and have higher water needs they may become "stranded" in higher locations. . . . .

ebuhle commented 1 year ago

There's a slight negative effect of avg. DWT on seedling counts in the Forest patch.

And quite possibly in the other patches too, right? If you do b_avgdwt.rs + b_avgdwt.rs:patchBog and b_avgdwt.rs + b_avgdwt.rs:patchBogForest, respectively, what do you get?

subplots characterized by substrate types far above the water table could be fine seedbeds (especially trunks covered in moisture-capturing moss, where I observed many seedlings with cotyledons), but as a seedlings grow and develop more roots and have higher water needs they may become "stranded" in higher locations

I was wondering about this. My thought was that seedlings that successfully establish on nurse logs, etc. obviously have other means of capturing moisture without access to groundwater, so maybe DWT wouldn't matter. But your point, if I follow, is that eventually it will matter and thus a plot with more of these elevated substrates will support fewer seedlings on average. Makes sense to me, someone who knows nothing.

Also, how serious were you about the unimodal hypothesis whereby large positive and negative DWT are both bad? If you really think the relationship is non-monotonic, then this (log-) linear fit would be misleading.