Specification of random effects for nested+longitudinal study design

ChrKoenig commented 4 years ago

Hi!

I'd appreciate your advice regarding the specification of random effects in Hmsc. My dataset consists of standardized, annually repeated surveys across an evenly spaced grid of sampling plots. The exact location of individuals within each plot is recorded, allowing me to derive species occurrences at finer grain sizes by subdividing the the entire plot into a number of equally-sized sub-plots. I am mostly interested in the change in species association patterns at different grain sizes, but I'm not quite sure whether my specification of random effects is correct. I need random effects for three reasons: (1) to make Hmsc estimate pairwise residual correlations, (2) to account for the fact that sub-plots are nested within sampling plots, and (3) to account for the repeated sampling in different years. From my understanding, the specification of study design and random effects at the coarsest level, i.e. for entire sampling plots, should be as follows:

sampling_design

sampling_plot | year
--------------|--------
1             | 2010
1             | 2011
2             | 2010
2             | 2011
...           | ...

random effects

rnd_eff = list("sampling_plot" = HmscRandomLevel(units = study_design$sampling_plot),   # obtain residual correlations
               "year" = HmscRandomLevel(units = study_design$year))   # account for temporal autocorrelation

However, when I want to look at the sub-plot level, I am not quite sure if the following is correct:

sampling_design

sampling_plot |  sub_plot  | year
--------------|------------|--------
1             |     1      | 2010
1             |     2      | 2010
1             |     3      | 2010
1             |     4      | 2010
1             |     1      | 2011
1             |     2      | 2011
1             |     3      | 2011
1             |     4      | 2011
2             |     5      | 2010
...           |     ...    | ...

random effects

rnd_eff = list("sampling_plot" = HmscRandomLevel(units = study_design$sampling_plot),   # account for nested sampling,
               "sub_plot" = HmscRandomLevel(units = study_design$sub_plot),   # obtain residual correlations
               "year" = HmscRandomLevel(units = study_design$year))   # account for temporal autocorrelation

Any thoughts on that?

oysteiop commented 4 years ago

Hi, I have made a suggested setup below. Note use of unique() to avoid repeating the same plot several times:

rL1 = HmscRandomLevel(units = unique(study_design$sampling_plot))

For the subplots, you should first create unique sub-plot IDs, for example:

study_design$unique_sub_plot = paste(study_design$sampling_plot, study_design$subplot, sep="")

rL2 = HmscRandomLevel(units = unique(study_design$unique_sub_plot))

rL3 = HmscRandomLevel(units = unique(study_design$year))

This implementation allows you to ask e.g. if some species tend to occur in the same years. Alternatively, to make the analysis temporally explicit, see the argument sData in HmscRandomLevel(). (Temporal data can be modelled the same way as spatial data, but with only one dimension.)

Finally, the complete list of random effects: rnd_eff = list("sampling_plot" = rL1, # account for nested sampling, "sub_plot" = rL2, # obtain residual correlations "year" = rL3) # account for temporal autocorrelation

ChrKoenig commented 4 years ago

Thanks, @oysteiop! In fact, I do have unique sub-plot IDs already, so it was basically just a matter of wrapping a unique() around the sampling-plot IDs. Just to be sure: To contrast patterns in residual correlations at different grain sizes (e.g. sampling-plots, subplots-large, subplots-medium, subplots-fine), I would compare the species association matrices with rL2, except for the coarsest resolution (sampling-plots), where I would use rL1?

oysteiop commented 4 years ago

Yes, this sounds reasonable.

gtikhonov commented 4 years ago

Hi @ChrKoenig , Your design sounds quite elaborate and I believe that there is more than a just single right option to incorporate it to the Hmsc. So it is more about what research questions you have in mind. For example, in the design that you've discussed above, you do not really estimate "the most residual" associations, as your subplot effect is assumed to be constant over years. Well, you have an extra expert of the year, but it is assumed to affect all samples in a given year equally, so it captures the synchronous variation across years. In my opinion, to get the residual associations, you should include the effect that corresponds to the pair of (subplot x year), for instance in a way as @oysteiop demonstrated above.

hmsc-r / HMSC

Specification of random effects for nested+longitudinal study design #32