Closed ehwenk closed 4 years ago
As of 630dd14dd2da9f30499eb2358ddea6e88aaa7585, this is what the main trait table looks like:
And sites
Details that we want in revised structure:
year_collected_start and year_collected_end in methods table. Suggest that dates just be added as a column to the main table. The relevant column can be specified in the config file in the same way we indicate site_name, and added with custom_R_code as needed.
For context, the challenge is that the measured variables vary study by study. One possible design is that in metadata.yml we specify a column name for context, then add a "contexts" section to the metadata, so that contexts are treated just like sites, with unlimited extra variables. Like sites, the contexts must all be named.
Additionally, we could nominate one of the context names as being the baseline. This would enable us to filter down to a minimal set of info when required. Other names might be things like high_light.
Finally, we add an option to specify the baseline context either in config or site-by-site, as the "baseline" scenario might vary site-by-site.
This will require some guidance on what the baseline conditions should be. It can either be closest to site ambient conditions (e.g. for things like temperature), or closest to the standardised point of trait measurement. E.g. for specific_leaf_area this would be a fully expanded young leaf in high light.
So a metadata file might look like:
contexts:
  low_light:
    description: Measured in understorey
    canopy cover (%): 80
  high_light:
    description: Measured in high light
    canopy cover (%): 0
config:
  data_is_long_format: no
  variable_match:
    species_name: Species
    site_name: site
    context_name: conditions
  context_baseline: high_light
  custom_R_code: .na
OR
sites:
  Cape Tribulation:
    description: Complex mesophyll vine forest in tropical rain forest.
    context_baseline: 25_deg
  Myall:
    description: Open woodland.
    context_baseline: 20_deg
contexts:
  20_deg:
    description: Measured at 20deg C
    temp (deg): 20
  25_deg:
    description: Measured at 25deg C
    temp (deg): 25
config:
  data_is_long_format: no
  variable_match:
    species_name: Species
    site_name: site
    context_name: temperature
  custom_R_code: .na
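To make the baseline idea concrete, here is a minimal sketch of how filtering down to baseline records could work, with a site-level context_baseline taking precedence over a study-level one. It is written in Python purely for illustration; the function names, the row layout and the trait values are assumptions, not part of the actual build code.

```python
from typing import Optional


def baseline_for(site: str, sites: dict, config: dict) -> Optional[str]:
    """Return the site-level baseline if one is set, else the study-level one."""
    site_level = sites.get(site, {}).get("context_baseline")
    if site_level is not None:
        return site_level
    return config.get("context_baseline")


def filter_to_baseline(rows: list, sites: dict, config: dict) -> list:
    """Keep rows measured under the baseline context for their site.

    Assumption: rows from a site with no baseline defined are all kept.
    """
    kept = []
    for row in rows:
        baseline = baseline_for(row["site_name"], sites, config)
        if baseline is None or row["context_name"] == baseline:
            kept.append(row)
    return kept


# Hypothetical metadata mirroring the site-by-site example.
sites = {
    "Cape Tribulation": {"context_baseline": "25_deg"},
    "Myall": {"context_baseline": "20_deg"},
}
config = {}  # no study-level baseline in this variant

# Made-up trait values, for illustration only.
rows = [
    {"site_name": "Cape Tribulation", "context_name": "20_deg", "value": 1.1},
    {"site_name": "Cape Tribulation", "context_name": "25_deg", "value": 1.4},
    {"site_name": "Myall", "context_name": "20_deg", "value": 0.9},
    {"site_name": "Myall", "context_name": "25_deg", "value": 1.2},
]

baseline_rows = filter_to_baseline(rows, sites, config)
for row in baseline_rows:
    print(row["site_name"], row["context_name"], row["value"])
```

Keeping every row when no baseline is defined is only one possible choice; dropping those rows instead would be equally easy.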
Thoughts @ehwenk @rachaelgallagher @CaitlanB ?
I like it, I think the baseline is a good idea, but prefer it as part of the "context" - I can't off-hand think of any studies with different baselines across sites - although maybe there are some (as your example).
One thing is that there will be studies with no "baseline", such as studies that repeated measurements in wet and dry seasons. We could arbitrarily designate "wet" as "baseline" across all studies. Similarly with "time since fire": is baseline the longest time since fire? We'll need to come up with some guidelines.
Last, there are a number of physiology studies (mostly those not yet input) with factorial treatments, e.g. Temp x CO2 concentration; Firn_2019 also had different nutrient addition combinations. We could easily create a single combined "contextual name" like "hiCO2_and_hiTemp" etc.
Yes, I was thinking the same re factorial treatments
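As a rough sketch of the combined-name idea for factorial treatments, the snippet below builds one context name per combination of factor levels. It is Python for illustration only; the function name and the "lo" level labels are assumptions, and only the "hiCO2_and_hiTemp" pattern comes from the comment above.

```python
from itertools import product


def combined_context_names(factors: dict, sep: str = "_and_") -> list:
    """Build one context name for every combination of factor levels."""
    level_lists = list(factors.values())
    return [sep.join(combo) for combo in product(*level_lists)]


# Hypothetical 2 x 2 factorial design (level labels are illustrative).
factors = {
    "CO2": ["loCO2", "hiCO2"],
    "temperature": ["loTemp", "hiTemp"],
}

names = combined_context_names(factors)
print(names)
```

Each generated name would then need its own entry under contexts, with the individual factor levels recorded as extra variables.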
Ok, I added a framework for context on the branch "context", and an example from Prior_2003. Once we've added some more studies and are confident it works, we can commit to main and attack the physiological studies that have been on hold.
After updates:
And here is example from Prior_2003:
contexts:
  wet:
    description: Leaves measured in the wet season
  dry:
    description: Leaves measured in the dry season
config:
  data_is_long_format: no
  variable_match:
    species_name: Species
    site_name: site
    observation_id: Tree
    context_name: season
  context_baseline: wet
  custom_R_code: data %>% mutate(Spe
Sounds good.
I think the date column works well for some studies; other studies document the month or season when the whole dataset was collected. This would be useful information to add as a field somehow.
How would context be added for studies like Vlasveld_2018 and Schulze_1998, where context is not a separate column like 'site' but is encoded in separate columns of trait values, e.g. SLA_juvenile_leaf, SLA_adult_leaf?
Good question @CaitlanB. How common is this? One option is we use the custom R code to reformat a little.
I see the problem. Two measurements on the same row can't have the same column for context.
Just those two datasets, to my knowledge.
There are others as well - I haven't kept track of them, but occasionally there is a study where two columns are matched to the same variable.
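One way to picture the reformatting step for datasets like these: split each context-specific trait column into its own row, so that every measurement gets its own context value. The sketch below is Python for illustration, not the custom R code itself; the function name, the column mapping, the species labels and the values are all made up.

```python
def wide_to_long(rows: list, trait_columns: dict, id_columns: list) -> list:
    """Turn context-specific trait columns into one row per measurement.

    trait_columns maps a wide column name to (trait_name, context_name).
    Missing values (None) are skipped.
    """
    long_rows = []
    for row in rows:
        for column, (trait, context) in trait_columns.items():
            value = row.get(column)
            if value is None:
                continue
            record = {key: row[key] for key in id_columns}
            record["trait_name"] = trait
            record["context_name"] = context
            record["value"] = value
            long_rows.append(record)
    return long_rows


# Hypothetical mapping for a dataset with two SLA columns.
trait_columns = {
    "SLA_juvenile_leaf": ("specific_leaf_area", "juvenile_leaf"),
    "SLA_adult_leaf": ("specific_leaf_area", "adult_leaf"),
}

# Made-up rows, for illustration only.
rows = [
    {"Species": "Species A", "SLA_juvenile_leaf": 12.0, "SLA_adult_leaf": 8.5},
    {"Species": "Species B", "SLA_juvenile_leaf": None, "SLA_adult_leaf": 6.0},
]

long_rows = wide_to_long(rows, trait_columns, id_columns=["Species"])
for record in long_rows:
    print(record)
```

After this reshaping, two measurements that shared a row each carry their own context, which sidesteps the problem of a single context column per row.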
In terms of having a "contextual_baseline" - To me, this term makes sense for experiments where there is a control and treatments that represent non-natural (or unusual) conditions. There are many studies where no one contextual value is more of a baseline than others. As an example, Lusk_2012 measured some traits in the field and other traits in a glasshouse experiment under different temperature controls, representing different field sites. None of these is more of a "baseline" than others. In Hall_1980, context captures the month leaf samples were collected in the field, with different species collected in different months. They were collected to represent different points within the wet (growing) and dry season, but there is no month that includes all species.
Instead, I propose that for all contextual values, the assigned context appears in the context column in the main traits table and under the contexts, a mandatory field is "type". Here we would use set terms to indicate if a context is a "control", "treatment", "field", and maybe something like "field_lower" or "field_stressful" if we wanted to indicate some field measurements were expected to be lower than others. I'm a little hesitant to have "field_lower" (or equivalent) because authors have such different motivations for measuring trait values under different naturally occurring field conditions.
The "type" field should also be able to be left as ".na", if appropriate.
If we wanted to make it easier for people to filter out "treatments", we could make sure the context names for treatments started with "treatment_" and have a simple function to filter those out.
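A minimal sketch of such a filter, assuming context names for treatments really do start with "treatment_". It is Python for illustration; the function name, the row layout and the example context names are hypothetical.

```python
def drop_treatments(rows: list, prefix: str = "treatment_") -> list:
    """Remove rows whose context name starts with the treatment prefix."""
    kept = []
    for row in rows:
        context = row.get("context_name") or ""  # treat a missing context as empty
        if not context.startswith(prefix):
            kept.append(row)
    return kept


# Made-up rows, for illustration only.
rows = [
    {"context_name": "treatment_hiCO2", "value": 3.2},
    {"context_name": "wet", "value": 2.7},
    {"context_name": None, "value": 2.9},
]

field_rows = drop_treatments(rows)
print([row["context_name"] for row in field_rows])
```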
@ehwenk What about labelling the context_types field_harsh and field_favourable?
Similar for treatment: treatment_favourable & treatment_harsh & treatment_control
I'm trying to get at a simple indicator of condition
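If a fixed vocabulary along these lines were adopted, a small check could flag contexts whose "type" is missing or not in the agreed set. The sketch below is Python for illustration; the allowed set simply collects the labels suggested in this comment plus ".na", and the function name and example contexts are assumptions.

```python
# Labels proposed in this thread; the final vocabulary may differ.
ALLOWED_TYPES = {
    "field_harsh",
    "field_favourable",
    "treatment_harsh",
    "treatment_favourable",
    "treatment_control",
    ".na",
}


def invalid_context_types(contexts: dict) -> dict:
    """Return {context_name: type} for contexts with a missing or unknown type."""
    problems = {}
    for name, fields in contexts.items():
        context_type = fields.get("type")
        if context_type not in ALLOWED_TYPES:
            problems[name] = context_type
    return problems


# Hypothetical contexts section, for illustration only.
contexts = {
    "wet": {"description": "Leaves measured in the wet season", "type": "field_favourable"},
    "dry": {"description": "Leaves measured in the dry season", "type": "field_harsh"},
    "clip": {"description": "Clipped plants", "type": "clipped"},
    "burn": {"description": "Burnt plants"},
}

print(invalid_context_types(contexts))
```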
Also need to add sampling date into the data frame. See #361
Finished adding context to studies in list
This issue serves both as a reminder that we need to add code to allow contextual data to be added to studies and also a check list of studies that need to have context added.
[x] add code to make it possible to add contextual data
[x] Prior_2003
[x] Lusk_2012
[x] Jurado_2012
[x] Firn_2019 (contextual mean as opposed to site mean for plots)
[x] Schulze_1998 (age of leaves)
[x] Lim_2017 (slope position)
[x] Lee_2019 (stem diameter class)
[x] Vesk_2019 (control, clip, clip and burn treatments)
[x] Smith_2012 (experimental ecophysiology study with CO2 and temperature treatments)
[x] Thomas_2017 (years since fire)
[x] Eamus_1999 (season of measurement)
[x] Read_2005 (soil type)
[x] Vlasveld_2018 (age of leaves: adult/juvenile)
[x] Wills_2018 (growth group: fast/med/slow)