traitecoevo / austraits.build

Source for AusTraits

Adding contextual data #240

Closed ehwenk closed 4 years ago

ehwenk commented 5 years ago

This issue serves both as a reminder that we need to add code to allow contextual data to be added to studies and also a check list of studies that need to have context added.

dfalster commented 4 years ago

As of 630dd14dd2da9f30499eb2358ddea6e88aaa7585, this is what the main trait table looks like:

image

And sites

image

dfalster commented 4 years ago

Details that we want in revised structure:

Suggest that dates just be added as a column to the main table. The relevant column can be specified in the config file in the same way we indicate site_name, and added with custom_R_code as needed.

For context, the challenge is that the measured variables vary study by study. One possible design is that in metadata.yml we specify a column name for context, then add a "contexts" section to the metadata. Contexts are then treated just like sites, with unlimited extra variables. Like sites, the contexts must all be named.

Additionally, we could nominate one of the context names as the baseline. This would enable us to filter down to a minimal set of info when required. Other names might be things like high_light.

Finally, we add an option to specify the baseline context either in config or site-by-site, as the "baseline" scenario might vary site-by-site.

This will require some guidance on what the baseline conditions should be. It can either be closest to site ambient conditions (e.g. for things like temperature), or closest to the standardised point of trait measurement. E.g. for specific_leaf_area this would be a fully expanded young leaf in high light.

So a metadata file might look like:

contexts:
  low_light:
    description: Measured in understorey
    canopy cover (%): 80
  high_light:
    description: Measured in high light
    canopy cover (%): 0    
config:
  data_is_long_format: no
  variable_match:
    species_name: Species
    site_name: site
    context_name: conditions
    context_baseline: high_light
  custom_R_code: .na

OR

sites:
  Cape Tribulation:
    description: Complex mesophyll vine forest in tropical rain forest.
    context_baseline: 25_deg
  Myall:
    description: Open woodland.
    context_baseline: 20_deg
contexts:
  20_deg:
    description: Measured at 20deg C
    temp (deg): 20
  25_deg:
    description: Measured at 25deg C
    temp (deg): 25  
config:
  data_is_long_format: no
  variable_match:
    species_name: Species
    site_name: site
    context_name: temperature
  custom_R_code: .na

Thoughts @ehwenk @rachaelgallagher @CaitlanB ?
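
The baseline lookup proposed above (a site-level context_baseline that takes precedence over a config-level one) could be sketched roughly as follows. This is a minimal Python illustration only, not code from austraits.build: the function names `baseline_for` and `filter_to_baseline`, the in-memory `metadata` dict, and the trait values are all hypothetical, and the real pipeline is written in R.

```python
# Hypothetical in-memory version of the second metadata example above.
metadata = {
    "sites": {
        "Cape Tribulation": {"context_baseline": "25_deg"},
        "Myall": {"context_baseline": "20_deg"},
    },
    "contexts": {
        "20_deg": {"description": "Measured at 20deg C", "temp (deg)": 20},
        "25_deg": {"description": "Measured at 25deg C", "temp (deg)": 25},
    },
    "config": {"variable_match": {"context_name": "temperature"}},
}


def baseline_for(metadata, site_name):
    """Return the baseline context for a site, or None if none is set.

    A site-level context_baseline wins; otherwise fall back to the
    config-level one under variable_match (as in the first example).
    """
    site = metadata.get("sites", {}).get(site_name, {})
    if "context_baseline" in site:
        return site["context_baseline"]
    return metadata["config"]["variable_match"].get("context_baseline")


def filter_to_baseline(rows, metadata):
    """Keep only rows measured under their own site's baseline context."""
    return [
        row for row in rows
        if row["context_name"] == baseline_for(metadata, row["site_name"])
    ]


# Made-up rows, purely to exercise the functions.
rows = [
    {"site_name": "Cape Tribulation", "context_name": "25_deg", "value": 1.2},
    {"site_name": "Cape Tribulation", "context_name": "20_deg", "value": 1.0},
    {"site_name": "Myall", "context_name": "20_deg", "value": 0.8},
    {"site_name": "Myall", "context_name": "25_deg", "value": 0.9},
]
baseline_rows = filter_to_baseline(rows, metadata)
```

With these inputs, one row per site survives the filter: the 25_deg row for Cape Tribulation and the 20_deg row for Myall.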

ehwenk commented 4 years ago

I like it, I think the baseline is a good idea, but prefer it as part of the "context" - I can't off-hand think of any studies with different baselines across sites - although maybe there are some (as your example).

One thing is that there will be studies with no "baseline", such as studies that repeated measurements in wet and dry seasons. We could arbitrarily designate "wet" as "baseline" across all studies. Similarly with "time since fire": is baseline the longest time since fire? We'll need to come up with some guidelines.

Last, there are a number of physiology studies (mostly those not yet input) with factorial treatments, e.g. temperature x CO2 concentration; Firn_2019 also had different nutrient addition combinations. We could easily create a single combined "contextual name" like "hiCO2_and_hiTemp" etc.
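
As a rough illustration of the combined-name idea for factorial treatments, the sketch below builds one context name per combination of factor levels. It is a hedged Python example, not project code: the helper `combined_context_names` and the level labels `ambCO2` and `ambTemp` are assumptions; only the `hiCO2_and_hiTemp` pattern comes from the comment above.

```python
from itertools import product


def combined_context_names(factors, sep="_and_"):
    """Build one context name per combination of factor levels.

    `factors` maps a factor name to its list of level labels; the
    first factor varies slowest in the returned list.
    """
    return [sep.join(levels) for levels in product(*factors.values())]


# Hypothetical 2 x 2 design: CO2 concentration crossed with temperature.
factors = {
    "CO2": ["ambCO2", "hiCO2"],
    "temp": ["ambTemp", "hiTemp"],
}
names = combined_context_names(factors)
```

Each generated name would then get its own entry in the "contexts" section, with the individual factor levels recorded as extra variables.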

dfalster commented 4 years ago

Yes, I was thinking the same re factorial treatments

dfalster commented 4 years ago

Ok, I added a framework for context on the branch "context", and an example from Prior_2003. Once we've added some more studies and are confident it works, we can commit to main and attack the physiological studies that have been on hold.

dfalster commented 4 years ago

After updates:

image image

And here is example from Prior_2003:

contexts:
  wet:
    description: Leaves measured in the wet season
  dry:
    description: Leaves measured in the dry season
config:
  data_is_long_format: no
  variable_match:
    species_name: Species
    site_name: site
    observation_id: Tree
    context_name: season
  context_baseline: wet
  custom_R_code: data %>% mutate(Spe

CaitlanB commented 4 years ago

Sounds good.

I think the date column works well for some studies. Other studies document only the month/season when the whole dataset was collected; this would be useful information to add as a field somehow.

How would context be added for studies like Vlasveld_2018 and Schulze_1998, where context is not a separate column like 'site' but is a separate column of trait values, e.g. SLA_juvenile_leaf, SLA_adult_leaf?

dfalster commented 4 years ago

> How would context be added for studies like Vlasveld_2018 and Schulze_1998 where context is not a separate column like 'site' but is a separate column of trait values e.g. SLA_juvenile_leaf, SLA_adult_leaf?

Good question @CaitlanB. How common is this? One option is we use the custom R code to reformat a little.

dfalster commented 4 years ago

I see the problem. Two measurements on the same row can't have the same column for context.
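
One way to picture the reformatting step is to split each such trait column into its own row, so that every measurement carries its own context. The sketch below is a Python illustration of that reshaping logic only; in the project itself this would presumably be done in R via custom_R_code. The helper `split_context_columns`, the output field names, the species labels and the values are all hypothetical.

```python
def split_context_columns(rows, column_map, id_columns):
    """Reshape wide rows so each measurement gets its own context.

    `column_map` maps a source column to a (trait_name, context_name)
    pair. Missing values (None) are skipped.
    """
    long_rows = []
    for row in rows:
        for column, (trait, context) in column_map.items():
            value = row.get(column)
            if value is None:
                continue
            long_row = {key: row[key] for key in id_columns}
            long_row.update(
                {"trait_name": trait, "context_name": context, "value": value}
            )
            long_rows.append(long_row)
    return long_rows


# Column names follow the example in the question; the rest is made up.
column_map = {
    "SLA_juvenile_leaf": ("specific_leaf_area", "juvenile_leaf"),
    "SLA_adult_leaf": ("specific_leaf_area", "adult_leaf"),
}
wide = [
    {"Species": "Species A", "SLA_juvenile_leaf": 12.0, "SLA_adult_leaf": 8.5},
    {"Species": "Species B", "SLA_juvenile_leaf": 15.0, "SLA_adult_leaf": None},
]
long_rows = split_context_columns(wide, column_map, ["Species"])
```

After reshaping, two measurements that shared a row each have their own context_name, which sidesteps the single-context-column limitation.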

CaitlanB commented 4 years ago

Just those two datasets from my knowledge.

ehwenk commented 4 years ago

There are others as well - I haven't kept track of them, but occasionally there is a study where two columns are matched to the same variable.

ehwenk commented 4 years ago

In terms of having a "contextual_baseline" - To me, this term makes sense for experiments where there is a control and treatments that represent non-natural (or unusual) conditions. There are many studies where no one contextual value is more of a baseline than others. As an example, Lusk_2012 measured some traits in the field and other traits in a glasshouse experiment under different temperature controls, representing different field sites. None of these is more of a "baseline" than others. In Hall_1980, context captures the month leaf samples were collected in the field, with different species collected in different months. They were collected to represent different points within the wet (growing) and dry season, but there is no month that includes all species.

Instead, I propose that for all contextual values, the assigned context appears in the context column of the main traits table, and that each entry under contexts has a mandatory field, "type". Here we would use set terms to indicate if a context is a "control", "treatment", "field", and maybe something like "field_lower" or "field_stressful" if we wanted to indicate some field measurements were expected to be lower than others. I'm a little hesitant to have "field_lower" (or equivalent) because authors have such different motivations for measuring trait values under different naturally occurring field conditions.

The "type" field should also be able to be left as ".na", if appropriate.

If we wanted to make it easier for people to filter out "treatments", we could make sure the context names for treatments started with "treatment_" and have a simple function to filter those out.
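
A minimal sketch of what that could look like, assuming the set terms listed above plus ".na", and a "treatment_" naming prefix. This is illustrative Python, not the project's implementation: `ALLOWED_TYPES`, `invalid_types`, `drop_treatments` and the example context names are hypothetical.

```python
# Assumed vocabulary for the mandatory "type" field, taken from the
# terms suggested in this comment; ".na" means "not applicable".
ALLOWED_TYPES = {"control", "treatment", "field", ".na"}

# Hypothetical contexts section of a metadata file.
contexts = {
    "field_wet": {"type": "field"},
    "treatment_hiCO2": {"type": "treatment"},
    "control_ambCO2": {"type": "control"},
    "glasshouse": {"type": ".na"},
    "unlabelled": {},
}


def invalid_types(contexts):
    """Return names of contexts whose type is missing or not a set term."""
    return [
        name for name, fields in contexts.items()
        if fields.get("type") not in ALLOWED_TYPES
    ]


def drop_treatments(rows, prefix="treatment_"):
    """Filter out rows whose context name marks them as a treatment."""
    return [row for row in rows if not row["context_name"].startswith(prefix)]


# Made-up rows from a traits table.
rows = [
    {"context_name": "field_wet", "value": 3.1},
    {"context_name": "treatment_hiCO2", "value": 4.0},
    {"context_name": "control_ambCO2", "value": 3.5},
]
kept = drop_treatments(rows)
problems = invalid_types(contexts)
```

Filtering on the name prefix keeps the function independent of the metadata; filtering on the "type" field instead would work equally well if the metadata is at hand.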

dfalster commented 4 years ago

@ehwenk What about labelling the context_types field_harsh and field_favourable?

dfalster commented 4 years ago

Similar for treatment: treatment_favourable & treatment_harsh & treatment_control

I'm trying to get at a simple indicator of condition

dfalster commented 4 years ago

Also need to add sampling date into the data frame. See #361

ehwenk commented 4 years ago

Finished adding context to studies in list