microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Determine if and how `climate_environment` will be used in the submission schema #586

Open mslarae13 opened 1 year ago

mslarae13 commented 1 year ago

I am not sure how to write this example according to the structure. I would like to see some completed examples.

Completion

turbomam commented 1 year ago

I do think I can check off most of those boxes within the first day or two of next week.

turbomam commented 1 year ago

GSC's MIxS Specification

_from https://github.com/GenomicsStandardsConsortium/mixs/blob/main/mixs/excel/mixs_v6.xlsx_

| Environmental package | agriculture | plant-associated |
| --- | --- | --- |
| Structured comment name | climate_environment | climate_environment |
| Package item | climate environment | climate environment |
| Definition | Treatment involving an exposure to a particular climate; treatment regimen including how many times the treatment was repeated, how long each treatment lasted, and the start and end time of the entire treatment; can include multiple climates | Treatment involving an exposure to a particular climate; treatment regimen including how many times the treatment was repeated, how long each treatment lasted, and the start and end time of the entire treatment; can include multiple climates |
| Expected value | climate name;treatment duration;interval;experimental duration | climate name;treatment interval and duration |
| Value syntax | {text};{period};{interval};{period} | {text};{Rn/start_time/end_time/duration} |
| Example | | tropical climate;R2/2018-05-11T14:30/2018-05-11T19:30/P1H30M |
| Requirement | C | X |
| Preferred unit | | |
| Occurrence | m | m |
| MIXS ID | MIXS:0001040 | MIXS:0001040 |

BBOP relational version of NCBI biosample_set:

select
    value,
    count(1)
from
    all_attribs aa
where
    aa.harmonized_name = 'climate_environment'
group by
    value
having
    count(1) > 1
order by
    count(1) desc;

| value | count |
| --- | --- |
| Mediterranean, subtropical | 4120 |
| not applicable | 1283 |
| NA | 763 |
| not collected | 482 |
| Humid subtropical | 392 |
| Lab microcosm | 192 |
| greenhouse | 113 |
| Warm temperate (Cfb) | 95 |
| Boreal (Dfb) | 88 |
| control conditions | 40 |
| controlled conditions, branch partially covered by plastic bag | 36 |
| freeze-thaw | 35 |
| continental with Mediterranean influences | 27 |
| Humid continental climate | 25 |
| subalpine | 24 |
| riparian zone | 23 |
| controlled conditions | 18 |
| freeze | 17 |
| thaw | 17 |
| submediterranean | 14 |
| drought | 14 |
| Controlled | 12 |
| tropical wet and dry climate | 12 |
| cold | 8 |
| heat | 8 |
| missing | 7 |
| Tropical | 7 |
| 2 weeks cold storage | 6 |
| Dry | 6 |
| 3 weeks cold storage | 6 |
| 4 weeks cold storage | 6 |
| 5 weeks cold storage | 6 |
| Orchard at harvest | 6 |
| Agricultural environment | 6 |
| Dry, Hot | 5 |
| Temperate | 5 |
| Common Garden, Flooding | 3 |
| Common Garden, Control | 3 |
| Common Garden, Drougth | 3 |
| Greenhouse, Heat, CO2 | 3 |
| https://kare.ucanr.edu/Weather_Physical_-_Biological_Data/ | 3 |
| Greenhouse, Drought, Heat, CO2 | 3 |
| Greenhouse, Flooding, Heat | 3 |
| temperate climate | 3 |
| Greenhouse, Drought, Heat | 3 |
| Greenhouse, Heat | 3 |
| Greenhouse, Flooding, Heat, CO2 | 3 |
| common garden setup | 2 |
| KG biological replicates 3 | 2 |
| KG biological replicates 1 | 2 |
| ambient conditions | 2 |
| S biological replicates 2 | 2 |
| desertic | 2 |
| CK biological replicates 3 | 2 |
| KG biological replicates 2 | 2 |
| watered | 2 |
| S biological replicates 1 | 2 |
| none | 2 |
| Not applicable | 2 |
| S biological replicates 3 | 2 |
| CK biological replicates 2 | 2 |
| CK biological replicates 1 | 2 |

turbomam commented 1 year ago

Are there any Biosamples with a `climate_environment` value in the current NMDC production MongoDB?

db.getCollection("biosample_set").find( { climate_environment : { $exists : true } } );

0

ssarrafan commented 1 year ago

Adding to the current sprint per Mark. Need feedback from @mslarae13.

mslarae13 commented 1 year ago

@turbomam the NCBI examples are quite variable, and not what I'd expect. I am not surprised that there's no time or duration given for the climate manipulation; people just describe the climate instead.

I think, considering the examples we have, this field should just be a way to describe the climate_environment, and not a "duration of treatment".

So the plant-associated example "tropical climate;R2/2018-05-11T14:30/2018-05-11T19:30/P1H30M" would become just "tropical climate".
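The reduction proposed here could be sketched roughly as follows. This is a minimal illustration in plain Python, not anything from the MIxS or NMDC tooling: the `TREATMENT_VALUE` pattern and the `climate_name` helper are hypothetical names, and the pattern is only an approximation of the plant-associated value syntax `{text};{Rn/start_time/end_time/duration}`.

```python
import re

# Assumed approximation of {text};{Rn/start_time/end_time/duration}.
# The pattern is illustrative and is not taken from the MIxS specification tooling.
TREATMENT_VALUE = re.compile(
    r"^(?P<climate>[^;]+);"
    r"R(?P<repeats>\d+)/(?P<start>[^/]+)/(?P<end>[^/]+)/(?P<duration>P[^/]+)$"
)


def climate_name(value: str) -> str:
    """Return only the climate description from a climate_environment value."""
    cleaned = value.strip()
    match = TREATMENT_VALUE.match(cleaned)
    if match:
        # Drop the treatment regimen and keep the leading text.
        return match.group("climate").strip()
    # Values without a treatment regimen are returned unchanged.
    return cleaned


example = "tropical climate;R2/2018-05-11T14:30/2018-05-11T19:30/P1H30M"
print(climate_name(example))  # tropical climate
```

A value that never had a regimen attached, such as "Humid subtropical" from the NCBI results, passes through unchanged.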

@turbomam thoughts?

turbomam commented 1 year ago

@mslarae13 That's fine with me. Do you want to allow any string, or would you like to have a validation pattern, or some enumerated values?
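To make the three options concrete, here is a rough sketch in plain Python rather than in the schema's own mechanism. Everything in it is an assumption for illustration: the `ClimateEnum` members, the `CLIMATE_PATTERN` regex, and the function names are hypothetical and were not proposed in this thread.

```python
import re
from enum import Enum


class ClimateEnum(Enum):
    # Hypothetical permissible values, for illustration only.
    TROPICAL = "tropical climate"
    TEMPERATE = "temperate climate"


# Hypothetical pattern: non-empty text containing no ';',
# i.e. no treatment regimen appended to the climate name.
CLIMATE_PATTERN = re.compile(r"[^;\s][^;]*")


def valid_any_string(value: str) -> bool:
    """Option 1: accept any non-blank string."""
    return bool(value.strip())


def valid_pattern(value: str) -> bool:
    """Option 2: accept only values matching the validation pattern."""
    return CLIMATE_PATTERN.fullmatch(value) is not None


def valid_enum(value: str) -> bool:
    """Option 3: accept only enumerated values."""
    return value in {member.value for member in ClimateEnum}


for candidate in ("tropical climate", "greenhouse", "tropical climate;R2/2018-05-11T14:30/2018-05-11T19:30/P1H30M"):
    print(candidate, valid_any_string(candidate), valid_pattern(candidate), valid_enum(candidate))
```

The trade-off is the usual one: a free string accepts everything seen in the NCBI data, a pattern rejects only structurally wrong values, and an enumeration rejects anything not agreed on in advance.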

ssarrafan commented 1 year ago

Discussion seems to be ongoing; moving to the new sprint.

mslarae13 commented 1 year ago

The name is misleading and should be "climate treatment". People are putting biome and other information here that should be in a different column. Ramona suggests deprecating this term. I will open an issue with GSC.

ramonawalls commented 1 year ago

Nearly all of the values in this slot are wrong and should go into one of the ENVO slots (e.g., biome). There is a legitimate need to record information about experimental environmental conditions, but there are either existing slots for that information or we should add new one(s) that are less confusing.

I recommend deprecating this term in MIxS and replacing it mostly with existing terms. If there is some environmental data that can't be captured, we can create new terms that are clearer.

mslarae13 commented 1 year ago

https://github.com/GenomicsStandardsConsortium/mixs/issues/591

mslarae13 commented 1 year ago

@turbomam Should we remove this from NMDC now? Or wait for GSC update?

Also, we pulled this term into the soil package. It can go away. (GSC only has it in agriculture & plant)

@pkalita-lbl FYI

turbomam commented 1 year ago

I'll remove in 7.6.1

mslarae13 commented 1 year ago

@turbomam did this get completed?

mslarae13 commented 1 month ago

Decision: We will deprecate this term. @sierra-moxon has created a deprecation protocol for NMDC. We'll discuss implementing this in GSC and will deprecate this term when ready.