microbiomedata / issues

public repo for issues related to NMDC work

env_broad_scale terms for East River watershed (study id gold:Gs0135149) aren't rooted in ENVO biome #611

Open aclum opened 7 months ago

aclum commented 7 months ago

These came up in a query written by @turbomam. We need to do the following:

Related to previous discussions from Oct 2021: https://github.com/microbiomedata/nmdc-server/issues/537#issuecomment-941632979

aclum commented 7 months ago

Per Chris, Mark will add 'meadow biome' to ENVO to match the parallel 'meadow ecosystem' and do a release. Then update values in mongo to 'meadow biome'. Once this is done Alicia will communicate with GOLD.

ssarrafan commented 5 months ago

@aclum @turbomam is this required for GSP?

turbomam commented 5 months ago

I'll let @aclum answer the timing/priority

I'm working on this now

turbomam commented 5 months ago
PREFIX gold: <https://bioregistry.io/gold:>
PREFIX MIXS: <https://w3id.org/mixs/>
PREFIX nmdc: <https://w3id.org/nmdc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select 
distinct  ?ebs_term ?ebs_label ?els_term ?els_label ?em_term ?em_label 
where {
    ?bs a nmdc:Biosample ;
        <http://purl.org/dc/terms/isPartOf> gold:Gs0135149 ;
        MIXS:0000012 ?ebs ;
        MIXS:0000013 ?els ;
        MIXS:0000014 ?em .
    ?ebs nmdc:term ?ebs_term .
    ?ebs_term rdfs:label ?ebs_label .
    ?els nmdc:term ?els_term .
    ?els_term rdfs:label ?els_label .
    ?em nmdc:term ?em_term .
    ?em_term rdfs:label ?em_label .
}

| ebs_term | ebs_label | els_term | els_label | em_term | em_label |
| --- | --- | --- | --- | --- | --- |
| ENVO:00000108 | meadow ecosystem | ENVO:00000292 | watershed | ENVO:00005802 | bulk soil |

I could have done a similar query over the GOLD ecosystem annotations
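
The query result above reports 'meadow ecosystem' as the env_broad_scale term, and the issue title is about that term not being rooted in ENVO biome. A minimal sketch of such a rootedness check follows, assuming the subclass edges have already been loaded into a plain dictionary. The function names and the `TOY_PARENTS` mapping are hypothetical, and the toy hierarchy is for illustration only; it was not extracted from an EnvO release.

```python
from typing import Dict, Iterable, Set

BIOME = "ENVO:00000428"  # 'biome', the class MIxS recommends as the root for env_broad_scale


def ancestors(term: str, parents: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every class reachable from `term` by following subclass-of edges."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def is_rooted_in(term: str, root: str, parents: Dict[str, Iterable[str]]) -> bool:
    """True if `term` is `root` itself or a (transitive) subclass of it."""
    return term == root or root in ancestors(term, parents)


# ASSUMPTION: toy subclass-of edges for illustration; verify against a real EnvO release.
TOY_PARENTS = {
    "ENVO:01000177": ["ENVO:00000446"],  # grassland biome -> terrestrial biome
    "ENVO:00000446": ["ENVO:00000428"],  # terrestrial biome -> biome
    "ENVO:00000108": ["EX:not-a-biome"],  # meadow ecosystem -> hypothetical non-biome parent
}

if __name__ == "__main__":
    for curie in ("ENVO:00000108", "ENVO:01000177"):
        print(curie, is_rooted_in(curie, BIOME, TOY_PARENTS))
```

In practice the parent mapping would come from the ontology itself (for example, a SPARQL property path over `rdfs:subClassOf`), not from a hand-written dictionary.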

turbomam commented 5 months ago
PREFIX gold: <https://bioregistry.io/gold:>
PREFIX dcterms: <http://purl.org/dc/terms/>
select * where {
    gold:Gs0135149 dcterms:description ?d .
}

This research project aimed to understand how snow accumulation and snowmelt influences the mobilization of nitrogen through the soil microbiome in a mountainous catchment at the East River Watershed in Colorado. This project sought to identify bacteria, archaea, and fungi that were associated with the microbial biomass bloom that occurs during winter and the biomass crash following snowmelt. This project also sought to understand whether the traits that govern microbial community assembly during and after snowmelt were phylogenetically conserved. Samples were collected during winter, the snowmelt period, and after snowmelt in spring, from an area that transitioned from an upland hillslope to a riparian floodplain. This project is part of the Watershed Function Science Focus Area.

turbomam commented 5 months ago

LLM prompt:

My colleagues want to describe the samples included in a scientific study. We use terms from the MIxS standard and the environment ontology, EnvO.

They want to say that the env_broad_scale (MIXS:0000012) value is 'meadow ecosystem' (ENVO:00000108), the env_local_scale (MIXS:0000013) value is 'watershed' (ENVO:00000292) and the env_medium (MIXS:0000014) value is 'bulk soil' (ENVO:00005802)

Here are the definitions of the MIxS terms:

env_broad_scale, "broad-scale environmental context": Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO's biome class: http://purl.obolibrary.org/obo/ENVO_00000428.

env_local_scale, "local environmental context": Report the entity or entities which are in the sample or specimen's local vicinity and which you believe have significant causal influences on your sample or specimen. We recommend using EnvO terms which are of smaller spatial grain than your entry for env_broad_scale. Terms, such as anatomical sites, from other OBO Library ontologies which interoperate with EnvO (e.g. UBERON) are accepted in this field.

env_medium, "environmental medium": Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. We recommend using subclasses of 'environmental material' (http://purl.obolibrary.org/obo/ENVO_00010483). Terms from other OBO ontologies are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top)

EnvO documentation about how to use those MIxS fields: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS

I object to that because "meadow ecosystem" isn't a subclass of biome as recommended above.

The study also provides this additional description: "This research project aimed to understand how snow accumulation and snowmelt influences the mobilization of nitrogen through the soil microbiome in a mountainous catchment at the East River Watershed in Colorado. This project sought to identify bacteria, archaea, and fungi that were associated with the microbial biomass bloom that occurs during winter and the biomass crash following snowmelt. This project also sought to understand whether the traits that govern microbial community assembly during and after snowmelt were phylogenetically conserved. Samples were collected during winter, the snowmelt period, and after snowmelt in spring, from an area that transitioned from an upland hillslope to a riparian floodplain. This project is part of the Watershed Function Science Focus Area."

Please recommend EnvO class ids that we can use in those three MIxS fields. It would be good if we could retain the meadow and watershed knowledge in one of those fields, but I want to be more rigorous about the env_broad_scale value being a biome.

turbomam commented 5 months ago

LLM response:

Thank you for providing the detailed context and the MIxS term definitions. Based on the information you've shared, here are my recommendations for the EnvO class IDs that align with the MIxS fields and the study description:

In summary, I suggest using the following EnvO class IDs for the MIxS fields:

This combination of terms captures the broad-scale biome context, retains the meadow and watershed knowledge, and provides additional relevant information about the local environment and the medium surrounding the samples, all while adhering to the MIxS field definitions and the EnvO ontology.

turbomam commented 5 months ago

@aclum is 'montane biome' (ENVO:01000243) an acceptable existing EnvO term for gold:Gs0135149's broad-scale environmental context?

Pier made similar suggestions in https://github.com/EnvironmentOntology/envo/pull/1506

Note that Claude is recommending that we use multiple local environmental context values. @cmungall also suggested that recently. I feel like MIxS provided mixed messages about which of the environmental context fields are multi-valued. At this time, we have all of them set to single-valued in the nmdc-schema. Switching to multi-valued may be the right choice but will require lots of code changes.

If we choose 'montane biome' (ENVO:01000243) as the broad-scale environmental context, which of those should we choose for the local environmental context?

turbomam commented 5 months ago

Do we want to resume the bulk MIxS/EnvO to GOLD mapping effort that Bill started about 1.5 years ago?

The MIxS environmental context columns aren't included in the Biosample sheet available at https://gold.jgi.doe.gov/download?mode=site_excel via https://gold.jgi.doe.gov/downloads

Getting all metadata for all Biosamples in one GOLD study from the public API isn't too painful once you log in/authenticate. NMDC has additional access strategies not discussed here.

https://gold-ws.jgi.doe.gov/api/v1/biosamples?studyGoldId=Gs0135149
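
The request above can be assembled programmatically. The sketch below only builds the URL and an authenticated request object; it does not contact GOLD. Sending the token as a bearer `Authorization` header is an assumption about how GOLD authentication works, and the function names are hypothetical, so check the GOLD API documentation before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request

GOLD_API_BASE = "https://gold-ws.jgi.doe.gov/api/v1"


def biosamples_url(study_gold_id: str) -> str:
    """Build the biosamples endpoint URL for one GOLD study."""
    return f"{GOLD_API_BASE}/biosamples?{urlencode({'studyGoldId': study_gold_id})}"


def biosamples_request(study_gold_id: str, access_token: str) -> Request:
    """Prepare (but do not send) an authenticated request for a study's biosamples."""
    return Request(
        biosamples_url(study_gold_id),
        headers={
            # ASSUMPTION: bearer-token auth; the thread only says a login is required.
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    print(biosamples_url("Gs0135149"))
```

The prepared request could then be passed to `urllib.request.urlopen` and the body parsed with `json.load`; the response field names are not shown here because they are not given in this thread.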

aclum commented 5 months ago

@turbomam you can get the info from the GOLD UI. From this page, click 'Select Columns for Table', expand the Environmental Information section, and select 'Broad-scale Environmental Context', 'Local Environmental Context', and 'Environmental Medium'. You can view this without being logged in, but you need to be logged in to download it.

It would be valuable to resume the GOLD ecosystem to ENVO mapping. NMDC requires these values, so the missing mapping is a big barrier for sample ingest. We have some studies of interest where we could pilot this.

Based on the elevation listed in GOLD (2780), compared to the surrounding areas as viewed on a topographic map, this doesn't seem like a mountain area. I have a strong preference to use what GOLD has, 'grassland biome [ENVO:01000177]', so we can close this issue out.

ssarrafan commented 5 months ago

Based on @aclum's last comment, I'm assuming this is active. I'll move it to the next sprint, @turbomam.

turbomam commented 5 months ago

I think somebody just has to make a changesheet for the env_broad_scale value for all of the Biosamples that are associated with this Study. I agree with @aclum's decision about using 'grassland biome [ENVO:01000177]'
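
A rough sketch of generating such a changesheet is shown below. It assumes a tab-separated layout with `id`, `action`, `attribute`, and `value` columns and an `update` action; the biosample ids are placeholders, and the function names are hypothetical. Because env_broad_scale is a structured value in the nmdc-schema, a real changesheet may need additional rows for the nested term, so treat this as a starting point and check the changesheet documentation.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# ASSUMPTION: column names and the "update" action reflect the expected changesheet layout.
COLUMNS = ["id", "action", "attribute", "value"]
NEW_BROAD_SCALE = "grassland biome [ENVO:01000177]"  # value agreed on in this thread


def changesheet_rows(
    biosample_ids: Iterable[str], new_value: str = NEW_BROAD_SCALE
) -> Iterator[Dict[str, str]]:
    """Yield one update row per biosample for the env_broad_scale slot."""
    for biosample_id in biosample_ids:
        yield {
            "id": biosample_id,
            "action": "update",
            "attribute": "env_broad_scale",
            "value": new_value,
        }


def to_tsv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder ids; the real ones would come from the Biosamples that are part of the Study.
    print(to_tsv(changesheet_rows(["nmdc:bsm-EXAMPLE-1", "nmdc:bsm-EXAMPLE-2"])))
```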

aclum commented 5 months ago

Since you and Chris are the ones spearheading this effort the request is that you issue the changesheet.

turbomam commented 5 months ago

Should I do this now or after GSP/ECR?

ssarrafan commented 5 months ago

> Should I do this now or after GSP/ECR?

It sounds like you should be able to do it now. Right? @aclum

aclum commented 5 months ago

It should be fine to do this now

On Wednesday, April 3, 2024, ssarrafan @.***> wrote:

Should I do this now or after GSP/ECR?

It sounds like you should be able to do it now. Right? @aclum https://github.com/aclum


ssarrafan commented 5 months ago

@turbomam is discussing with Chris this afternoon. Said to move to next sprint. @aclum FYI

ssarrafan commented 4 months ago

@turbomam is almost done. Needs @aclum to sign off to complete. Moving to next sprint.

turbomam commented 4 months ago

@aclum and I have postponed this until after the re-IDing takes place at the beginning of May. The necessary changesheets mention Biosamples whose part_of Study doesn't pass the current post-changesheet validation.

ssarrafan commented 3 months ago

Moving to new sprint due to re-IDing timeline

aclum commented 3 months ago

@turbomam this project was re-id'd so you should be able to update the changesheet and submit now.

ssarrafan commented 3 months ago

@turbomam @aclum any update on this? Done today? Next sprint? Backlog?

ssarrafan commented 3 months ago

Removing from sprint, no response on status, no updates for two weeks