microbiomedata / issues

public repo for issues related to NMDC work

Milestone - Define EnvO value sets for supported environmental extensions (3.2) #468

Open ssarrafan opened 11 months ago

ssarrafan commented 11 months ago

A key part of the schema is the allocation of different metadata elements to different environmental packages (e.g., ‘depth’ is a required metadata element for soil and sediment samples, and conversely ‘altitude’ is required for aerial samples). In the Pilot, we directly adopted the MIxS environment packages, and extended them with fields required by EMSL and JGI. While this provided a foundation, we identified many areas where the MIxS environmental packages are too rigid, or are at suboptimal levels of granularity. In collaboration with the GSC and the broader research community, we will support the development of more specific packages for a variety of ecosystems (e.g., environments like wetlands, mangroves or complex riparian systems should have their own package extensions, and the schema allows for progressive refinement or crossing of packages), and continue to improve existing packages based on community feedback. To address a common community challenge in navigating ontologies, each of these environmental packages will be supported by defined EnvO value sets (cross-sections of the ontology with key terms relevant for a specific environment) such that data submitters can provide precise and accurate descriptive terms through a simple dropdown, without having to navigate the whole EnvO structure (Submission Portal, Milestone 3.2). Page 28
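As a rough illustration of what a "value set" could mean in practice, here is a minimal sketch that maps an environmental extension and slot to a short list of permitted EnvO terms and renders them as dropdown options. The `VALUE_SETS` structure, the function names, and the `label [ID]` display format are all assumptions made for this example, and the term IDs are illustrative and should be checked against EnvO before use.

```python
# Hypothetical sketch: one value set per (environmental extension, slot) pair.
# Term IDs and labels are illustrative; verify them against EnvO.
VALUE_SETS = {
    ("Soil", "env_medium"): {
        "ENVO:00001998": "soil",
    },
    ("Soil", "env_broad_scale"): {
        "ENVO:00000446": "terrestrial biome",
        "ENVO:01000174": "forest biome",
        "ENVO:01000177": "grassland biome",
    },
}


def dropdown_options(extension, slot):
    """Return sorted 'label [ID]' strings, or [] if no value set is defined."""
    terms = VALUE_SETS.get((extension, slot), {})
    return sorted(f"{label} [{term_id}]" for term_id, label in terms.items())


def is_permitted(extension, slot, term_id):
    """Check whether a submitted term belongs to the value set for this slot."""
    return term_id in VALUE_SETS.get((extension, slot), {})
```

A submitter would then only see the handful of terms returned by `dropdown_options("Soil", "env_broad_scale")` rather than the whole ontology.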

see #469 #470 #471

mslarae13 commented 10 months ago

This should be environmental extensions. We need to make this correction across lots of things.

Do we have target extensions? Is it all the ones currently on the subport? (Which is basically all)

ssarrafan commented 10 months ago

> This should be environmental extensions. We need to make this correction across lots of things.
>
> Do we have target extensions? Is it all the ones currently on the subport? (Which is basically all)

@cmungall can you respond to Montana's questions please.

cmungall commented 10 months ago

We should prioritize BER-relevant ones

ssarrafan commented 8 months ago

This is a Q4 milestone. Updating issue to Q4.

aclum commented 1 month ago

@mslarae13 and @turbomam will get together to discuss this.

mslarae13 commented 1 month ago

> We should prioritize BER-relevant ones

So, HIGH

LOW

aclum commented 1 month ago

We can exclude the user facility interface tabs.

turbomam commented 1 month ago

Thanks @aclum and @mslarae13 for tending to this. I have been thinking about different ways to keep track of our intentions, the implementations, and whether a value set is complete. There's probably no one perfect way of doing it.

I think we should decide

I have created a table that relates @mslarae13's recent prioritization list to some other knowledge about the environments/Extensions/DH Interfaces. I would like to include most of this information in whatever progress tracking system we use. Since the table is wide, maybe we should move it to a Google Sheet or a TSV checked into a repo, instead of embedding it in an issue like this.
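For the TSV option, a minimal sketch of what the tracking file and a simple completeness check might look like is below. The column names, the helper functions, and the idea that the three `env_*` columns hold a completion note for each value set are assumptions for illustration; the sample rows are taken from the table in this comment.

```python
import csv
import io

# Hypothetical column names for a tracking TSV.
COLUMNS = ["mixs_environment", "dh_interface", "harmonizer_status", "priority",
           "env_broad_scale", "env_local_scale", "env_medium"]

# A few rows copied from the table in this comment; value-set columns are still empty.
ROWS = [
    ["Soil", "SoilInterface", "published", "high", "", "", ""],
    ["Water", "WaterInterface", "published", "high", "", "", ""],
    ["Air", "AirInterface", "published", "low", "", "", ""],
    ["HumanGut", "", "disabled", "", "", "", ""],
]


def to_tsv(rows):
    """Serialize the header and rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


def incomplete_high_priority(tsv_text):
    """List high-priority environments with at least one empty value-set column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["mixs_environment"] for row in reader
            if row["priority"] == "high"
            and not all(row[col] for col in COLUMNS[4:])]
```

Keeping the file in a repo would let a check like `incomplete_high_priority` run in CI and show what is still outstanding for the milestone.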

| MIxS Environment name | submission portal DhInterface name | harmonizerApi.ts status | priority in https://github.com/microbiomedata/issues/issues/468#issuecomment-2243964338 | env_broad_scale | env_local_scale | env_medium |
|---|---|---|---|---|---|---|
| HumanAssociated | | disabled | | | | |
| HumanGut | | disabled | | | | |
| HumanOral | | disabled | | | | |
| HumanSkin | | disabled | | | | |
| HumanVaginal | | disabled | | | | |
| PlantAssociated | PlantAssociatedInterface | published | high | | | |
| Sediment | SedimentInterface | published | high | | | |
| Soil | SoilInterface | published | high | | | |
| Water | WaterInterface | published | high | | | |
| Air | AirInterface | published | low | | | |
| BuiltEnvironment | BuiltEnvInterface | published | low | | | |
| HostAssociated | HostAssociatedInterface | published | low | | | |
| HydrocarbonResourcesCores | HcrCoresInterface | published | low | | | |
| HydrocarbonResourcesFluidsSwabs | HcrFluidsSwabsInterface | published | low | | | |
| MicrobialMatBiofilm | BiofilmInterface | published | low | | | |
| MiscellaneousNaturalOrArtificialEnvironment | MiscEnvsInterface | published | low | | | |
| WastewaterSludge | WastewaterSludgeInterface | | low | | | |
| Agriculture | | | | | | |
| FoodAnimalAndAnimalFeed | | | | | | |
| FoodFarmEnvironment | | | | | | |
| FoodFoodProductionFacility | | | | | | |
| FoodHumanFoods | | | | | | |
| SymbiontAssociated | | | | | | |

turbomam commented 1 month ago

@pkalita-lbl you can see that I have tracked the DhInterface name from submission-schema/schemasheets/tsv_in/classes.tsv and the status from harmonizerApi.ts in my table above

I didn't include the excel_worksheet_name annotations from your new

but the table is intended to do some of the mapping that we have been talking about.

I'm a little surprised that WastewaterSludgeInterface appears in many places in the submission-schema repo (and @mslarae13 included it in her prioritization list, albeit as low) but doesn't appear in harmonizerApi.ts.
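A quick way to catch this kind of mismatch would be to check each DhInterface name against the text of harmonizerApi.ts. The sketch below is only an assumption about how that could be done: `missing_interfaces` is a hypothetical helper, and `sample_source` is a made-up excerpt standing in for the real file, whose layout is not assumed here.

```python
import re


def missing_interfaces(interface_names, harmonizer_source):
    """Return the names that never occur as whole words in the source text."""
    return [name for name in interface_names
            if not re.search(rf"\b{re.escape(name)}\b", harmonizer_source)]


# Hypothetical excerpt standing in for harmonizerApi.ts (not the real contents).
sample_source = """
SoilInterface: { status: 'published' },
WaterInterface: { status: 'published' },
"""

names = ["SoilInterface", "WaterInterface", "WastewaterSludgeInterface"]
print(missing_interfaces(names, sample_source))  # ['WastewaterSludgeInterface']
```

In practice the names would come from the class column of classes.tsv and the source text from reading harmonizerApi.ts off disk.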

aclum commented 1 month ago

Will we need to use classes from any ontologies other than EnvO and PO for the environments that have been marked as high priority?

mslarae13 commented 1 month ago

Updating @turbomam's table (IN PROGRESS)

| MIxS Environment name | submission portal DhInterface name | harmonizerApi.ts status | priority in https://github.com/microbiomedata/issues/issues/468#issuecomment-2243964338 | env_broad_scale | env_local_scale | env_medium |
|---|---|---|---|---|---|---|
| HumanAssociated | | disabled | | | | |
| HumanGut | | disabled | | | | |
| HumanOral | | disabled | | | | |
| HumanSkin | | disabled | | | | |
| HumanVaginal | | disabled | | | | |
| PlantAssociated | PlantAssociatedInterface | published | high | | | |
| Sediment | SedimentInterface | published | high | | | |
| Soil | SoilInterface | published | high | | | |
| Water | WaterInterface | published | high | | | |
| Air | AirInterface | published | low | | | |
| BuiltEnvironment | BuiltEnvInterface | published | low | | | |
| HostAssociated | HostAssociatedInterface | published | low | | | |
| HydrocarbonResourcesCores | HcrCoresInterface | published | low | | | |
| HydrocarbonResourcesFluidsSwabs | HcrFluidsSwabsInterface | published | low | | | |
| MicrobialMatBiofilm | BiofilmInterface | published | low | | | |
| MiscellaneousNaturalOrArtificialEnvironment | MiscEnvsInterface | published | low | | | |
| WastewaterSludge | WastewaterSludgeInterface | unpublished (need to add) | low | | | |
| Agriculture | | | high | | | |
| FoodAnimalAndAnimalFeed | | | low | | | |
| FoodFarmEnvironment | | | low | | | |
| FoodFoodProductionFacility | | | low | | | |
| FoodHumanFoods | | | low | | | |
| SymbiontAssociated | | | low | | | |

mslarae13 commented 1 month ago

The following extensions are NOT in NMDC. I'm not sure why, and we need to check what version of MIxS we're using. I'll make a separate issue for that, but for this milestone and the squad addressing it, we'll skip these extensions:

Agriculture, FoodAnimalAndAnimalFeed, FoodFarmEnvironment, FoodFoodProductionFacility, FoodHumanFoods, SymbiontAssociated

Edit: this issue exists, which is similar. The nmdc submission-schema and nmdc-schema don't seem to be aware of slots that are unique to these extensions, which makes me conclude we don't use v6.

ssarrafan commented 1 month ago

@mslarae13 @aclum @cmungall thanks for all the updates on this issue. Will this be done by September? This is due this quarter.

mslarae13 commented 1 month ago

> Will this be done by September? This is due this quarter.

@ssarrafan that's the goal

ssarrafan commented 1 month ago

Per @cmungall Patrick is not needed for this issue. Discussed at meeting today with Alicia, Emiley, Chris. FYI @mslarae13

mslarae13 commented 3 weeks ago

@ssarrafan due date for this is still end of September, right?

ssarrafan commented 3 weeks ago

> @ssarrafan due date for this is still end of September, right?

Yes so far.