microbiomedata / issues

public repo for issues related to NMDC work

Use ontology access kit to identify environmental terms for each triad #843

Closed: mslarae13 closed this issue 1 month ago

mslarae13 commented 2 months ago

Use the Ontology Access Kit (OAK) for an LLM-based approach to identifying the environmental terms (env_broad_scale, env_local_scale, env_medium) that fit each extension.

https://incatools.github.io/ontology-access-kit/index.html
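As a starting point for the query side (before any LLM step), here is a minimal sketch of listing candidate ENVO terms with OAK's Python package, `oaklib`. It assumes `oaklib` is installed, that `get_adapter("sqlite:obo:envo")` can download the prebuilt ENVO database, and that the adapter's `descendants` and `label` methods behave as in current OAK releases. The two root classes (ENVO:00000428 biome for env_broad_scale, ENVO:00010483 environmental material for env_medium) follow general MIxS guidance and are illustrative assumptions, not the policy decided in this issue; env_local_scale is left out because no root is chosen here. The helper names `candidate_terms` and `load_envo` are hypothetical.

```python
from typing import Iterable, List, Optional, Protocol, Tuple

# Assumed root classes (general MIxS guidance, not this issue's decided policy).
ENVO_BIOME = "ENVO:00000428"          # biome -> env_broad_scale candidates
ENVO_ENV_MATERIAL = "ENVO:00010483"   # environmental material -> env_medium candidates


class OntologyAdapter(Protocol):
    """The two OAK adapter methods this sketch relies on."""

    def descendants(self, start_curies, predicates=None, reflexive=True) -> Iterable[str]: ...

    def label(self, curie: str) -> Optional[str]: ...


def candidate_terms(
    adapter: OntologyAdapter, root: str, predicates: List[str]
) -> List[Tuple[str, str]]:
    """Hypothetical helper: (CURIE, label) pairs for every class below `root`, sorted by label."""
    # dict.fromkeys drops any duplicate CURIEs while keeping first-seen order.
    unique = dict.fromkeys(adapter.descendants(root, predicates=predicates, reflexive=False))
    rows = [(curie, adapter.label(curie) or "") for curie in unique]
    return sorted(rows, key=lambda row: row[1].lower())


def load_envo():
    """Hypothetical helper: open ENVO through OAK (downloads a SQLite build on first use)."""
    # Imported lazily so candidate_terms can be used and tested without oaklib installed.
    from oaklib import get_adapter

    return get_adapter("sqlite:obo:envo")
```

With `oaklib` installed, something like `candidate_terms(load_envo(), ENVO_BIOME, [IS_A])`, with `IS_A` imported from `oaklib.datamodels.vocabulary`, should list the biome subclasses. Passing the adapter in, rather than opening ENVO inside the helper, keeps it testable with a stub and lets the same code run against other OAK backends.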

Apply this LLM method to all extensions listed in the table below.

| MIxS environment name | Submission portal DhInterface name | harmonizerApi.ts status | Priority in https://github.com/microbiomedata/issues/issues/468#issuecomment-2243964338 | env_broad_scale | env_local_scale | env_medium |
|---|---|---|---|---|---|---|
| PlantAssociated | PlantAssociatedInterface | published | high | | | |
| Sediment | SedimentInterface | published | high | | | |
| Soil | SoilInterface | published | high | | | |
| Water | WaterInterface | published | high | | | |
| Air | AirInterface | published | low | | | |
| BuiltEnvironment | BuiltEnvInterface | published | low | | | |
| HostAssociated | HostAssociatedInterface | published | low | | | |
| HydrocarbonResourcesCores | HcrCoresInterface | published | low | | | |
| HydrocarbonResourcesFluidsSwabs | HcrFluidsSwabsInterface | published | low | | | |
| MicrobialMatBiofilm | BiofilmInterface | published | low | | | |
| MiscellaneousNaturalOrArtificialEnvironment | MiscEnvsInterface | published | low | | | |
mslarae13 commented 2 months ago

Break this into 3 smaller tasks for @turbomam, @Natalie-Winans, and @sierra-moxon to divide and conquer.

Remove the LLM part for now; it will be a separate issue!

Run the queries we agreed on as policy, and adjust only where glaringly needed or required.
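The "run the agreed queries, then adjust only where needed" step could be organized along these lines. This is a minimal sketch under assumptions: candidate lists arrive per triad slot as (CURIE, label) pairs (for example, from an OAK descendants query), the manual changes for one environment are recorded as include/exclude sets, and the output is CSV for review in a spreadsheet. `Adjustments`, `apply_adjustments`, `to_review_csv`, and the column names are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

TRIAD_SLOTS = ("env_broad_scale", "env_local_scale", "env_medium")
Term = Tuple[str, str]  # (CURIE, label)


@dataclass
class Adjustments:
    """Hypothetical manual tweaks for one environment (e.g. Soil), keyed by triad slot."""

    exclude: Dict[str, Set[str]] = field(default_factory=dict)
    include: Dict[str, List[Term]] = field(default_factory=dict)


def apply_adjustments(
    candidates: Dict[str, List[Term]], adj: Adjustments
) -> Dict[str, List[Term]]:
    """Drop excluded CURIEs, then append manually included terms not already present."""
    result: Dict[str, List[Term]] = {}
    for slot in TRIAD_SLOTS:
        dropped = adj.exclude.get(slot, set())
        kept = [(c, l) for c, l in candidates.get(slot, []) if c not in dropped]
        seen = {c for c, _ in kept}
        for curie, label in adj.include.get(slot, []):
            if curie not in seen:
                kept.append((curie, label))
                seen.add(curie)
        result[slot] = kept
    return result


def to_review_csv(environment: str, terms: Dict[str, List[Term]]) -> str:
    """Render one row per (slot, term) so reviewers can annotate it in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["environment", "slot", "curie", "label"])
    for slot in TRIAD_SLOTS:
        for curie, label in terms.get(slot, []):
            writer.writerow([environment, slot, curie, label])
    return buf.getvalue()
```

Keeping the adjustments as data, separate from the query results, means the same queries can be rerun for Sediment, Water, or PlantAssociated while each environment's hand edits stay visible for review.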

Update this table!

mslarae13 commented 2 months ago

@turbomam I think it's safe to say this is in progress? And you're getting lists into Google Drive/Sheets for us to review. Yes?

mslarae13 commented 1 month ago

@turbomam I think we can close this one. You provided the initial OAK queries that let us start issues #849, #848, and #847. Do you agree?

turbomam commented 1 month ago

I got the sense that this issue was about designing OAK queries for all of the high-priority environments, and that those issues were specifically about Soil.

Closing it is fine with me if you think the all-environments aspect is captured somewhere else.

I also really like that prioritization and cross-reference table at the top and hope we find a more prominent home for it.

mslarae13 commented 1 month ago

@turbomam

I can stick it in the ADR!?

I think we should make separate issues for each environment: small, concise issues with specific targets, like we did for Soil. This one is about the "general" query/check.