microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/

Inference of environmental package based on MIxS environmental context terms #547

Open sujaypatil96 opened 3 weeks ago

sujaypatil96 commented 3 weeks ago

A use case that has come up in the NCBI Export squad is the need to infer the NCBI MIxS package from metadata in the various slots of NMDC Biosamples.

In a meeting, the recommendation was to use a combination of the environmental context terms that are asserted on the NMDC schema Biosample class.

We can use either one or both of the below mechanisms to inform this inference:

The inferred package must be a value from the NCBI MIxS package list: https://www.ncbi.nlm.nih.gov/biosample/docs/packages/
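To make the requirement concrete, here is a minimal sketch of what a lookup-based inference could look like. Everything in it is an assumption for illustration, not an agreed design: the function names, the slot priority order (`env_medium` first, then `env_local_scale`, then `env_broad_scale`), the nested `term.id` shape of each slot value, the ENVO-to-package mapping, and the short package labels, which stand in for real values from the NCBI list.

```python
from typing import Optional

# Hypothetical mapping from ENVO term IDs to package labels.
# The entries and labels are placeholders; a real table would be curated
# and would use values taken from the NCBI package list.
TERM_TO_PACKAGE = {
    "ENVO:00001998": "soil",
    "ENVO:00002007": "sediment",
}

# Placeholder for the set of allowed package values from the NCBI list.
ALLOWED_PACKAGES = {"soil", "sediment", "water"}

# Assumed priority: most specific context slot first.
SLOT_PRIORITY = ("env_medium", "env_local_scale", "env_broad_scale")


def get_term_id(biosample: dict, slot: str) -> Optional[str]:
    """Return the term ID stored in a slot, assuming a {'term': {'id': ...}} shape."""
    value = biosample.get(slot) or {}
    term = value.get("term") or {}
    return term.get("id")


def infer_package(biosample: dict) -> Optional[str]:
    """Return the first package matched by the environmental context slots, or None."""
    for slot in SLOT_PRIORITY:
        package = TERM_TO_PACKAGE.get(get_term_id(biosample, slot))
        if package in ALLOWED_PACKAGES:
            return package
    return None


example = {
    "id": "nmdc:bsm-example",  # hypothetical identifier
    "env_broad_scale": {"term": {"id": "ENVO:00000446"}},
    "env_medium": {"term": {"id": "ENVO:00001998"}},
}
print(infer_package(example))
```

Returning `None` when no slot matches keeps unmapped Biosamples visible for manual review instead of assigning a default package silently.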

A good starting point is to use an LLM to find out whether any widely used web services already support this inference, or whether we should design and implement our own solution.