microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Prefix for CURIE representation of `FieldResearchSite` id #575

Closed sujaypatil96 closed 10 months ago

sujaypatil96 commented 1 year ago

We need to decide on an appropriate prefix for the id of FieldResearchSite objects when the id values are derived from upstream GOLD biosample records.

The GOLD biosamples API endpoint returns this identifier as part of the biosampleName field, so it has to be parsed out of that string.

For example, for GOLD biosample record Gb0305932, the biosample name returned by the GOLD API endpoint is `Root microbial communities from poplar common garden site in Corvallis, Oregon, USA - GW-9591-Co2_62_24 endosphere`. Note that the value shown under Gb0305932 > Biosample Information > Identifier on the GOLD website is different from this, so take care not to confuse the two.

We need to parse the GW-9591-Co2_62_24 value out of the biosample name, give it an appropriate prefix (preferably one for a bioscales resource), and assign the result to the FieldResearchSite id.
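A minimal sketch of the parsing step, based only on the single example name above. It assumes the descriptive text and the identifier are separated by the last " - " in the name, and that the identifier is the first whitespace-delimited token after it. Other GOLD biosample names may not follow this layout, and the function name `extract_site_label` is hypothetical.

```python
def extract_site_label(biosample_name: str) -> str:
    """Pull the site/tree label out of a GOLD biosampleName string.

    Assumptions (from one example only): the label follows the last " - "
    separator and is the first whitespace-delimited token after it.
    """
    # Split on the LAST " - " so hyphens inside the label itself are untouched.
    _, sep, tail = biosample_name.rpartition(" - ")
    if not sep:
        raise ValueError(f"no ' - ' separator in biosample name: {biosample_name!r}")

    tokens = tail.split()
    if not tokens:
        raise ValueError(f"nothing after the separator in: {biosample_name!r}")

    # Trailing words such as "endosphere" are dropped.
    return tokens[0]


if __name__ == "__main__":
    name = (
        "Root microbial communities from poplar common garden site in "
        "Corvallis, Oregon, USA - GW-9591-Co2_62_24 endosphere"
    )
    print(extract_site_label(name))
```

Raising on names that lack the separator is deliberate: a silent fallback would put arbitrary text into the record.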

Note: for example tree name identifiers, see this sheet.

turbomam commented 1 year ago

All ids must use the nmdc: prefix and follow the slot_usage patterns. The values that come from the GOLD API can go into the FieldResearchSite's name instead.

aclum commented 10 months ago

What is currently defined in the schema is:

    structured_pattern:
      syntax: '{id_nmdc_prefix}:frsite-{id_shoulder}-{id_blade}{id_version}{id_locus}'
      interpolated: true
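To illustrate how an interpolated structured pattern like this could be checked, here is a rough sketch. The regular expressions in `SETTINGS` are assumed placeholder values, not copied from the schema; the real ones are whatever the schema's settings define. The example id `nmdc:frsite-11-abc123` is made up, and `make_site` is a hypothetical helper that follows the suggestion above of keeping the GOLD-derived label in name.

```python
import re

# ASSUMPTION: illustrative placeholder regexes; the real values live in the
# schema's settings and may differ.
SETTINGS = {
    "id_nmdc_prefix": r"^(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
    "id_version": r"(\.[0-9]{1,})*",
    "id_locus": r"(_[A-Za-z0-9_\.-]+)?$",
}

# Syntax string as quoted in this thread.
SYNTAX = "{id_nmdc_prefix}:frsite-{id_shoulder}-{id_blade}{id_version}{id_locus}"

# Interpolation: substitute each {name} with its regex fragment.
# str.format does not re-parse braces inside the substituted values.
FRSITE_ID = re.compile(SYNTAX.format(**SETTINGS))


def make_site(site_id: str, gold_label: str) -> dict:
    """Build a minimal FieldResearchSite-like dict (hypothetical helper)."""
    if not FRSITE_ID.match(site_id):
        raise ValueError(f"id does not match the frsite pattern: {site_id!r}")
    # The GOLD-derived label is kept as the name, not used as the id.
    return {"id": site_id, "name": gold_label}


if __name__ == "__main__":
    print(make_site("nmdc:frsite-11-abc123", "GW-9591-Co2_62_24"))
```

Under these assumed settings, a string such as `GOLD:GW-9591-Co2_62_24` is rejected as an id, which is why the GOLD label goes into name.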

I'm closing this ticket; please reopen if other work is needed.