microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Biosample slot usage for source_mat_id was not migrated to sheets_and_friends #783

Open turbomam opened 1 year ago

turbomam commented 1 year ago

The source_mat_id slot usage should go in dh_mutliview_common_columns.

mslarae13 commented 1 year ago

Follow up with @pkalita-lbl to confirm that dh_mutliview_common_columns is for all sample types.

turbomam commented 1 year ago

I do see these slot usages in sheets-for-nmdc-submission-schema_MLS now. They're also in https://github.com/microbiomedata/sheets_and_friends/blob/issue-163-mixs-subsetter/artifacts/sheets-for-nmdc-submission-schema_MLS-modifications_long.tsv, which I am using to develop a pure sheets_and_friends approach to managing the relationship between nmdc-schema and MIxS:

| slot | action | target | value |
|---|---|---|---|
| source_mat_id | add_attribute | comments | Identifiers must be prefixed. Possible FAIR prefixes are IGSNs (http://www.geosamples.org/getigsn), NCBI biosample accession numbers, ARK identifiers (https://arks.org/). These IDs enable linking to derived analytes and subsamples. If you have not assigned FAIR identifiers to your samples, you can generate UUIDs (https://www.uuidgenerator.net/). |
| source_mat_id | replace_attribute | description | A globally unique identifier assigned to the biological sample. |
| source_mat_id | overwrite_examples | examples | IGSN:AU1243\|UUID:24f1467a-40f4-11ed-b878-0242ac120002 |
| source_mat_id | replace_attribute | identifier | true |
| source_mat_id | add_attribute | notes | The source material IS the Globally Unique ID |
| source_mat_id | replace_attribute | required | true |
| source_mat_id | replace_attribute | string_serialization | {text}:{text} |
| source_mat_id | replace_attribute | title | source material identifier |
| source_mat_id | add_attribute | todos | Currently, the comments say to use UUIDs. However, if we implement assigning NMDC identifiers with the minter we don't need to require a GUID. It can be an optional field to fill out only if they already have a resolvable ID. |
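To make the long format concrete, here is a minimal sketch that reads rows shaped like the ones above (slot, action, target, value) and applies them to a plain dict of slot definitions, then checks the resulting examples against the `{text}:{text}` string_serialization. The meaning given to each action (replace_attribute overwrites, add_attribute appends to a list, overwrite_examples splits on `|`) is an assumption for illustration, not the actual sheets_and_friends implementation, and the helper names `apply_modifications` and `serialization_to_regex` are hypothetical.

```python
import csv
import io
import re

# Five rows copied from the modifications listed above, in the same
# slot / action / target / value layout (tab-separated, with a header row).
MODIFICATIONS_TSV = (
    "slot\taction\ttarget\tvalue\n"
    "source_mat_id\treplace_attribute\tdescription\t"
    "A globally unique identifier assigned to the biological sample.\n"
    "source_mat_id\toverwrite_examples\texamples\t"
    "IGSN:AU1243|UUID:24f1467a-40f4-11ed-b878-0242ac120002\n"
    "source_mat_id\treplace_attribute\trequired\ttrue\n"
    "source_mat_id\tadd_attribute\tnotes\tThe source material IS the Globally Unique ID\n"
    "source_mat_id\treplace_attribute\tstring_serialization\t{text}:{text}\n"
)


def apply_modifications(slots: dict, tsv_text: str) -> dict:
    """Apply long-format rows to a dict of slot definitions, in file order.

    Hypothetical action semantics (assumed, not taken from sheets_and_friends):
      replace_attribute  -> set target to value (values stay strings, e.g. "true")
      add_attribute      -> append value to a list stored under target
      overwrite_examples -> replace examples with the "|"-separated values
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        slot = slots.setdefault(row["slot"], {})
        action, target, value = row["action"], row["target"], row["value"]
        if action == "replace_attribute":
            slot[target] = value
        elif action == "add_attribute":
            slot.setdefault(target, []).append(value)
        elif action == "overwrite_examples":
            slot[target] = [{"value": v} for v in value.split("|")]
        else:
            raise ValueError(f"unknown action: {action!r}")
    return slots


def serialization_to_regex(serialization: str) -> re.Pattern:
    """Turn a pattern such as "{text}:{text}" into a compiled regex.

    Assumption: each {text} placeholder means "one or more characters";
    everything between placeholders is matched literally.
    """
    literals = serialization.split("{text}")
    return re.compile(".+".join(re.escape(part) for part in literals))


if __name__ == "__main__":
    slots = apply_modifications({}, MODIFICATIONS_TSV)
    source_mat_id = slots["source_mat_id"]
    pattern = serialization_to_regex(source_mat_id["string_serialization"])
    for example in source_mat_id["examples"]:
        ok = pattern.fullmatch(example["value"]) is not None
        print(example["value"], "matches" if ok else "does NOT match")
```

Both examples contain a prefix, a colon, and a local part, so under these assumptions they satisfy the serialization, while an unprefixed value such as `AU1243` would not.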

I can't remember where we didn't see them, so I'm leaving this issue open.

I had been extracting Biosample's MIxS slot usages with nmdc_schema/get_class_usages.py in the nmdc-schema repo.
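The contents of get_class_usages.py are not shown here, so the following is only a sketch of the general idea: reading a class's `slot_usage` block from a LinkML schema that has already been loaded from YAML into plain dicts, optionally filtering to a caller-supplied set of MIxS slot names, and flattening the result into rows comparable to the long format above. The `SCHEMA` excerpt, the MIxS name set, and both helper functions are hypothetical; a real script might instead use linkml-runtime's SchemaView.

```python
from typing import Iterable, Optional

# Hypothetical, heavily trimmed stand-in for a loaded LinkML schema
# (classes -> Biosample -> slot_usage). Not the real nmdc-schema content.
SCHEMA = {
    "classes": {
        "Biosample": {
            "slot_usage": {
                "source_mat_id": {"required": True, "identifier": True},
                "id": {"required": True},
            }
        }
    }
}


def get_slot_usages(schema: dict, class_name: str,
                    only_slots: Optional[Iterable[str]] = None) -> dict:
    """Return {slot_name: overrides} from a class's slot_usage block.

    only_slots, if given, keeps just those slot names (for example a
    caller-supplied set of MIxS slot names).
    """
    cls = schema.get("classes", {}).get(class_name) or {}
    usages = cls.get("slot_usage") or {}
    if only_slots is not None:
        keep = set(only_slots)
        usages = {name: over for name, over in usages.items() if name in keep}
    return usages


def usages_to_long_rows(usages: dict) -> list:
    """Flatten slot usages into sorted (slot, target, value) tuples."""
    return sorted(
        (slot, target, value)
        for slot, overrides in usages.items()
        for target, value in overrides.items()
    )


if __name__ == "__main__":
    mixs_slots = {"source_mat_id"}  # hypothetical MIxS slot-name set
    for row in usages_to_long_rows(get_slot_usages(SCHEMA, "Biosample", mixs_slots)):
        print(*row, sep="\t")
```

Keeping the output in (slot, target, value) rows makes it straightforward to diff against a modifications TSV, which is one way to spot usages that were never migrated.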

turbomam commented 1 year ago

> follow up with @pkalita-lbl that dh_mutliview_common_columns is for all sample types

I don't think we have defined "sample types", but here is my understanding of dh_mutliview_common_columns. Notably, the integration with the Data Portal interfaces/templates is configured manually in the dh_interfaces tab.
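The thread does not spell out how the common columns reach each template, so this is only a sketch under assumed semantics: every interface listed in a dh_interfaces-style table receives all columns from dh_mutliview_common_columns first, followed by its own columns, without repeats. The interface names and column lists below are hypothetical placeholders, not the actual contents of either tab.

```python
# Hypothetical stand-ins: the real dh_mutliview_common_columns and
# dh_interfaces tabs are spreadsheet tabs whose layout is not shown here.
COMMON_COLUMNS = ["source_mat_id"]
INTERFACES = {
    "soil_interface": ["ph", "water_content"],
    "water_interface": ["ph", "salinity"],
}


def build_template_columns(interfaces: dict, common: list) -> dict:
    """Give every interface the shared columns first, then its own columns.

    Columns that already appear in the common list are not repeated.
    """
    templates = {}
    for name, specific in interfaces.items():
        columns = list(common)
        columns.extend(col for col in specific if col not in common)
        templates[name] = columns
    return templates


if __name__ == "__main__":
    for name, columns in build_template_columns(INTERFACES, COMMON_COLUMNS).items():
        print(name, columns)
```

Under this assumption, adding source_mat_id once to the common list would make it appear in every template, which is the behavior the first comment asks for.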