microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Extend sample metadata to include amplicon related fields #139

Closed cmungall closed 2 years ago

cmungall commented 3 years ago

These should already be in mixs.yaml; they are the ones listed here: https://cmungall.github.io/mixs-source/MIMARKSSurvey/

e.g.

MIMARKS survey➞samp_vol_we_dna_ext 0..1 Range: QuantityValue
MIMARKS survey➞nucl_acid_ext 0..1 Range: String
MIMARKS survey➞nucl_acid_amp 0..1 Range: String
MIMARKS survey➞target_gene 1..1 Range: String
MIMARKS survey➞target_subfragment 0..1 Range: String
MIMARKS survey➞pcr_primers 0..1 Range: String
MIMARKS survey➞pcr_cond 0..1 Range: String
MIMARKS survey➞seq_meth 1..1 Range: String
MIMARKS survey➞seq_quality_check 0..1 Range: String
MIMARKS survey➞chimera_check 0..1 Range: String

So all that is required here is adding these to biosample.slots, e.g.:

  biosample:
    slots:
      ...
      - pcr_primers
      - pcr_cond
      - ....
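
Spelled out for the ten MIMARKS slots listed above, the addition might look like the sketch below. The slot names come from that list; the existing biosample slots are left elided, and the exact indentation and placement within the schema file are assumptions.

```yaml
biosample:
  slots:
    # ... existing biosample slots stay as they are ...
    # Amplicon-related slots from the MIMARKS survey list above:
    - samp_vol_we_dna_ext
    - nucl_acid_ext
    - nucl_acid_amp
    - target_gene
    - target_subfragment
    - pcr_primers
    - pcr_cond
    - seq_meth
    - seq_quality_check
    - chimera_check
```

Note that MIMARKS marks target_gene and seq_meth as 1..1; listing a slot under a class this way does not by itself carry over that cardinality.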

Note: in the future we may introduce subclasses of sample, as per the MIxS schema above, so that these slots only show up for amplicon samples, but that is out of scope for this ticket.

Note: this change will be purely cosmetic until we actually have ETL processes to populate these slots; that is also out of scope for this ticket.

Estimated time: 30 mins

dehays commented 2 years ago

@cmungall You are suggesting putting pcr_primers on the biosample rather than on a new sequencing project type? GOLD puts these on the sequencing project.

More generally - I'd consider library creation metadata to be associated with the sequencing project rather than the biosample.

cmungall commented 2 years ago

@dehays good point. Yes, and I think all sub-fields of mixs:sequencing_field go on the sequencing project (https://cmungall.github.io/mixs-source/sequencing_field/). We should document these kinds of high-level mapping patterns.
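
As a rough sketch of that mapping, the same kind of slot listing would move from biosample to the sequencing project class. The class name `sequencing_project` below is hypothetical, since the thread does not give the actual class name in the NMDC schema, and only the two PCR slots named earlier are shown; the full set of sub-fields should be checked against the sequencing_field page.

```yaml
# Hypothetical class name; substitute the schema's actual
# sequencing project class.
sequencing_project:
  slots:
    # ... existing slots stay as they are ...
    - pcr_primers
    - pcr_cond
    # ... remaining sub-fields of mixs:sequencing_field
```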

We could just go ahead and do this on the metadata call. It may be instructive for people to see how this kind of thing is done. Should take a few minutes max.

The harder work is populating this, whether by extending the ETL that comes from GOLD or the one from INSDC biosample, but that's not in scope for this ticket.

wdduncan commented 2 years ago

Finished