microbiomedata / issues

public repo for issues related to NMDC work

Bioscales - metabolomics-only samples metadata ingest #39

Closed ssarrafan closed 1 year ago

ssarrafan commented 1 year ago

Metadata ingest for metabolomics only samples for bioscales. Use the study identifier created from https://github.com/microbiomedata/issues/issues/38

@ssarrafan This is planned for the sprint starting 2/13/23

turbomam commented 1 year ago

@ssarrafan Is this a request for a change in the schema?

aclum commented 1 year ago

No, this is to import bioscales metabolomics only samples into NMDC. No new schema changes are required for that. Would you prefer this ticket be in a different repo?

aclum commented 1 year ago

We need an ENVO triad for these bioscales leaf material samples. @emileyfadrosh @mslarae13 @cmungall

To be discussed at the planning meeting on Monday. What metadata should be harvested from GOLD? We can use the same field research site (tree name) to pull in metadata from GOLD.

turbomam commented 1 year ago

This issue does not belong in the nmdc-schema repo because it does not request changes to the schema. It might make sense to put it in sample-annotator.

aclum commented 1 year ago

Proposed plan

environmental context values:

broad context - terrestrial biome (ENVO_00000446)

local context - 'environment associated with a plant part or small plant' (ENVO_01001057)

PO terms are allowed by MIxS

Environmental medium - 'leaf' (PO_0025034)

Other fields provided in the (spreadsheet):

collected_from -> NMDC FieldResearchSite identifier generated in "mint NMDC ids of type FieldResearchSite for the trees" #41

equivalent of GOLD habitat = 'leaf tissue'

Use a join on field research site with other biosamples from the same GOLD study to pull in: geographic location, latitude, longitude, location, community, sample contact name/email, Sample Isolation Country/Ocean, host name, host taxon oid

Populate the NMDC schema slot samp_name with the tree name + "leaf tissue" (e.g. BESC-13-CL1_35_33 leaf tissue)
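A minimal sketch of how the plan above could be applied to one sample, in Python. It is illustrative only and is not the code used for the actual ingest. The function name, the dictionary layout, the `INHERITED_FIELDS` keys, and the example site identifier are all hypothetical; the triad slot names `env_broad_scale`, `env_local_scale`, and `env_medium` are assumed to be the target slots, and the term identifiers are copied as written in the plan.

```python
from typing import Dict

# Environmental triad values taken from the proposed plan.
# Slot names are an assumption about where each value lands.
ENV_TRIAD = {
    "env_broad_scale": ("terrestrial biome", "ENVO_00000446"),
    "env_local_scale": (
        "environment associated with a plant part or small plant",
        "ENVO_01001057",
    ),
    "env_medium": ("leaf", "PO_0025034"),
}

# Hypothetical keys for the metadata inherited from a GOLD biosample
# collected at the same field research site (tree).
INHERITED_FIELDS = ["lat_lon", "geo_loc_name", "host_name", "host_taxid"]


def build_leaf_biosample(
    tree_name: str,
    site_id: str,
    gold_by_site: Dict[str, Dict[str, str]],
) -> Dict[str, object]:
    """Assemble one metabolomics-only leaf sample record.

    gold_by_site maps a field research site id to the metadata of a
    biosample from the same GOLD study (the "join" step in the plan).
    """
    record: Dict[str, object] = {
        "samp_name": f"{tree_name} leaf tissue",
        "collected_from": site_id,
        "habitat": "leaf tissue",
    }
    for slot, (label, term_id) in ENV_TRIAD.items():
        record[slot] = {"label": label, "id": term_id}

    # Copy only the agreed fields; a site with no GOLD match yields none.
    sibling = gold_by_site.get(site_id, {})
    for field in INHERITED_FIELDS:
        if field in sibling:
            record[field] = sibling[field]
    return record


if __name__ == "__main__":
    # Placeholder values only; the site id below is not a real NMDC id.
    gold = {"nmdc:frsite-99-example": {"lat_lon": "example", "host_name": "example"}}
    print(build_leaf_biosample("BESC-13-CL1_35_33", "nmdc:frsite-99-example", gold))
```

Keying the lookup on the field research site means a leaf sample only inherits location and host metadata when a GOLD biosample exists for the same tree; otherwise those fields are simply left out rather than guessed.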

aclum commented 1 year ago

@sujaypatil96 There were no objections in Slack to this proposal, so we can move forward with the proposed plan.

aclum commented 1 year ago

@ssarrafan @sujaypatil96 This has to be done either in the sprint that starts Monday or in the following one.

sujaypatil96 commented 1 year ago

I think I might be able to do this in the following sprint. I have to work on some submission portal squad items this sprint.

ssarrafan commented 1 year ago

@aclum @sujaypatil96 any update on this?

aclum commented 1 year ago

@sujaypatil96 has this ready to submit to mongo. Moving to the next sprint as it won't be submitted to mongo until #57 is addressed.

sujaypatil96 commented 1 year ago

The data has been prepared here: https://github.com/microbiomedata/nmdc-datasets/blob/main/bioscales/bioscales_metabolomics.json, but still needs to be submitted to Mongo.

sujaypatil96 commented 1 year ago

The data has been submitted to Mongo and ingested into the data portal.