microbiomedata / issues

public repo for issues related to NMDC work

Planning for biosample ingest of NEON data - Soil microbe metagenome sequences #131

Closed aclum closed 1 year ago

aclum commented 1 year ago

Is your feature request related to a problem? Please describe. Develop a plan and process to ingest biosample_set metadata for NEON samples that are not in GOLD. Based on priorities, bringing in the soil microbial metagenome data (DP1.10107.001) is the highest priority of the three NEON shotgun metagenomic products.

Describe the solution you'd like Use the NEON API to create NMDC schema-valid biosample_set records for NEON soil metagenomes sequenced outside of JGI.
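A minimal sketch of the mapping step, assuming each NEON record has already been fetched and flattened into a dict. Only `dnaSampleID` and `biosample_set` come from this thread; the other NEON field names (`collectDate`, `decimalLatitude`, `decimalLongitude`), the output slot names, and the `nmdc:bsm-example-` ID prefix are illustrative assumptions, not the confirmed NEON API response shape or NMDC schema.

```python
from typing import Any


def neon_to_biosample(neon_row: dict[str, Any],
                      id_prefix: str = "nmdc:bsm-example-") -> dict[str, Any]:
    """Map one NEON-style row to a biosample-like dict.

    Slot names and the ID prefix are hypothetical placeholders; check them
    against the NMDC schema before relying on this.
    """
    sample_id = neon_row["dnaSampleID"]  # sample name field named in this thread
    return {
        "id": f"{id_prefix}{sample_id}",
        "name": sample_id,
        # Assumed NEON field names below.
        "collection_date": {"has_raw_value": neon_row["collectDate"]},
        "lat_lon": {
            "latitude": float(neon_row["decimalLatitude"]),
            "longitude": float(neon_row["decimalLongitude"]),
        },
    }


def to_biosample_set(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap the mapped rows in a top-level biosample_set collection."""
    return {"biosample_set": [neon_to_biosample(row) for row in rows]}
```

Keeping the per-row mapping separate from the collection wrapper makes it easier to test the mapping against individual problem samples.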

Describe alternatives you've considered Per the documentation, the NEON API is the only place where all of the data lives. Possible alternatives include pulling metadata from other samples (i.e., some 1000 soils samples are from NEON sites and are currently in the submission portal).

Acceptance Criteria An NMDC schema-valid biosample_set record is generated by code that parses NEON API records and/or spreadsheets, and is validated with the NMDC runtime API endpoint metadata/json:validate.
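The validation step could be sketched as below, using only the Python standard library. The endpoint path `metadata/json:validate` is taken from the acceptance criteria; the base URL, the use of POST with a JSON body, and the function names are assumptions to confirm against the runtime API documentation.

```python
import json
import urllib.request

# Assumption: base URL of the NMDC runtime API; confirm before use.
RUNTIME_API_BASE = "https://api.microbiomedata.org"


def build_validate_request(payload: dict,
                           base_url: str = RUNTIME_API_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST request to metadata/json:validate."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/metadata/json:validate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def validate(payload: dict) -> dict:
    """Send the request and return the parsed JSON response.

    The response shape is not specified in this thread, so it is returned as-is.
    """
    with urllib.request.urlopen(build_validate_request(payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it lets the payload be inspected or logged without touching the network.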

aclum commented 1 year ago

37/45 of the soil NEON sites are already in GOLD, with MIxS environmental terms, as part of previous efforts (https://gold.jgi.doe.gov/study?id=Gs0144570). Emailed Hugh with suggestions for the 8 remaining soil sites. MIxS work is currently in this spreadsheet

turbomam commented 1 year ago

Thanks, @aclum

Eventually we want all of these products, right?

Are you thinking that we won't use Soil microbial communities from various NEON sites located in USA and Puerto Rico, Gs0161344 ?

aclum commented 1 year ago

@turbomam this ticket I made is specific to DP1.10107.001, but yes, ultimately we want all three data products.

We will also want what is in Soil microbial communities from various NEON sites located in USA and Puerto Rico, Gs0161344. However, those are new samples (2021, based on sample names) which were only recently sequenced at JGI and aren't in NEON's latest official release (RELEASE-2023). I'm working with Hugh to define MIxS environmental terms for the newer samples from Gs0161344.

aclum commented 1 year ago

There are 491 NEON soil samples in GOLD that are part of DP1.10107.001 RELEASE-2023, plus one anomaly that I think is a marker gene sample mislabeled as a metagenome in GOLD. There are 831 sample names (labeled as dnaSampleID) that we need to pull in from NEON directly. The 'In GOLD?' column on the 'NEON vs GOLD' tab of this spreadsheet lists which samples are in GOLD or not.
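A comparison like the 'In GOLD?' column can be sketched with set operations on sample names. The function name and the toy IDs in the usage example are hypothetical, and how sample names are matched between NEON and GOLD in practice (exact string match or otherwise) is an assumption here.

```python
from typing import Iterable


def split_by_gold(neon_ids: Iterable[str],
                  gold_ids: Iterable[str]) -> dict[str, list[str]]:
    """Partition sample names by whether they appear in NEON, GOLD, or both.

    Assumes names match exactly between the two sources.
    """
    neon = set(neon_ids)
    gold = set(gold_ids)
    return {
        "in_gold": sorted(neon & gold),    # already in GOLD
        "neon_only": sorted(neon - gold),  # must be pulled from NEON directly
        "gold_only": sorted(gold - neon),  # worth checking for mislabeled entries
    }


if __name__ == "__main__":
    # Toy IDs for illustration only, not real dnaSampleID values.
    result = split_by_gold(["A-DNA1", "B-DNA1", "C-DNA1"], ["B-DNA1", "Z-DNA1"])
    print(result["neon_only"])
```

The `gold_only` bucket is where an entry such as the suspected mislabeled marker gene sample would surface if it has no counterpart in the NEON release.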

ssarrafan commented 1 year ago

Moving to next sprint. Actively being worked on.

ssarrafan commented 1 year ago

@aclum is this still actively being worked on?

aclum commented 1 year ago

I haven't had much time this sprint but will pick this back up next sprint.

aclum commented 1 year ago

At terrestrial sites, soil metagenomic sampling occurs annually at a minimum of one site per domain during the period of peak greenness and in conjunction with the soil physical and chemical properties data product (DP1.10086). Once every five years, a ‘coordinated’ bout occurs in which additional biogeochemical and isotopic measurements are made (DP1.10078), along with measurements of microbe biomass (DP1.10104) and nitrogen transformation rates (DP1.10080). During a coordinated bout, up to 2 soil horizons (organic and mineral) are sampled for microbial metagenomics analysis to a maximum depth of 30 cm.

mslarae13 commented 1 year ago

R: @turbomam , @aclum A: @aclum C: @pkalita-lbl , @mslarae13 I: @cmungall

Completion date goal:

aclum commented 1 year ago

Closing this planning ticket as various other tickets have been created which are all part of the NEON Activity Board.