microbiomedata / issues

public repo for issues related to NMDC work

Define environmental terms for NEON surface water metagenome samples (DP1.20281.001) #273

Closed aclum closed 8 months ago

aclum commented 1 year ago

Deliverable this task is associated with

_See Deliverables tab here: https://docs.google.com/spreadsheets/d/1jF1RU_TwQlJpqvHnKk-KieE6VlUv0je4eFddK9idPuE/edit?usp=sharing_

2

RACI

Tag people in their roles

Describe the task

Information will be added to MongoDB via Sujay's ingest code.

Criteria for completion

Completion Date (Goal)

Target Sprint Start & End Dates

Tag Blocker/Contingent upon issues

aclum commented 1 year ago

@turbomam and I discussed this with Hugh today. The amc_fieldSuperParent CSV fields aquaticSiteType and namedLocation are the best fields to use to define the environmental terms: an aquaticSiteType value of lake would map to freshwater lake biome, and an aquaticSiteType value of river would map to freshwater river biome.

Parsing the pattern in namedLocation would help determine the environmental local scale (env_local_scale) term; e.g., in CRAM.AOS.buoy.c1, buoy -> area of open water.

aclum commented 1 year ago

figure 8 from NEON.DOC.003044.vE AOS Protocol and Procedure: AMC – Aquatic Microbial Sampling

B.3 Lakes and River Collection

  1. Select Sampling Locations (Figure 8)
     a. Rivers are sampled at the buoy (‘c0’ - non-stratified, or ‘c1’, ‘c2’, ‘c3’ - stratified) location at the same time as water chemistry sampling.
        i. Sample depths are dependent on stratification conditions on the day of sampling.
     b. Lakes are sampled at up to three locations at the same time as water chemistry sampling.
        i. Seepage lakes are only sampled at the buoy (c0, c1, c2, c3).
        ii. Flow-through lakes will be sampled at the inflow (in), outflow (ot) and buoy (c0, c1, c2, c3).
        iii. Sample depths at the buoy depend on stratification conditions on the day of sampling.
     c. Refer to Section B.4 in Water Chemistry Sampling in Surface Waters and Groundwater (RD[10]).
     d. Determine the stratification conditions at the buoy from the Secchi Disk and Depth Profile Sampling in Lakes and Rivers (RD[18]), Section 7, SOP B.3.
     e. Sample in the same locations and depths where water chemistry samples are collected (RD[13]).

(Source: NEON.DOC.003044, AOS Protocol and Procedure: AMC – Aquatic Microbial Sampling, Revision E, 12/15/2021, author S. Parker, SOP B, page 28)


So, between a namedLocation of CRAM.AOS.buoy.c1 and an aquaticSiteType of Lake, we know the sample is from NEON site CRAM, was collected under the Aquatic Observation System (AOS), and comes from open water (buoy) in a stratified lake at a depth of 0.5 meters (c1). The naming convention is $SITE.$SYSTEM.$LOCATION_W/IN_RIVER_OR_LAKE.$WATER_DEPTH_CODE.
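The namedLocation convention above can be split mechanically. A minimal Python sketch, assuming values are dot-separated as described; the names `NamedLocation` and `parse_named_location` are hypothetical, and accepting three-part names with no depth code (for example, for littoral samples) is an assumption, not something the protocol states.

```python
from typing import NamedTuple, Optional


class NamedLocation(NamedTuple):
    """Components of a NEON namedLocation such as 'CRAM.AOS.buoy.c1'."""
    site: str                  # $SITE, e.g. 'CRAM'
    system: str                # $SYSTEM, e.g. 'AOS'
    location: str              # $LOCATION_W/IN_RIVER_OR_LAKE, e.g. 'buoy'
    depth_code: Optional[str]  # $WATER_DEPTH_CODE, e.g. 'c1'; None if absent


def parse_named_location(value: str) -> NamedLocation:
    """Split a namedLocation on '.' following the convention
    $SITE.$SYSTEM.$LOCATION_W/IN_RIVER_OR_LAKE.$WATER_DEPTH_CODE.

    Assumption (hypothetical): names without a depth code have three parts
    and are returned with depth_code=None.
    """
    parts = value.strip().split(".")
    if len(parts) == 4:
        site, system, location, depth_code = parts
    elif len(parts) == 3:
        site, system, location = parts
        depth_code = None
    else:
        raise ValueError(f"unexpected namedLocation format: {value!r}")
    return NamedLocation(site, system, location, depth_code)


loc = parse_named_location("CRAM.AOS.buoy.c1")
print(loc.site, loc.system, loc.location, loc.depth_code)  # CRAM AOS buoy c1
```

Raising on unexpected shapes, rather than guessing, makes any namedLocation that breaks the convention visible during ingest.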

aclum commented 1 year ago

Notes from Hugh: Here is a start for aquatic samples. For broad scale, I believe it would be either “freshwater lake biome” [ENVO:01000252] or “freshwater river biome” [ENVO:01000253]. For surface samples, this would be derived from the “aquaticSiteType” field in the amc_fieldSuperParent table. For surface water medium scale, I can’t find anything better than “lake water” [ENVO:04000007] and “river water” [ENVO:01000599]. For surface local scale, there is not much detail given as to habitat or part of river, so I suggest “freshwater river” [ENVO:01000297] for river/stream local scale. For lakes, samples are collected either in the littoral zone (near shore) or out in the deeper part of the lake. For these two, the terms “freshwater littoral zone” [ENVO:01000409] and “area of open water” [ENVO:01000666] seem to fit. This distinction would have to be derived from the “namedLocation” field in the surface amc_fieldSuperParent table: values of this field contain either ‘littoral’ or ‘buoy’ to designate where the sample was collected (for streams or rivers, this field is not useful).

aclum commented 1 year ago

fig 2 from NEON_cellCount_userGuide_vC


ssarrafan commented 1 year ago

This appears to be active, so I'll move it to the next sprint.

aclum commented 1 year ago

Local scale terms for lake surface water samples: c0 = surface (<0.5 m depth) = ‘water surface’ [ENVO:01001191]; c1 = ‘epilimnion’ [ENVO:00002131]; c2 = ‘thermocline’ [ENVO:00002269]; c3 = ‘hypolimnion’ [ENVO:00002130]. I suggest keeping littoral the same: “freshwater littoral zone” [ENVO:01000409]. Waiting for Hugh to confirm what to do for stratified rivers.

aclum commented 1 year ago

Waiting for final confirmation from Hugh.

aclum commented 1 year ago

Hugh confirmed “freshwater river” [ENVO:01000297] for env_local_scale for rivers.

Proposed final set of rules; we'll need to port this to the assets CSV. I asked Hugh about using multiple terms vs. a single term for env_local_scale. NMDC currently doesn't support multiple env context terms.

if (DP1.20281.001 amc_fieldSuperParent aquaticSiteType == lake) {

| NEON_data_product | NEON_table | NEON_field | NEON_value | NMDC_Class | NMDC_slot_name | NMDC_slot_value |
|---|---|---|---|---|---|---|
| DP1.20281.001 | amc_fieldSuperParent | aquaticSiteType | lake | Biosample | env_broad_scale | “freshwater lake biome” [ENVO:01000252] |
| DP1.20281.001 | amc_fieldSuperParent | aquaticSiteType | lake | Biosample | env_medium | “lake water” [ENVO:04000007] |
| DP1.20281.001 | amc_fieldSuperParent | namedLocation | buoy.c0 | Biosample | env_local_scale | “water surface” [ENVO:01001191] |
| DP1.20281.001 | amc_fieldSuperParent | namedLocation | buoy.c1 | Biosample | env_local_scale | “epilimnion” [ENVO:00002131] |
| DP1.20281.001 | amc_fieldSuperParent | namedLocation | buoy.c2 | Biosample | env_local_scale | “thermocline” [ENVO:00002269] |
| DP1.20281.001 | amc_fieldSuperParent | namedLocation | buoy.c3 | Biosample | env_local_scale | “hypolimnion” [ENVO:00002130] |
| DP1.20281.001 | amc_fieldSuperParent | namedLocation | littoral | Biosample | env_local_scale | “freshwater littoral zone” [ENVO:01000409] |

}

if (DP1.20281.001 amc_fieldSuperParent aquaticSiteType == river) {

| NEON_data_product | NEON_table | NEON_field | NEON_value | NMDC_Class | NMDC_slot_name | NMDC_slot_value |
|---|---|---|---|---|---|---|
| DP1.20281.001 | amc_fieldSuperParent | aquaticSiteType | river | Biosample | env_broad_scale | “freshwater river biome” [ENVO:01000253] |
| DP1.20281.001 | amc_fieldSuperParent | aquaticSiteType | river | Biosample | env_local_scale | “freshwater river” [ENVO:01000297] |
| DP1.20281.001 | amc_fieldSuperParent | aquaticSiteType | river | Biosample | env_medium | “river water” [ENVO:01000599] |

}
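The rules above can be expressed as a lookup keyed on aquaticSiteType and namedLocation. A minimal sketch, not Sujay's actual ingest code: the names `env_triad`, `SITE_TYPE_TERMS` and `LAKE_LOCAL_TERMS` are hypothetical, and the sketch assumes lake buoy names end in `<location>.<depth code>` (e.g. `CRAM.AOS.buoy.c1`) and littoral names contain the substring `littoral`.

```python
from typing import Dict, Tuple

Term = Tuple[str, str]  # (CURIE, label)

# Terms that depend only on aquaticSiteType, taken from the rules above.
SITE_TYPE_TERMS: Dict[str, Dict[str, Term]] = {
    "lake": {
        "env_broad_scale": ("ENVO:01000252", "freshwater lake biome"),
        "env_medium": ("ENVO:04000007", "lake water"),
    },
    "river": {
        "env_broad_scale": ("ENVO:01000253", "freshwater river biome"),
        "env_local_scale": ("ENVO:01000297", "freshwater river"),
        "env_medium": ("ENVO:01000599", "river water"),
    },
}

# Lake env_local_scale terms keyed by '<location>.<depth code>'.
LAKE_LOCAL_TERMS: Dict[str, Term] = {
    "buoy.c0": ("ENVO:01001191", "water surface"),
    "buoy.c1": ("ENVO:00002131", "epilimnion"),
    "buoy.c2": ("ENVO:00002269", "thermocline"),
    "buoy.c3": ("ENVO:00002130", "hypolimnion"),
}
LITTORAL_TERM: Term = ("ENVO:01000409", "freshwater littoral zone")


def env_triad(aquatic_site_type: str, named_location: str) -> Dict[str, Term]:
    """Return env_broad_scale / env_local_scale / env_medium for one
    amc_fieldSuperParent row. Raises KeyError for unmapped values."""
    site_type = aquatic_site_type.strip().lower()  # 'Lake' -> 'lake'
    triad = dict(SITE_TYPE_TERMS[site_type])  # copy so the table is not mutated
    if site_type == "lake":
        if "littoral" in named_location:
            triad["env_local_scale"] = LITTORAL_TERM
        else:
            # Hypothetical assumption: the last two dot-separated parts are
            # the location and depth code, e.g. 'buoy.c1'.
            key = ".".join(named_location.split(".")[-2:])
            triad["env_local_scale"] = LAKE_LOCAL_TERMS[key]
    return triad
```

For example, `env_triad("lake", "CRAM.AOS.buoy.c1")["env_local_scale"]` returns `("ENVO:00002131", "epilimnion")`. Raising on unmapped values instead of defaulting means an aquaticSiteType with no rule (such as "stream") fails loudly rather than being silently mislabeled.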

ssarrafan commented 1 year ago

@aclum can this be closed now that Hugh has confirmed?

sujaypatil96 commented 8 months ago

@aclum using this table to finish up the surface water ingest pipeline. I should have a JSON ready soon.

sujaypatil96 commented 8 months ago

@aclum in the aquaticSiteType column you can see three types of values: lake, river and stream. What do you assign to the MIxS ENVO triad values when aquaticSiteType == “stream”?

The table above doesn't seem to have mappings for that case.

sujaypatil96 commented 8 months ago

New ENVO terms may be required for the aquaticSiteType "stream".

Presumably @turbomam would be responsible for adding these terms to ENVO, so we should let him know as early as possible.

CC: @aclum

aclum commented 8 months ago

@turbomam is going to add a new freshwater stream term to ENVO: https://github.com/EnvironmentOntology/envo/issues/1476

turbomam commented 8 months ago

We have an EnvO PR for this:

There's a violation of best practice in the resulting branch. I'm 99.9% sure I didn't add it, and I personally wouldn't want it added to a repo that I manage. So I'm waiting for feedback from @cmungall or Pier.

Having said that, I don't see why the reserved ID would change, so it's probably safe to start using this:

ENVO:03605007, 'freshwater stream biome'

aclum commented 8 months ago

Chris made some comments on Mark's PR, so I'm moving this to the next sprint as In Review.

aclum commented 8 months ago

Mark's PR was merged in. For streams we'll use env_broad_scale “freshwater stream biome” [ENVO:03605007], env_local_scale “freshwater stream” [ENVO:03605006], and env_medium “stream water” [ENVO:03605006].
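Since the ingest output is JSON, the stream terms might be rendered into Biosample slot objects roughly as follows. A hypothetical sketch: the name `triad_to_slots` and the nested `term` / `id` / `name` layout are assumptions loosely modeled on NMDC's ControlledIdentifiedTermValue, so the current nmdc-schema should be checked for the exact shape. The IDs are copied verbatim from the comment above.

```python
import json
from typing import Dict, Tuple

# Stream triad copied verbatim from the comment above: (CURIE, label) per slot.
STREAM_TRIAD: Dict[str, Tuple[str, str]] = {
    "env_broad_scale": ("ENVO:03605007", "freshwater stream biome"),
    "env_local_scale": ("ENVO:03605006", "freshwater stream"),
    "env_medium": ("ENVO:03605006", "stream water"),
}


def triad_to_slots(triad: Dict[str, Tuple[str, str]]) -> Dict[str, dict]:
    """Render (CURIE, label) pairs as nested term objects.

    Assumption (hypothetical layout): each env slot holds an object with a
    nested 'term' carrying 'id' and 'name'.
    """
    return {
        slot: {"term": {"id": curie, "name": label}}
        for slot, (curie, label) in triad.items()
    }


biosample_fragment = triad_to_slots(STREAM_TRIAD)
print(json.dumps(biosample_fragment, indent=2))
```

Keeping the (CURIE, label) pairs in one table and rendering them in one function means a corrected term ID only has to be changed in a single place.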

cc @sujaypatil96