Open turbomam opened 1 year ago
Below you will see a partially-migrated Biosample, which @mslarae13 identified as a soil biosample, lacking a has_minimum_numeric_value.
Is there one slot here whose value is always indicative of a soil biosample and never indicative of some other kind of biosample? I don't think it would be appropriate to blindly do a free-text search across the whole document and say that, if the string "soil" appears anywhere, then it is a soil biosample.
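To make the concern concrete, here is a minimal sketch contrasting the two approaches on in-memory documents. The function names and both sample documents are hypothetical, made up for illustration; only the slot names come from the Biosample in this issue.

```javascript
// Hypothetical helper: naive free-text match over the whole serialized document.
function mentionsSoilAnywhere(biosample) {
  return JSON.stringify(biosample).toLowerCase().includes("soil");
}

// Hypothetical helper: exact match on a single, named slot.
function isSoilBySlot(biosample, slot = "ecosystem_type") {
  return biosample[slot] === "Soil";
}

// Illustrative documents (not real NMDC records).
const soilSample = { ecosystem_type: "Soil", habitat: "bulk soil" };
const rootSample = {
  ecosystem_type: "Roots",
  description: "Root-associated communities from plants grown in potting soil",
};

console.log(mentionsSoilAnywhere(rootSample)); // true: "soil" appears in the description
console.log(isSoilBySlot(rootSample)); // false: the slot value is "Roots"
```

The free-text check flags the root sample only because its description happens to mention soil, which is the kind of false positive a slot-based rule would avoid.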
Let's consider {"ecosystem_type": "Soil"}. That query returns 548 of the 753 biosamples.
But this query shows that it doesn't map to any other MIxS environmental packages.
db["biosample_set"].distinct("ecosystem_type").forEach(function(value){print("ecosystem_type" + ", " + value + ": " + db["biosample_set"].count({["ecosystem_type"]: value}))})
'DeprecationWarning: Collection.count() is deprecated. Use countDocuments or estimatedDocumentCount.'
'ecosystem_type, Chemical products: 2'
'ecosystem_type, Deep subsurface: 17'
'ecosystem_type, Freshwater: 37'
'ecosystem_type, Roots: 128'
'ecosystem_type, Sand microcosm: 17'
'ecosystem_type, Soil: 548'
'ecosystem_type, Volcanic: 4'
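The deprecation warning above can be avoided by switching to countDocuments, as the warning itself suggests. Below is a minimal sketch for mongosh, wrapped in a hypothetical function countByField that takes the db handle as a parameter. It assumes distinct() returns an array and countDocuments() returns a number synchronously, as they do in mongosh; the Node.js driver returns promises instead, so this would need await there.

```javascript
// Hypothetical helper: tally documents per distinct value of one field.
function countByField(db, collectionName, field) {
  const coll = db.getCollection(collectionName);
  const counts = {};
  coll.distinct(field).forEach(function (value) {
    // countDocuments replaces the deprecated count()
    counts[value] = coll.countDocuments({ [field]: value });
  });
  return counts;
}

// In mongosh:
//   printjson(countByField(db, "biosample_set", "ecosystem_type"));
```

For large collections, a single aggregation with a $group stage would avoid issuing one count per distinct value, but the loop keeps the sketch close to the original query.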
Biosample
{
"_id": {
"$oid": "634859eb604457c085089d23"
},
"id": "nmdc:a6c3b0d2-6f2e-4715-84d3-666719d1f54b",
"name": "Bulk soil microbial communities from the East River watershed near Crested Butte, Colorado, United States - ER_115",
"description": "Bulk soil microbial communities from the East River watershed near Crested Butte, Colorado, United States",
"env_broad_scale": {
"has_raw_value": "ENVO:00000108",
"term": {
"id": "ENVO:00000108"
}
},
"env_local_scale": {
"has_raw_value": "ENVO:00000292",
"term": {
"id": "ENVO:00000292"
}
},
"env_medium": {
"has_raw_value": "ENVO:00005802",
"term": {
"id": "ENVO:00005802"
}
},
"type": "nmdc:Biosample",
"collection_date": {
"has_raw_value": "2017-03-07"
},
"depth": {
"has_raw_value": "0.0",
"has_numeric_value": 0,
"has_unit": "meter",
"has_maximum_numeric_value": 0.05
},
"geo_loc_name": {
"has_raw_value": "USA: Colorado"
},
"lat_lon": {
"has_raw_value": "38.917216053 -106.9559947",
"latitude": 38.917216053,
"longitude": -106.9559947
},
"ecosystem": "Environmental",
"ecosystem_category": "Terrestrial",
"ecosystem_type": "Soil",
"ecosystem_subtype": "Meadow",
"specific_ecosystem": "Bulk soil",
"add_date": "2018-06-22",
"community": "microbial communities",
"habitat": "bulk soil",
"location": "The East River watershed near Crested Butte, Colorado, USA",
"mod_date": "2021-06-15",
"ncbi_taxonomy_name": "soil metagenome",
"sample_collection_site": "soil",
"alternative_identifiers": [
"gold:Gb0191643",
"gold:Gb0205601",
"img.taxon:3300042813"
],
"gold_sample_identifiers": [
"gold:Gb0191643"
],
"insdc_biosample_identifiers": [
"biosample:SAMN10864388"
],
"sample_link": [
"gold:Gs0135149"
],
"samp_name": "ER_115",
"igsn_biosample_identifiers": [
"igsn:IEWFS0001"
]
}
After discussion during the squad sync, I realized that "has_value" is the minimum value and depth2 is the maximum value.
The metadata is there, but it needs to be stated more clearly as "has minimum value".
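One way to find the affected records is to look for depth values that carry a maximum but no explicit minimum. The filter below is a sketch for mongosh using the standard $exists operator. The constant and function names are hypothetical, and the predicate simply mirrors the filter for in-memory documents.

```javascript
// MongoDB filter: depth has a maximum but no explicit minimum.
const missingMinDepthFilter = {
  "depth.has_maximum_numeric_value": { $exists: true },
  "depth.has_minimum_numeric_value": { $exists: false },
};

// Same check for a document already loaded into memory.
function lacksMinimumDepth(biosample) {
  const depth = biosample.depth || {};
  return (
    "has_maximum_numeric_value" in depth &&
    !("has_minimum_numeric_value" in depth)
  );
}

// In mongosh:
//   db.getCollection("biosample_set").countDocuments(missingMinDepthFilter)
```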
@turbomam I think this is related to this. From the API, I see
results__depth__has_raw_value
results__depth__has_minimum_numeric_value
results__depth__has_unit
results__depth__has_maximum_numeric_value
And we WILL be keeping these 3 depth fields, yes?
Is this issue resolved? Is depth2 gone? It looks like it is gone from the schema, but I am still seeing it in... runtime? and documentation?
depth2 is being deprecated in https://github.com/microbiomedata/nmdc-schema/blob/issue-486-data-to-7-0 (which will actually result in a new version of the schema, greater than 7.0.0). When migrating depth2 content into depth, should we just migrate the depth's has_numeric_value value into the has_minimum_numeric_value slot if necessary?
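The migration asked about here could be sketched as follows. This assumes, as noted earlier in the thread, that the existing has_numeric_value is really the minimum whenever a maximum is present. The function name migrateDepthRange is hypothetical, and dropping has_numeric_value after the move is also an assumption, not something the schema change is confirmed to require.

```javascript
// Hypothetical migration helper: returns a new depth object, input is left untouched.
function migrateDepthRange(depth) {
  const migrated = { ...depth };
  const hasMax = migrated.has_maximum_numeric_value !== undefined;
  const hasMin = migrated.has_minimum_numeric_value !== undefined;
  const hasValue = migrated.has_numeric_value !== undefined;

  // Only act when a range is implied: a maximum exists but no explicit minimum.
  if (hasMax && !hasMin && hasValue) {
    migrated.has_minimum_numeric_value = migrated.has_numeric_value;
    // Assumption: the single value is dropped once it is reinterpreted as the minimum.
    delete migrated.has_numeric_value;
  }
  return migrated;
}

// The depth value from the Biosample in this issue.
const before = {
  has_raw_value: "0.0",
  has_numeric_value: 0,
  has_unit: "meter",
  has_maximum_numeric_value: 0.05,
};
console.log(migrateDepthRange(before));
```

A depth that has only has_numeric_value, with no maximum, passes through unchanged, so single-point depths are not turned into ranges.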