microbiomedata / nmdc-metadata

Managing metadata and policy around metadata in NMDC
https://microbiomedata.github.io/nmdc-schema/

add properties to gold biosample #90

Closed: wdduncan closed this 4 years ago

wdduncan commented 4 years ago

The GOLD biosample JSON fails validation:

Additional properties are not allowed ('location', 'mod_date', 'identifier', 'sample_collection_site', 'add_date', 'ncbi_taxonomy_name', 'community', 'habitat', 'type' were unexpected)
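This is the message a JSON Schema validator (for example the Python `jsonschema` package) reports when a schema sets `"additionalProperties": false` and a record carries keys the schema does not declare. A minimal stdlib sketch of that check follows; the allowed-property set and the sample record are hypothetical, and the message string only mimics the validator's format.

```python
from __future__ import annotations

# Hypothetical subset of the properties a Biosample schema declares;
# the real NMDC schema defines different and more properties.
ALLOWED_PROPERTIES = {"id", "name", "description", "geographic_location"}


def unexpected_properties(record: dict, allowed: set[str]) -> list[str]:
    """Keys of `record` not declared in `allowed`, in record order.

    This is the set rejected under "additionalProperties": false
    when the schema defines no patternProperties.
    """
    return [key for key in record if key not in allowed]


def validation_message(record: dict, allowed: set[str]) -> str | None:
    """Build an error string shaped like the one above, or None if valid."""
    extra = unexpected_properties(record, allowed)
    if not extra:
        return None
    quoted = ", ".join(repr(key) for key in extra)
    verb = "was" if len(extra) == 1 else "were"
    return f"Additional properties are not allowed ({quoted} {verb} unexpected)"


# Hypothetical GOLD-style record carrying two undeclared keys.
record = {
    "id": "example-biosample-1",
    "name": "example biosample",
    "location": "USA: Ohio",
    "habitat": "soil",
}
print(validation_message(record, ALLOWED_PROPERTIES))
# Additional properties are not allowed ('location', 'habitat' were unexpected)
```

The fix is therefore on the schema side: declare the missing keys so they are no longer "additional".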

Add these properties to the NMDC schema.

cc @cmungall

cmungall commented 4 years ago

How does location differ from geographic_location (which maps to mixs:geo_loc_name)? Same question for sample_collection_site. We should have blank rows in gold-to-mixs.sssom.tsv for these.
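A "blank row" here would be a mapping entry that names the GOLD field but leaves the MIxS target empty until its meaning is settled. The sketch below writes such rows with the stdlib `csv` module; the four columns are an assumed minimal subset of the SSSOM TSV columns, and the `gold:` CURIE prefix is hypothetical.

```python
import csv
import io

# Assumed minimal subset of SSSOM columns; a real mapping file may carry
# more (labels, confidence, author, a YAML metadata header, etc.).
COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]

# GOLD fields whose MIxS equivalent is not yet established.
# The "gold:" prefix is illustrative only.
UNMAPPED_GOLD_FIELDS = ["location", "sample_collection_site"]


def placeholder_rows(fields):
    """Yield one row per GOLD field with predicate and object left blank."""
    for field in fields:
        yield {
            "subject_id": f"gold:{field}",
            "predicate_id": "",
            "object_id": "",
            "mapping_justification": "",
        }


def to_tsv(rows) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_tsv(placeholder_rows(UNMAPPED_GOLD_FIELDS)), end="")
```

Keeping the subject column filled makes the open questions visible in the mapping file itself rather than only in the issue tracker.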

I suggest simply going ahead and adding these to the schema. However, add a status: draft to each of these until we have established their meaning.
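As an illustration of "add them, but flag them as draft", the sketch below edits a JSON Schema dict directly. In practice the change would go into the schema's source definitions rather than a generated artifact. The `"type": "string"` default and the non-standard `"status": "draft"` key are assumptions (standard JSON Schema validators ignore or only annotate unknown keywords, so the flag does not affect validation).

```python
from __future__ import annotations

import copy

# The keys reported as unexpected in the validation error above.
NEW_GOLD_PROPERTIES = [
    "location", "mod_date", "identifier", "sample_collection_site",
    "add_date", "ncbi_taxonomy_name", "community", "habitat", "type",
]


def add_draft_properties(schema: dict, names: list[str]) -> dict:
    """Return a copy of `schema` with each name declared and flagged as draft.

    Existing property definitions are left untouched. The string type is an
    assumed placeholder until each field's meaning is established.
    """
    updated = copy.deepcopy(schema)
    props = updated.setdefault("properties", {})
    for name in names:
        props.setdefault(name, {"type": "string", "status": "draft"})
    return updated


def draft_properties(schema: dict) -> list[str]:
    """List property names still marked as draft, sorted for stable output."""
    return sorted(
        name
        for name, definition in schema.get("properties", {}).items()
        if definition.get("status") == "draft"
    )


# Hypothetical starting schema with one declared property.
schema = {
    "type": "object",
    "properties": {"id": {"type": "string"}},
    "additionalProperties": False,
}
updated = add_draft_properties(schema, NEW_GOLD_PROPERTIES)
print(draft_properties(updated))
```

Querying for draft entries gives a simple checklist of fields whose meaning still needs to be confirmed.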

wdduncan commented 4 years ago

Re:

How does location differ from geographic_location (which maps to mixs:geo_loc_name)?

I asked Reddy about the difference between the location and geographic location fields. Here is the email exchange:

Hi Reddy, When I run the query:

select distinct GEOGRAPHIC_LOCATION, LOCATION from BIOSAMPLE

I see a lot of values that are syntactically different but semantically very close or equivalent; e.g. "USA: Ohio" vs "Ohio, USA". Sometimes one of the values is more specific than the other.

We follow the "USA: Ohio" convention; other variations are ones we may not yet have touched or reviewed.
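Given that "USA: Ohio" is the preferred form, one way to triage the output of the query above is to rewrite "Region, Country" values into "Country: Region" and see which GEOGRAPHIC_LOCATION/LOCATION pairs then agree. The heuristic below (last comma-separated part is the country) is an assumption, not GOLD's rule, and the classification labels are hypothetical.

```python
def to_gold_convention(value: str) -> str:
    """Rewrite 'Ohio, USA' as 'USA: Ohio'; leave colon-style values unchanged.

    Assumed heuristic: if there is no colon but there is a comma, treat the
    last comma-separated part as the country and the rest as the region.
    """
    text = value.strip()
    if ":" in text or "," not in text:
        return text
    region, _, country = text.rpartition(",")
    return f"{country.strip()}: {region.strip()}"


def classify_pair(geographic_location: str, location: str) -> str:
    """Label a (GEOGRAPHIC_LOCATION, LOCATION) pair for manual review."""
    if geographic_location == location:
        return "identical"
    if to_gold_convention(geographic_location) == to_gold_convention(location):
        return "same after normalization"
    # Covers genuinely different values and the "one is more specific" case.
    return "needs review"


pairs = [
    ("USA: Ohio", "USA: Ohio"),
    ("USA: Ohio", "Ohio, USA"),
    ("USA: Ohio", "USA: Ohio, Columbus"),
]
for geo, loc in pairs:
    print(f"{geo!r} vs {loc!r}: {classify_pair(geo, loc)}")
```

Only the "needs review" pairs would then need a human decision about which field to keep.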

The online documentation (https://gold.jgi.doe.gov/resources/Standardized_Metagenome_Naming.pdf) defines "Location" as: "Location, which provides information about the geographic location of the sample". For purposes of mapping to MIxS, this seems to be the same as geographic location, which MIxS defines as:

The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (v 1.512) (http://purl.bioontology.org/ontology/GAZ)

We also have a country field. This is essentially the same as the INSDC country list, including oceans. Our geographic location is a free-text field following the above format, e.g. "USA: Ohio".
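Since the geographic location is free text while the country field follows the INSDC list, a simple consistency check is to split the value at the first colon, look the country part up in the list, and compare it with the country field. The sketch below assumes that structure; `INSDC_COUNTRIES` is a tiny hypothetical stand-in for the full list at http://insdc.org/country.html, and the problem messages are illustrative.

```python
from __future__ import annotations

# Tiny illustrative subset of the INSDC country list (which includes oceans).
INSDC_COUNTRIES = {"USA", "Canada", "Germany", "Pacific Ocean"}


def check_geo_loc_name(value: str, country_field: str | None = None) -> list[str]:
    """Return a list of problems with a 'Country: Region' value (empty if none).

    Assumes the part before the first ':' is the country or sea name and,
    when a colon is present, that a region name must follow it.
    """
    problems = []
    country, sep, region = value.partition(":")
    country = country.strip()
    if country not in INSDC_COUNTRIES:
        problems.append(f"unknown country {country!r}")
    if sep and not region.strip():
        problems.append("empty region after ':'")
    if country_field is not None and country_field != country:
        problems.append(f"country field {country_field!r} disagrees with {country!r}")
    return problems


print(check_geo_loc_name("USA: Ohio", country_field="USA"))
print(check_geo_loc_name("Ohio, USA"))
```

A value in the "Ohio, USA" style fails only because its country part is not a list entry, which is one way to spot records that have not yet been brought into the "USA: Ohio" convention.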

Yes, this is part of the canonical naming conventions we follow for environmental samples: https://gold.jgi.doe.gov/resources/Standardized_Metagenome_Naming.pdf

wdduncan commented 4 years ago

Fields added to schema. See PR: https://github.com/microbiomedata/nmdc-metadata/pull/145